Importing Large RDF Documents: Streaming Parsing of RDF/XML with ARC2
A common problem when parsing RDF is running out of memory because the document is too large. ARC2 addresses this (for RDF/XML) by letting you handle triples as a stream while the document is being parsed.
To take advantage of the streaming, you just need to extend the ARC2_RDFXMLParser class and override the addT method:
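<?php
require 'arc/ARC2.php';
require 'arc/parsers/ARC2_RDFXMLParser.php';

class Streamer extends ARC2_RDFXMLParser {
    // Called once per triple as the parser encounters it.
    // Handling the triple here replaces the parent's own handling of it.
    function addT($s, $p, $o, $s_type, $o_type, $o_dt = '', $o_lang = '') {
        var_dump($s, $p, $o, $s_type, $o_type, $o_dt, $o_lang);
    }
}

// Note: some ARC2 versions may expect constructor arguments
// (a config array and a caller object); adjust if yours does.
$p = new Streamer();
$p->parse('big-data.rdf');
?>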
In this simple example, I’m just dumping each triple with var_dump as it comes in, but of course you should replace the body of that method with whatever you actually want to do with each triple.
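As one possibility for what addT could do with each triple, here is a minimal sketch that turns its arguments into an N-Triples line, which could then be appended to a file. The function names formatTerm and toNTriplesLine are hypothetical (not part of ARC2), and the sketch assumes that the parser reports term types as the strings 'uri', 'bnode' and 'literal', and that blank node labels already start with "_:". Check both assumptions against your ARC2 version.

```php
<?php
// Hypothetical helpers, not part of ARC2.
// Assumption: $type is 'uri', 'bnode' or 'literal'; bnode labels start with "_:".
function formatTerm($value, $type, $datatype = '', $lang = '') {
    if ($type === 'uri') {
        return '<' . $value . '>';
    }
    if ($type === 'bnode') {
        return $value;
    }
    // Literal: escape backslashes first, then quotes and control characters.
    $escaped = str_replace(
        array('\\', '"', "\n", "\r", "\t"),
        array('\\\\', '\\"', '\\n', '\\r', '\\t'),
        $value
    );
    $term = '"' . $escaped . '"';
    if ($lang !== '') {
        $term .= '@' . $lang;                 // language-tagged literal
    } elseif ($datatype !== '') {
        $term .= '^^<' . $datatype . '>';     // typed literal
    }
    return $term;
}

// Same parameter order as addT, so it can be called directly from there.
function toNTriplesLine($s, $p, $o, $s_type, $o_type, $o_dt = '', $o_lang = '') {
    return formatTerm($s, $s_type) . ' <' . $p . '> '
        . formatTerm($o, $o_type, $o_dt, $o_lang) . ' .';
}

$line = toNTriplesLine(
    'http://example.org/a', 'http://example.org/p', 'He said "hi"',
    'uri', 'literal', '', 'en'
);
```

Inside addT you could then write something like fwrite($handle, toNTriplesLine(...) . "\n"), so that only one triple is held in memory at a time.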


January 17th, 2011 at 4:27 pm
I should have said: one task this is very useful for, when using the Talis Platform, is POSTing data from a large file to the Platform. POSTing a single large file can be slow and unreliable; it is better to POST the data in chunks of around 1-2 MB. You can do this by extending ARC’s parser (as above) to keep a buffer of triples, POSTing the buffered data to the Platform whenever the buffer reaches around 1000 triples, and once more at the end for whatever is left over.
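The buffering idea can be sketched roughly like this. TripleBuffer is a hypothetical helper class, not part of ARC2 or of any Talis Platform client, and the callback here only records batch sizes; a real callback would serialize the batch and POST it. It needs PHP 5.3 or later for the closure.

```php
<?php
// Hypothetical helper: collects triples and hands them to a callback in batches.
class TripleBuffer {
    private $triples = array();
    private $limit;
    private $onFlush;

    public function __construct($limit, $onFlush) {
        $this->limit = $limit;      // e.g. 1000 triples per batch
        $this->onFlush = $onFlush;  // callable that receives an array of triples
    }

    public function add($triple) {
        $this->triples[] = $triple;
        if (count($this->triples) >= $this->limit) {
            $this->flush();
        }
    }

    // Sends whatever is buffered; call once more after parsing for the remainder.
    public function flush() {
        if (count($this->triples) === 0) {
            return;
        }
        call_user_func($this->onFlush, $this->triples);
        $this->triples = array();
    }
}

// Demo with a tiny limit of 2 so the batching is visible.
$batchSizes = array();
$buffer = new TripleBuffer(2, function ($batch) use (&$batchSizes) {
    $batchSizes[] = count($batch);  // a real callback would POST the batch here
});
foreach (array('t1', 't2', 't3') as $t) {
    $buffer->add($t);
}
$buffer->flush();  // sends the final, partially filled batch
```

In a parser subclass, addT would call $buffer->add(...) with the triple's values, and the calling code would call $buffer->flush() after parse() returns so the last partial batch is not lost.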