
Noodling with Atom/RDF

Now that GRDDL’s a Recommendation, it’s about time we started using it. One particular bit of (potentially) low-hanging fruit is Atom (RFC 4287) - cleanly specified XML, well deployed for bloggish content syndication and increasingly having interesting extensions shoehorned in.

Anyhow, more on that some other time. I finally got around to trying a long-standing item on my to-do list: RDFize the wonderful Planet Venus aggregator. I reckon a persistent, queryable store of interesting subscriptions is a must-have part of any respectable personal knowledgebase. I haven’t time right now to go into detail on how it works (and in its current form you probably wouldn’t want to know), but basically these minimal Python scripts transform Venus’s Atom cache into RDF/XML and post the result to a Talis Platform store (after first checking the entry isn’t already in the store). So far I’ve got it working enough to make some data available for SPARQLing.
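For the curious, here is a rough sketch of the bookkeeping around that pipeline. It is not the actual scripts: the XSLT step is left out, and it only shows pulling the atom:id from a cache file, building a SPARQL ASK to test whether the entry is already stored, and preparing the POST. The function names and the store URL are made up for illustration, and it assumes the entry's atom:id is what ar:entry points at.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
AR_PREFIX = "PREFIX ar: <http://djpowell.net/schemas/atomrdf/0.3/>"


def entry_id(atom_xml):
    """Return the atom:id of an Atom entry document (one cache file)."""
    root = ET.fromstring(atom_xml)
    # The file may be a bare <entry> or have the <entry> nested inside a wrapper.
    entry = root if root.tag == ATOM + "entry" else root.find(".//" + ATOM + "entry")
    if entry is None:
        raise ValueError("no atom:entry found")
    id_text = entry.findtext(ATOM + "id")
    if not id_text or not id_text.strip():
        raise ValueError("entry has no atom:id")
    return id_text.strip()


def ask_query(entry_uri):
    """Build a SPARQL ASK that is true if the store already holds this entry.

    Assumption: ar:entry points at a resource named by the atom:id.
    """
    # Refuse characters that cannot appear inside <...> in a SPARQL query.
    if any(c in entry_uri for c in '<>"{}|^`\\ '):
        raise ValueError("not safe to embed in a query: %r" % entry_uri)
    return "%s\nASK { [ a ar:EntryInstance ; ar:entry <%s> ] }" % (AR_PREFIX, entry_uri)


def build_post(store_url, rdf_xml):
    """Prepare (but do not send) a POST of RDF/XML bytes to the store.

    store_url is a placeholder; the real Talis Platform URL is not shown here.
    """
    return urllib.request.Request(
        store_url,
        data=rdf_xml,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )
```

Sending the prepared request would be a matter of passing it to urllib.request.urlopen, once the ASK has come back false.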
If you go to this SPARQL Query form, select the “twitcrit reviews” endpoint with the dropdown and enter a query like this:

PREFIX ar: <http://djpowell.net/schemas/atomrdf/0.3/>

SELECT DISTINCT ?entry ?tp ?title ?cp ?content
WHERE {
  [
    a ar:EntryInstance ;
    ar:entry ?entry ;
    ar:title [ ?tp ?title ] ;
    ar:content [ ?cp ?content ]
  ]
}
LIMIT 10


- you should see some results.
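The same query can be run from a script instead of the form. A minimal sketch follows, assuming the endpoint accepts a standard SPARQL protocol GET with a `query` parameter and can return the SPARQL JSON results format. Neither is confirmed for this particular store, and the endpoint URL is whatever you pass in.

```python
import json
import urllib.parse
import urllib.request


def build_select_request(endpoint_url, query):
    """Prepare (but do not send) a SPARQL protocol GET for a SELECT query."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    # Assumption: the endpoint honours this Accept header.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(results_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(results_json)
    names = data["head"]["vars"]
    out = []
    for binding in data["results"]["bindings"]:
        # Variables may be unbound in a given row, so skip missing ones.
        out.append({n: binding[n]["value"] for n in names if n in binding})
    return out
```

Feeding the body of the response to rows() gives one dict per result row, keyed by variable name (entry, tp, title, cp, content for the query above).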


Next steps are to set up some local caching (thinking of just keeping a list of cache filenames) and to switch it over to the Changeset Protocol in place of the basic unversioned model posts it’s doing now. Once those are in place I’ll make a cron job for it.
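The list-of-filenames idea could be as simple as the sketch below: a text file with one already-posted cache filename per line, checked before each run. The helper names and the file layout are illustrative assumptions, not anything Venus provides.

```python
import os


def load_seen(list_path):
    """Read the set of cache filenames that have already been posted."""
    if not os.path.exists(list_path):
        return set()
    with open(list_path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def pending(cache_dir, seen):
    """Return cache filenames not yet posted, in a stable order."""
    return sorted(
        name
        for name in os.listdir(cache_dir)
        if name not in seen and os.path.isfile(os.path.join(cache_dir, name))
    )


def mark_done(list_path, filename):
    """Append a filename to the list once its entry has been posted."""
    with open(list_path, "a", encoding="utf-8") as f:
        f.write(filename + "\n")
```

One limitation of keying on the filename alone: an entry that is updated in place keeps its filename and would be skipped, which is part of why some kind of update protocol may still be needed.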

There are quite a few different atom2rdf XSLTs in circulation; the current front-runner is atom2rdf-18.xsl from David Powell, so I used that. Here’s the Venus install. I just pulled out a bunch of the semweb-related feeds from my Bloglines subscriptions (note that I cleared the cache earlier today; there was way too much stuff in it for testing).

4 Responses

  1. Morten Høybye Frederiksen Says:

    Heh.

    Great minds think alike. Or something.

    I was just yesterday thinking of ditching my WP-Venus in favour of something like this, most importantly powered by SPARQL.
    Is the Changeset stuff necessary, you think?

    PS: Any way you could add the author name or nick to the theme here? I had to check the feed to make sure it was you, Danny.

  2. Ian Davis Says:

    Hi Morten. This theme is only temporary and we’ll certainly be including the author names in the replacement.

  3. danja Says:

    Morten, cool - I’ll use yours instead :-)

    Re. Changesets - hmm, I must check, I was pretty sure some kind of (statement-replacing) update protocol is needed, but now I’m having my doubts, maybe David’s latest model avoids the need. (Henry figured out that the apparently simple Atom approach to identification actually uses combined inverse functional properties, which is a complication in the RDF world to say the least…).

  4. Morten Høybye Frederiksen Says:

    My current thinking, in this corner of the use case arena, is that a file in the cache represents an entry. So the filename is a key, which is generated off the entry id.

    Whether that is good enough for real life, I’m not sure, but it’s good enough for Planet Venus, and since we’re downstream from that…

    It does create issues, if the same entry is syndicated from multiple sources (it happens, think del.icio.us etc.), but I’m not sure there’s a better way to handle that case in general, differing properties and all.
