Archive for the 'Ideas and Experiments' Category

Exploring OpenLibrary Part One

I thought it was about time I got around to taking a better look at what might be possible with the OpenLibrary data.

My plan is to try and convert it into meaningful RDF and see what we can find out about things along the way. The project is an own-time project mostly, so progress isn’t likely to be very rapid. Let’s see how it goes. I’ll diary here as stuff gets done.

To save me typing loads of stuff out here, today’s source code is tagged and in the n2 subversion as day 1 of OpenLibrary.

Day one, 3rd October 2008: I downloaded the authors data from OpenLibrary and unzipped it. I’m also downloading the editions data from OpenLibrary, but that’s bigger (1.8GB) so I’m playing with the author data while that comes down the tubes.

The data has been exported by OpenLibrary as JSON, so is pretty easy to work with. I’m going to write some PHP scripts on the command line to mess with it and it looks great for doing that.

Each line of the JSON in the authors file represents a single author, although some authors will have more than one entry. Taking a look at Iain Banks (aka Iain M Banks) we have the following entries:


{"name": "Banks, Iain", "personal_name": "Banks, Iain", "key": "\/a\/OL32312A", "birth_date": "1954", "type": {"key": "\/type\/type"}, "id": 81616}
{"name": "Banks, Iain.", "type": {"key": "\/type\/type"}, "id": 3011389, "key": "\/a\/OL954586A", "personal_name": "Banks, Iain."}
{"type": {"key": "\/type\/type"}, "id": 9897124, "key": "\/a\/OL2623466A", "name": "Iain Banks"}
{"type": {"key": "\/type\/type"}, "id": 9975649, "key": "\/a\/OL2645303A", "name": "Iain Banks         "}
{"type": {"key": "\/type\/type"}, "id": 10565263, "key": "\/a\/OL2774908A", "name": "IAIN M. BANKS"}
{"type": {"key": "\/type\/type"}, "id": 10626661, "key": "\/a\/OL2787336A", "name": "Iain M. Banks"}
{"type": {"key": "\/type\/type"}, "id": 12035518, "key": "\/a\/OL3127859A", "name": "Iain M Banks"}
{"type": {"key": "\/type\/type"}, "id": 12078804, "key": "\/a\/OL3137983A", "name": "Iain M Banks         "}
{"type": {"key": "\/type\/type"}, "id": 12177832, "key": "\/a\/OL3160648A", "name": "IAIN M.BANKS"}

In total the file contains 4,174,245 entries. First job is to get a more manageable set of data to work with. So, I wrote a short script to extract 1 line in every 10 from a file. The resulting sample author data file contains 417,424 entries. This is more manageable for quick testing of what I’m doing.
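The sampling step is small enough to sketch. The original script was PHP and isn't reproduced here; this is a Python equivalent under the assumption that "1 line in every 10" means lines 10, 20, 30 and so on, which is consistent with 4,174,245 entries giving 417,424. The name `sample_lines` is made up for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


def sample_lines(lines: Iterable[str], step: int = 10) -> Iterator[str]:
    """Yield every `step`-th line (the 10th, 20th, 30th... for step=10)."""
    # islice is lazy, so a multi-gigabyte dump is never held in memory.
    return islice(lines, step - 1, None, step)


# Small in-memory demo: 45 numbered lines stand in for the authors file.
demo = [f"line {n}" for n in range(1, 46)]
for line in sample_lines(demo):
    print(line)
```

To use it as a pipe-friendly tool, pass `sys.stdin` as `lines` and write each result to `sys.stdout`.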

So now we can start writing some code to produce some RDF. Given the size of these files, I need to stream the data in and out again in chunks. The easiest format I find for that is turtle which has the added benefit of being human readable. YMMV. Previously I’ve streamed stuff out using n-triples. That has some great benefits too, like being able to generate different parts of the graph, for the same subject, in different parts of the file, then bringing them together using a simple command line sort. It’s also a great format for chunking the resulting data into reasonable size files, as breaking on whole lines doesn’t break the graph, whereas with rdf/xml and turtle it does.

So, I may end up dropping back to n-triples, but for now I’m going to use turtle.
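To make the n-triples point concrete, here is a rough Python sketch of what the command line sort buys you: because every line is a complete triple that starts with its subject, a plain sort puts all the statements about one subject next to each other. The three sample triples are invented for the demo and `group_by_subject` is a hypothetical helper, not part of any toolkit.

```python
from itertools import groupby
from typing import Dict, Iterable, List


def subject_of(line: str) -> str:
    """The subject is everything before the first space of an N-Triples line."""
    return line.split(" ", 1)[0]


def group_by_subject(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Sort whole lines (as `sort` would) and collect them per subject."""
    ordered = sorted(line.rstrip("\n") for line in lines if line.strip())
    return {subj: list(group) for subj, group in groupby(ordered, key=subject_of)}


# Statements about the same author, written at different points in the output.
triples = [
    '<http://example.com/a/OL32312A> <http://xmlns.com/foaf/0.1/name> "Banks, Iain" .',
    '<http://example.com/a/OL954586A> <http://xmlns.com/foaf/0.1/name> "Banks, Iain." .',
    '<http://example.com/a/OL32312A> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .',
]
for subject, statements in group_by_subject(triples).items():
    print(subject, len(statements))
```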

I also like working on the command line and love the unix pipes model, so I’ll be writing the cli (command line) tools to read from STDIN and write to STDOUT so I can mess with the data using grep, sed, awk, sort, uniq and so on.

First things first: let’s find out what’s really in the authors data. Reading the JSON line by line and converting each line into an associative array is simple in PHP, so let’s do that, keep track of all the keys we find in the arrays and recurse into the nested arrays to look at them - then dump the result out. The arrays contain this set of keys:

alternate_names
alternate_names
alternate_names\1
alternate_names\2
alternate_names\3
bio
birth_date
comment
date
death_date
entity_type
fuller_name
id
key
location
name
numeration
personal_name
photograph
title
type
type\key
website
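A key survey like the one above can be sketched in a few lines. The original was a PHP script; this Python version is only an approximation of it. It assumes nested keys are joined with a backslash (as in `type\key`) and that list positions are used as keys, which is a guess at how the `alternate_names\1` style entries came about. `collect_keys` and `keys_in_dump` are names invented here.

```python
import json
from typing import Any, Iterable, List, Set


def collect_keys(value: Any, prefix: str, found: Set[str]) -> None:
    """Record the path of every key under `value`, recursing into nested data."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list positions act as keys (assumption)
    else:
        return  # scalars have no keys of their own
    for key, child in items:
        path = f"{prefix}\\{key}" if prefix else str(key)
        found.add(path)
        collect_keys(child, path, found)


def keys_in_dump(lines: Iterable[str]) -> List[str]:
    """Return the sorted set of key paths seen across a JSON-lines dump."""
    found: Set[str] = set()
    for line in lines:
        if line.strip():
            collect_keys(json.loads(line), "", found)
    return sorted(found)


# One of the Iain Banks entries from the authors file shown earlier.
sample = [r'{"name": "Banks, Iain", "personal_name": "Banks, Iain", "key": "\/a\/OL32312A", "birth_date": "1954", "type": {"key": "\/type\/type"}, "id": 81616}']
print("\n".join(keys_in_dump(sample)))
```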

So, they have names, birth dates, death dates, alternate names and a few other bits and pieces. And they have a ‘key’ which turns out to be the resource part of the OpenLibrary url. That means we can link back into OpenLibrary nice and easy. Going back to our previous Iain Banks examples, we want to create something like this for each one:


@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.com/a/OL32312A>
	foaf:name "Banks, Iain";
	foaf:isPrimaryTopicOf <http://openlibrary.org/a/OL32312A>;
	bio:event <http://example.com/a/OL32312A#birth>;
	a foaf:Person .

<http://example.com/a/OL32312A#birth>
	bio:date "1954";
	a bio:Birth .

This gives us a foaf:Person for the author and tracks his birth date using a bio:Birth event. While tracking the birth as a separate entity may seem odd it gives the opportunity to say things about the birth itself. We’ll model death dates the same way, for the same reason. I’ve written some basic code to generate foaf from the OpenLibrary authors.

Linking back to the OpenLibrary url has been done here using foaf:isPrimaryTopicOf. I didn’t use owl:sameAs because the url at OpenLibrary is that of a web page, whereas the uri here (http://example.com/a/OL32312A) represents a person. Clearly a person is not the same as a web page that contains information about them.
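For illustration, the mapping from one author entry to that shape might look like the Python sketch below. It is not the code in svn (that was PHP). It uses the FOAF spec's spellings `foaf:name` and `foaf:isPrimaryTopicOf`, assumes the `@prefix` lines from the example are written once at the top of the output stream, and the function names are invented here.

```python
import json


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping the characters that need it."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def author_to_turtle(record: dict, base: str = "http://example.com",
                     openlibrary: str = "http://openlibrary.org") -> str:
    """Render one author entry as a foaf:Person plus bio:Birth / bio:Death events."""
    key = record["key"]  # e.g. "/a/OL32312A"
    lines = [f"<{base}{key}>"]
    if "name" in record:
        lines.append(f"\tfoaf:name {turtle_literal(record['name'])};")
    lines.append(f"\tfoaf:isPrimaryTopicOf <{openlibrary}{key}>;")
    events = []
    for field, cls in (("birth_date", "Birth"), ("death_date", "Death")):
        if field in record:
            event = f"<{base}{key}#{cls.lower()}>"
            lines.append(f"\tbio:event {event};")
            events.append(f"{event}\n\tbio:date {turtle_literal(record[field])};\n\ta bio:{cls} .")
    lines.append("\ta foaf:Person .")
    return "\n\n".join(["\n".join(lines)] + events)


entry = json.loads(r'{"name": "Banks, Iain", "key": "\/a\/OL32312A", "birth_date": "1954"}')
print(author_to_turtle(entry))
```

Because each entry is rendered independently, the function can sit in a loop over STDIN lines and stream its output straight to STDOUT.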

The only thing worrying me is that the uris we’re using are constructed from OpenLibrary’s keys. This makes matching them up with other data sources hard. Matching with other data sources requires a natural key, but there’s not enough data in these author entries to create one. The best I can do is to create a natural key that will enable people to discover the group of authors that share a name.


@prefix mine: <http://example.com/mine/schema#> .
<http://example.com/names/banksiain>
	mine:name_of <http://example.com/a/OL32312A>;
	a mine:Name .

These uris will enable me to find authors that share the same name easily, either because they do share the same name or because they’re duplicates. The natural key is simply the author’s name with any casing, whitespace or punctuation stripped out. That might need to evolve as I start looking at the names in more detail later.
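A minimal Python sketch of that natural key, assuming "stripped out" means lower-casing and then dropping everything that isn't a letter or digit. The helper names are hypothetical.

```python
import re
from collections import defaultdict


def name_key(name: str) -> str:
    """Lower-case the name and drop whitespace, punctuation and underscores."""
    return re.sub(r"[\W_]+", "", name.lower())


def name_uri(name: str, base: str = "http://example.com/names/") -> str:
    """URI of the mine:Name resource for this name."""
    return base + name_key(name)


# The nine Iain Banks entries from the authors file.
names = ["Banks, Iain", "Banks, Iain.", "Iain Banks", "Iain Banks         ",
         "IAIN M. BANKS", "Iain M. Banks", "Iain M Banks",
         "Iain M Banks         ", "IAIN M.BANKS"]
groups = defaultdict(list)
for name in names:
    groups[name_key(name)].append(name)
for key, members in groups.items():
    print(key, len(members))
```

On these nine entries the key collapses them into three groups, not one: "Banks, Iain" and "Iain Banks" still end up under different keys, which is the sort of thing that may need the key to evolve.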

Next step is to look in more detail at the dates in here. We have some simple cases of trailing whitespace or trailing punctuation, but also some more interesting cases of approximate dates or possible ranges - these occur for historical authors mostly. The complete list of distinct dates within the authors file is in svn. If you know anything about dates, feel free to throw me some free advice on what to do with them…
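As a starting point only, the simple cases could be tidied with something like this Python sketch: strip trailing whitespace and punctuation, accept a bare year, and flag everything else for a later look. Apart from "1954", the sample values are made up for illustration and are not taken from the authors file; `tidy_date` is a hypothetical name.

```python
import re
from typing import Tuple


def tidy_date(raw: str) -> Tuple[str, str]:
    """Return (cleaned value, kind) where kind is 'year' or 'needs-review'."""
    cleaned = raw.strip().rstrip(".,;:").strip()
    # Only a bare run of up to four digits is treated as a plain year here.
    kind = "year" if re.fullmatch(r"\d{1,4}", cleaned) else "needs-review"
    return cleaned, kind


for value in ["1954", "1954. ", "ca. 1500"]:
    print(repr(value), tidy_date(value))
```

Approximate dates and ranges are deliberately left untouched; deciding how to model those is the open question.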

VRM with FOAF + OpenID

A quick note-to-self. I’m currently working on some other FOAF + OpenID stuff, so this is nearby enough that I might well put together a demo in the near future…but not today.

Tim Bray discusses Changing your address in the context of Vendor Relationship Management, prompted by “Feeds-Based VRM”: A Web-Centric Approach to VRM Implementation. The question is how you keep a vendor (or other contact) aware of your current address.

I came to the same conclusion as Tim, that feeds aren’t really necessary for this kind of thing: the data can be put directly on the Web and the contact given the appropriate URI. In comments over there I pointed to Tim Berners-Lee’s Give yourself a URI - an online FOAF profile solves most of the problem. The part it doesn’t solve is access control - you might not want to make your address public. But with the help of linked data, off-the-shelf tools and a little scripting, this is pretty easy to fix.

First of all, looking at how you might represent this information, vCard is the dominant model for this kind of info. Whether that’s expressed in the original vCard format or hCard or RDFa or RDF/XML doesn’t really matter. These can all be mapped to the RDF model, which is key to what follows… Here’s the relevant bit of a vCard in Turtle syntax (first pass, probably not 100% correct):

@prefix : <http://www.w3.org/2006/vcard/ns#> .
[ a :VCard;
:agent <#me> ;
:homeAdr [
a :Address;
:street-address "7, Mozzanella" ;
:country-name "Italy"
]
] .

Now I could just dump this in my public FOAF profile at, say http://example.org/public/me. But because I want the address to be restricted, I’ll separate the information (following the principles of linked data) like this -

in http://example.org/public/me -

@prefix : <http://www.w3.org/2006/vcard/ns#> .
[ a :VCard;
:agent <#me> ;
:homeAdr <http://example.org/restricted/myaddress>
] .

and in <http://example.org/restricted/myaddress> :

@prefix : <http://www.w3.org/2006/vcard/ns#> .
<> a :Address;
:street-address "7, Mozzanella" ;
:country-name "Italy" .

Now I need to wrap the latter part in authentication/authorization. Traditionally I might hard-code a list of who can see this data, but there’s a neater way. Somewhere I’ll put statements like the following (with proper URIs as appropriate):

<#me> foaf:knows <personA> .
<personA> foaf:openid <personAopenID> .
<#me> x:businessContact <personB> .
<personB> foaf:openid <personBopenID> .
<#me> x:businessContact <personC> .
<personC> foaf:openid <personCopenID> .
<#me> x:businessContact <personD> .
<personD> foaf:openid <personDopenID> .

Anyone wishing to see the restricted info will be asked for their OpenID URI. Whether they can see a particular resource can be governed by simple rules, for example expressed through string-templated SPARQL queries:

SELECT ?person
WHERE {
?person foaf:openid $openid$ .
{ <#me> foaf:knows ?person }
UNION
{ <#me> x:businessContact ?person }
}

Ok, that’s very sketchy, but hopefully gives the idea. To be properly declarative in practice you’d probably want to put the access rules in a separate chunk of RDF, and query across the whole lot. But given decent libraries (e.g. the OpenID PHP lib worked pretty much out of the box for me, and ARC is a really straightforward PHP RDF toolkit), we’re talking about maybe a day’s work to write and deploy the scripts - which could be used by anyone else with regular PHP-capable hosting.
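One detail worth sketching is the string templating itself, since the OpenID arrives from the visitor and gets pasted into a query. The Python below is an illustration only (the scripts described here would be PHP, and `openid_term` / `fill_template` are invented names): it refuses anything that isn't a plain http(s) URI free of the characters SPARQL forbids inside `<...>`, then substitutes it for `$openid$`.

```python
from urllib.parse import urlsplit

# Characters that may not appear inside a SPARQL <IRI> reference.
_FORBIDDEN = set('<>"{}|^`\\')


def openid_term(openid: str) -> str:
    """Return the OpenID as a SPARQL IRI term, or raise ValueError."""
    parts = urlsplit(openid)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("OpenID must be an absolute http(s) URI")
    if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in openid):
        raise ValueError("OpenID contains characters not allowed in an IRI")
    return f"<{openid}>"


def fill_template(template: str, openid: str) -> str:
    """Substitute the validated OpenID for the $openid$ placeholder."""
    return template.replace("$openid$", openid_term(openid))


print(fill_template("?person foaf:openid $openid$ .", "http://alice.example.org/"))
```

The filled-in query would then go to whatever SPARQL engine holds the contact statements; an empty result means the visitor gets no access.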

A Web-centric approach to VRM should use the Web, and as Berners-Lee himself recently put it:

Linked Data is the Semantic Web done as it should be. It is the Web done as it should be.

Ad hoc plumbing

Half an hour ago I discovered microrevie.ws, a Twitter-based review site. I couldn’t resist a quick play.

Like twitcrit 1.1 [currently not working *] and twitcrit 2.0, these reviews are authored in Twitter using a few (different) conventions. The microrevie.ws page is HTML with embedded microformats, which made me think right away of GRDDL. There were a couple of slight snags - the HTML isn’t XHTML, it’s HTML5, and the page doesn’t declare HTML Meta Data profiles for the microformats. So here we go…

  1. live microrevie.ws page
  2. (1) piped through an online HTML Tidy service to yield an XHTML version
  3. (2) with a simple XSLT applied to insert @profile, using an online XSLT service to yield GRDDL-friendly XHTML
  4. (3) sent through triplr.org to yield RDF - here in Turtle syntax

Look ma…no import/export!

I’m pretty sure the GRDDL transformations aren’t 100% complete/accurate, and I couldn’t find one for hAtom which is used in the source, but there’s enough to show a lot of triples, generated live from the source simply by hooking together the URIs. Check this -
http://triplr.org/turtle/http://www.w3.org/2000/06/webdata/xslt?xslfile=http%3A%2F%2Fhyperdata.org%2Fxslt%2Fprofiles.xsl&xmlfile=http%3A%2F%2Fcgi.w3.org%2Fcgi-bin%2Ftidy%3FdocAddr%3Dhttp%253A%252F%252Fmicrorevie.ws%252F%26forceXML%3Don&transform=Submit

Not exactly the kind of thing you’d want on your business card, but it is bookmarkable/linkable.
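The long URI is less scary once you see it as three services nested inside each other, each inner URI percent-encoded once more when it becomes a query parameter of the next. A small Python sketch of the composition, using only the service addresses and parameter names visible in the link above (the function names are made up):

```python
from urllib.parse import quote


def enc(url: str) -> str:
    """Percent-encode a whole URL so it can travel as a query parameter value."""
    return quote(url, safe="")


def pipeline_url(source: str, fmt: str = "turtle") -> str:
    """Tidy -> XSLT (insert @profile) -> triplr, composed purely from URIs."""
    tidy = "http://cgi.w3.org/cgi-bin/tidy?docAddr=" + enc(source) + "&forceXML=on"
    xslt = ("http://www.w3.org/2000/06/webdata/xslt"
            "?xslfile=" + enc("http://hyperdata.org/xslt/profiles.xsl")
            + "&xmlfile=" + enc(tidy) + "&transform=Submit")
    # triplr takes the target URI appended to the path, unencoded.
    return "http://triplr.org/" + fmt + "/" + xslt


print(pipeline_url("http://microrevie.ws/"))
```

Each extra layer of nesting turns `%` into `%25`, which is why the innermost address ends up as `http%253A%252F%252F...`.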

Oh yeah - want a visualization with that? Flip triplr.org to /rdf/ and try it on the RDF Validator : http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Ftriplr.org%2Frdf%2Fhttp%3A%2F%2Fwww.w3.org%2F2000%2F06%2Fwebdata%2Fxslt%3Fxslfile%3Dhttp%253A%252F%252Fhyperdata.org%252Fxslt%252Fprofiles.xsl%26xmlfile%3Dhttp%253A%252F%252Fcgi.w3.org%252Fcgi-bin%252Ftidy%253FdocAddr%253Dhttp%25253A%25252F%25252Fmicrorevie.ws%25252F%2526forceXML%253Don%26transform%3DSubmit&PARSE=Parse+URI%3A+&TRIPLES_AND_GRAPH=PRINT_BOTH&FORMAT=PNG_EMBED

(scroll down towards the bottom & right to see the graph - it’s a bit big)

Should work in the Tabulator too.

Incidentally, there’s a neat trick at microrevie.ws: the subject of reviews gets posted to twitter as a simple string (ending with a ‘;’) but gets turned into a URI, e.g.
http://microrevie.ws/reviewables/Sparks+Alcoholic+Caffeinated+Beverage

Could come in handy for those times you really want a literal as the subject of a triple. Right now microrevie.ws has Google hooked up to help you find out what the thing is. Making that more explicit seems like it might be a job for Linking Open Data.

* I’m pretty sure there won’t be much difference in complexity of the operational code between twitcrit 1.1 and twitcrit 2.0. One possible explanation for the reason the latter is still running (even though I haven’t looked at it in months) and the former isn’t might be that the code proper in the working version is just a simple bit of Python, loosely coupled to some Software as a Service doing storage + SPARQL elsewhere (it’s on the Talis Platform). If I’d had to run that bit of infrastructure myself, I doubt very much whether that’d still be running.

Import/export and the Web

A post on the Open Data Definition list from Ben Werdmuller asks an interesting question - is syndication an easier sell than import/export?

Ok, background first: Open Data Definition is a proposed format for transfer of data between systems, with DataPortability in mind. In many respects it’s a ‘lite’ reinvention of RDF, targeted at the average Web developer. While I and others might question the underlying assumption that RDF is too difficult for typical Web developers, and perhaps express a little gut-reaction pushback, there’s nothing inherently wrong with something like this if it fills a (possibly significant) niche, and plays nicely with other Web standards. Design-wise, there is a sanity check which can be applied, the Test of Independent Invention :

If someone else had already invented your system, would theirs work with yours?

Does/could RDF work with ODD? - well, nearly. Yes, because it should be reasonably straightforward to map between RDF graphs and ODD’s format (there’s an interesting little complication in its indirection of metadata that’d take a bit of figuring, but bashing it with SPARQL & XSLT for a while would probably suggest a good approach). It fails right now because ODD doesn’t as yet allow for transparent interpretation, not having an XML namespace, hence not really placing itself on the global Web. Any automatic conversion would have to be done by sniffing the content - an agent needs complete prior knowledge. [If the ODD folks are willing to give the format a namespace, I’ll volunteer to sort out the mappings & GRDDL bits]. Hmm, I wonder if they’ve tried nesting ODD in other XML formats yet…

Anyhow, back to Ben’s question. I think he has a point - syndication should be a relatively easy sell these days because of RSS/Atom. But marketing aside, there are several different ways to get the data from system A to system B:

  1. import/export where the data is transferred through an intermediary (i.e. the desktop)
  2. one-off direct transfer (system B does a GET to system A)
  3. polling - traditional syndication, periodic transfer
  4. linkage - lazy polling, any transfer happens on demand

At this point in time, the first of these isn’t exactly Web-friendly, typically requiring a human intermediary for its operation. In future, with smarter clients maintaining a local cache of data, something like this might make more sense. Such clients could be acting as proxies for any of the other modes of connection. But let’s assume this kind of capability’s already here. If you stand back, the same thing is happening in all these cases - the receiver will be given an identifier for the resource of interest (the profile data or whatever) and can use HTTP on it as appropriate. This is completely independent of what’s in the data itself - even though RSS/Atom formats contain a series of time-stamped entries, the way they get processed is up to the consumer. These different modes are orthogonal to authentication/authorization and privacy or copyright issues. Each is, in its own way, using linked data. To get more information about something, the consumer follows its nose and dereferences the URIs. ‘Course if you bring message content into the equation and/or allow an arbitrary number of agents in the interaction, the number of possible modes explodes.

So yeah, ok, what point am I trying to make here…dunno, it just seems somehow significant that questions like “syndication or import/export?” should arise, given the underlying infrastructure. More telling of the silo nature of many current Web systems - themselves generally products of a pre-Web mindset - than anything to do with the Web itself. This too shall pass, as they say.

See also: Walled gardens: mapping the parties

PS. Reminds me - in my little DP video I had a mockup of a “Connect!” button. It was only a mockup because of the deadline for videos, the implementation I had in mind being essentially OpenID + HTTP GET + SPARQL CONSTRUCT

GRDDLing DeWitt’s Friends

DeWitt Clinton has a great write-up of Creating an HTML “friends” page from a Google Reader subscription list, a bit of hackery which leads to an hCard microformat-enriched friends list. A little tweak to the HTML can make it more machine-friendly, just adding an HTML Meta Data profile URI:

<head profile="http://www.w3.org/2006/03/hcard">

That profile is GRDDL-enabled, so any GRDDL-aware agent can interpret the source document as RDF. This part’s easy to demonstrate, thanks to the online W3C GRDDL service. So I’ve put a tweaked version of the HTML online, and here’s DeWitt’s friends page as RDF (in Turtle syntax, rendered a little verbosely).

Having set this up I realised the data wasn’t actually expressing the friend relationship, so went on to put together some SPARQL to sort that out - below. But afterwards I realised that DeWitt’s HTML was actually expressing the relationships using XFN class names, but again without the profile URI to make it machine-friendly. So another tweak:

<head profile="http://www.w3.org/2006/03/hcard http://www.w3.org/2003/g/td/xfn-workalike">

- the corresponding service output (scroll down to see the extra bits). I suppose I should mention that you can have as many space-separated profiles as you like, and the GRDDL-aware agent will interpret them independently, just accumulating all the triples. The second profile URI adds xfn:friend relationships. I think it would have been more useful with foaf:knows as well, but it is only a demo. One of these days the microformats folks might get around to tweaking the official profile appropriately…

The SPARQL I mentioned looks like this:

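prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix vcard: <http://www.w3.org/2006/vcard/ns#>
prefix foaf: <http://xmlns.com/foaf/0.1/>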

CONSTRUCT
{
[ a foaf:Person;
foaf:homepage ;
foaf:name "DeWitt Clinton" ;
]
foaf:knows
[ a foaf:Person;
foaf:homepage ?homepage ;
foaf:name ?name ] .
}
WHERE
{
[ a vcard:VCard ;
vcard:url ?homepage ;
vcard:fn ?name ]
}

- when applied to DeWitt’s data (as RDF), this will map it across from the vCard vocabulary - finding the appropriate ?variables by matching the pattern in the WHERE clause, inserting those ?variables into the CONSTRUCT clause to produce some new RDF.

I tried this on the Redland SPARQL demo, and I think it’s producing the RDF I wanted. Unfortunately the serialization is really ugly - lots of bnodes, and it’s hard to check visually. It appears to confuse Tabulator too, and the W3C RDF Validator which is handy for this kind of visualization appears to be down. (Here’s a copy of the RDF/XML). Still, it was only a workaround - with the right profiles in place it’s not needed.

I’m not sure if there’s a microformat way of expressing that the source data was a subscription/reading list. To get the richest RDF out it might be easier to do what DeWitt did, but to a full RDF serialization rather than microformatted HTML (which is effectively a CustomRdfDialect), producing something like Planet RDF’s blogroll.

Talis Store Plugin for ARC

The PHP coders amongst you may be interested in a Talis Store Plugin. To install it:

cd arc/plugins # your ARC plugins directory

svn co http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/ talis
svn co http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/ARC2_SPARQLSerializerPlugin/ARC2_SPARQLSerializerPlugin.php ARC2_SPARQLSerializerPlugin.php

Then to use it:

require_once '../ARC2.php';

/* configuration */
$talis_config = array(
  // 'db_user' => 'your_username',
  // 'db_pwd' => 'your_password',
  'store_name' => 'kwijibo-dev3', // your store name
  'fetch_graphs' => false, // if set to true, using FROM will fetch the graph as a datasource over the web, and store it in /meta
);
$store = ARC2::getComponent('Talis_StorePlugin', $talis_config);
$store->query("LOAD ");

What this does is let you use a Talis store instead of the ARC MySQL store. It supports a subset of ARC’s SPARQL+ functionality. Specifically, it supports INSERT and DELETE (which I could translate to Changesets thanks to Benji’s SPARQL parser), but not the aggregate functions (which I don’t see a way to support in a client-layer at this point).

Some differences:

Named Graphs are currently a bit different in Talis stores - you can’t (yet) create your own on the fly as you can with ARC, so LOAD will put the data into the public graph by default.

Talis platform transforms bnodes into URIs, so .

I also added a few methods to the api:

$store->import($arc_store);
$store->export($arc_store);

(The idea is that you can move data between an ARC store and a Talis store).

I also added a $store->change($before_rdf, $after_rdf) method for submitting changes to an RDF graph.

It’s quite interesting comparing the two different ways of making changes (changesets and SPARQL+). I think that changesets (especially with the coming Batch Changeset support) are maybe a bit more amenable to programmatic resource updates from forms and the like. However, changesets are a bit verbose to hand-write for making quick edits and testing stuff, or pattern-based changes, and I’m finding SPARQL+ really handy for stuff like this.
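The core of a before/after change is just a pair of set differences. This Python sketch is not the plugin's implementation (that is PHP, and a real Changeset is an RDF document describing additions and removals); it only shows the comparison step, on invented triples, and it assumes no blank nodes, which can't be matched up by simple equality.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def diff_triples(before: Iterable[Triple], after: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Work out which triples a change removes and which it adds."""
    old, new = set(before), set(after)
    return {"removals": sorted(old - new), "additions": sorted(new - old)}


TITLE = "http://purl.org/dc/elements/1.1/title"
DATE = "http://purl.org/dc/elements/1.1/date"
before = [("http://example.org/foo", TITLE, "Glorai"),
          ("http://example.org/foo", DATE, "2008-03-18")]
after = [("http://example.org/foo", TITLE, "Gloria"),
         ("http://example.org/foo", DATE, "2008-03-18")]
print(diff_triples(before, after))
```

Triples present on both sides drop out, so a form that resubmits a whole resource description only generates changes for the fields that were actually edited.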

What I’ve been thinking would be pretty neat would be if the SPARQL parser could be a bit more user extensible, and pre-query hooks could be set up (like ARC’s triggers, which happen post-query), so that plugin/hook writers could extend the SPARQL functionality, or just do stuff pre-query. Use cases might include:

  • rewriting SPARQL for performance improvement, or access control
  • pre-fetching data from FROM graphs over the web and adding it to the store (you can set a ‘fetch_graphs’=> true parameter in the config array you set up the talis store with, and it will do this)
  • adding versioned changesets to the ARC store
  • inventing new keywords - eg: ABOUT <http://example.org/foo> could be rewritten to DESCRIBE ?s WHERE {{ ?s rdf:subject <http://example.org/foo> } UNION {?s cs:subjectOfChange <http://example.org/foo> } } - Similarly you could add syntactic support for rollbacks, transactions, updates
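As a rough idea of what a pre-query hook for the ABOUT keyword could do, here is a Python sketch of the rewrite as a plain string transformation. ARC has no such hook today, so this is purely hypothetical; `rewrite_about` is an invented name and the prefixes `rdf:` and `cs:` are assumed to be declared elsewhere.

```python
import re

# Matches a query consisting solely of: ABOUT <some-uri>
_ABOUT = re.compile(r"^\s*ABOUT\s+(<[^<>\s]+>)\s*$", re.IGNORECASE)


def rewrite_about(query: str) -> str:
    """Expand ABOUT <uri> into the DESCRIBE form; pass other queries through."""
    match = _ABOUT.match(query)
    if not match:
        return query
    uri = match.group(1)
    return ("DESCRIBE ?s WHERE { "
            f"{{ ?s rdf:subject {uri} }} UNION "
            f"{{ ?s cs:subjectOfChange {uri} }} }}")


print(rewrite_about("ABOUT <http://example.org/foo>"))
```

A hook like this would run before the query reaches the SPARQL parser, so the parser itself never needs to know the new keyword exists.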

You can see more usage examples at: http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/Talis_StorePlugin.demo.php

More recording studio RDF

Yves Raimond responded to my post about Music/Audio Equipment Lists with Describing a recording session in RDF. I like it - looks useful.

Coincidentally I found myself doing something closely related yesterday. I wanted to better organize the various ‘songs’ we’ve put together over the past few months. Our music room (formerly the cats’ dining room) doesn’t pick up the house wi-fi so I just made things up as I went along. Yves’ session data is more fine-grained than what I was after for this job, but I’m pretty sure with a bit of tweaking something consistent is possible.

Here’s a sample of what I came up with:


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://purl.org/stuff/studio/> .

[ a :ProgressReport;
  dc:date "2008-03-18";
    :subject
  [ a :WorkInProgress;
      :shortName "gloriaok" ;
      dc:title "Gloria" ;
      :origin [ rdfs:label "Cover" ];
      :style "blues rock";
      :currentState "lots down";
      :nextAction "redo vocals";
      :nextAction "mix bass"
  ]
] .

As well as resolving the Music Ontology overlap, I’d also like to align this with my general-purpose Project Vocabulary, so that not only will it keep things better organised (I had a few out-of-sync variations of the same tune), but whenever I finally get around to building the GTD tools it’ll help me decide what to do next.

Shorter term, sticking the stuff in a store with a SPARQL endpoint would make a handy reference. Right away it seemed there were a couple of opportunities for automation - several of the :nextAction values were “archive”, a lot were “delete”. A simple script should be able to take care of those.

Application Idea : Music/Audio Equipment Lists

There’s an application ideas page on the n2 Wiki, things that the Platform would be well suited to supporting. I just thought of another one, and can’t see any reason not to increase the LazyWeb angle by posting here too.

Ok, so there are lots of sound gearheads around. They love talking about their setup - whether just a guitar & amp or a fully-fledged recording studio or live rig. The application would provide an easy way of listing their kit and sharing those lists. On its own it wouldn’t be very interesting (e.g.), but if you allow for rich annotation and interlinking to things like equipment specs, manufacturers pages, bands and especially user comments and reviews, it could be really useful. Everyone’s always on the lookout for technical tips, new toys etc.

This came to mind tonight when I wanted to mic up a tambourine - what kind of mic to use? It took quite a bit of Googling and sifting through barely-related material, then diving into old-fashioned forum threads to find that condenser mics were best for the job, but in a pinch you could get away with a bog-standard dynamic SM57/SM58. My large-diaphragm condenser mic does sound good but is a hassle to set up and is really sensitive to background noise. But I found that simultaneously using a little cheapo condenser measuring mic (T.Bone MM-1) on one channel and my glorious Beta 57A on another gave very good results.

Many of the necessary vocabularies are already around: Music Ontology plus lots of SKOS, Review vocab, FOAF, DC. I think it’d probably need a part-whole vocab (a Les Paul could have say Bare Knuckle pickups - nice tone!) plus something FRBR-esque for products (my particular glorious Beta 57A versus glorious Beta 57As in general).

A different bunch of terms would allow for hi-fi audiophile material, photographic/video equipment, custom cars/bikes…pretty much any activity that’s equipment-heavy and has a nerdy fan base (even computers).

See also: Harmony Central.

Noodling with Atom/RDF

Now that GRDDL’s a Recommendation, it’s about time we started using it. One particular bit of (potentially) low-hanging fruit is Atom (RFC 4287) - cleanly specified XML, well deployed for bloggish content syndication and increasingly having interesting extensions shoehorned in.

Anyhow, more on that some other time. I finally got around to trying a long-standing item on my to-do list: RDFize the wonderful Planet Venus aggregator. I reckon a persistent, queryable store of interesting subscriptions is a must-have part of any respectable personal knowledgebase. I haven’t time right now to go into detail on how it works (and in its current form you probably wouldn’t want to know), but basically these minimal Python scripts transform Venus’s Atom cache into RDF/XML and post the result to a Talis Platform store (after first checking the entry isn’t already in the store). So far I’ve got it working enough to make some data available for SPARQLing.

If you go to this SPARQL Query form, select the “twitcrit reviews” endpoint with the dropdown and enter a query like this:

PREFIX ar: <http://djpowell.net/schemas/atomrdf/0.3/>

SELECT DISTINCT ?entry ?tp ?title ?cp ?content
WHERE {
[
a ar:EntryInstance ;
ar:entry ?entry;
ar:title [ ?tp ?title ] ;
ar:content [ ?cp ?content ]
]
}
LIMIT 10


- you should see some results.


Next steps are to set up some local caching (thinking of just keeping a list of cache filenames) and turning it over to use the Changeset Protocol rather than the basic unversioned model posts it’s doing now. Once those are in place I’ll make a cron job for it.

There are quite a few different atom2rdf XSLTs in circulation, the current best-bet frontrunner being atom2rdf-18.xsl from David Powell, so I used that. Here’s the Venus install, I just pulled out a bunch of the semweb related feeds from my Bloglines subscriptions (note that I cleared the cache earlier today, there was way too much stuff in it for testing).

jQuery.Talis

jQuery.Talis is a plugin for the popular JavaScript library jQuery. It acts as a wrapper around the Talis convert service, for retrieving JSON, through JSONP, from the Platform.

You can read about it on the n2 wiki and download it from the n2 svn.

You can use it like this:


$.Talis.Store('schema-cache').sparql('DESCRIBE <http://xmlns.com/foaf/0.1/Person>',
      function(data){
        $("#Person h1").html(data['http://xmlns.com/foaf/0.1/Person']['http://www.w3.org/2000/01/rdf-schema#label'][0].value);
});

(This would fetch a description of the foaf:Person class from the http://api.talis.com/stores/schema-cache store, and insert the rdfs:label into the DOM.)

I don’t want to declare this stable yet, but it is usable in its current form (I use it in the SIOC Comments Widget). The size currently comes in at ~4k without any compression or minification. So far, it’s only been tested in Firefox, Safari and Opera, so reports of cross-browser problems, and any other bugs, would be appreciated.