GRDDLing DeWitt’s Friends

DeWitt Clinton has a great write-up of Creating a HTML “friends” page from a Google Reader subscription list, a bit of hackery which leads to an hCard microformat-enriched friends list. A little tweak to the HTML can make it more machine-friendly, just adding an HTML Meta Data profile URI:

<head profile="http://www.w3.org/2006/03/hcard">

That profile is GRDDL-enabled, so any GRDDL-aware agent can interpret the source document as RDF. This part’s easy to demonstrate, thanks to the online W3C GRDDL service. So I’ve put a tweaked version of the HTML online, and here’s DeWitt’s friends page as RDF (in Turtle syntax, rendered a little verbosely).

Having set this up I realised the data wasn’t actually expressing the friend relationship, so went on to put together some SPARQL to sort that out - below. But afterwards I realised that DeWitt’s HTML was actually expressing the relationships using XFN class names, but again without the profile URI to make it machine-friendly. So another tweak:

<head profile="http://www.w3.org/2006/03/hcard http://www.w3.org/2003/g/td/xfn-workalike">

- the corresponding service output (scroll down to see the extra bits). I suppose I should mention that you can have as many space-separated profiles as you like, and the GRDDL-aware agent will interpret them independently, just accumulating all the triples. The second profile URI adds xfn:friend relationships; I think it would have been more useful with foaf:knows as well, but it is only a demo. One of these days the microformats folks might get around to tweaking the official profile appropriately…
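The "interpret independently, accumulate the triples" behaviour can be sketched roughly as below. This is only an illustration: the two transform functions are hypothetical stand-ins that return hard-coded triples, whereas a real GRDDL-aware agent would dereference each profile URI, find the transformation it links to, and run that over the document.

```python
# Sketch only: the transforms are stand-ins returning fixed triples,
# not real GRDDL transformations.

HCARD = "http://www.w3.org/2006/03/hcard"
XFN = "http://www.w3.org/2003/g/td/xfn-workalike"


def hcard_transform(document):
    # Hypothetical output: a single vCard-style triple.
    return {("_:card1", "vcard:fn", "Example Friend")}


def xfn_transform(document):
    # Hypothetical output: a single XFN-style relationship triple.
    return {("<page>", "xfn:friend", "<http://example.org/friend>")}


TRANSFORMS = {HCARD: hcard_transform, XFN: xfn_transform}


def profile_uris(profile_attribute):
    """Split the head/@profile value on whitespace."""
    return profile_attribute.split()


def glean(document, profile_attribute):
    """Run each known profile's transform independently and merge the triples."""
    graph = set()
    for uri in profile_uris(profile_attribute):
        transform = TRANSFORMS.get(uri)
        if transform is not None:  # unknown profiles contribute nothing
            graph |= transform(document)
    return graph


triples = glean("<html>...</html>", HCARD + " " + XFN)
```

Because the result is a plain set union, the order of the profile URIs makes no difference to the merged graph in this sketch.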

The SPARQL I mentioned looks like this:

prefix rdf:
prefix vcard:
prefix foaf:

CONSTRUCT
{
[ a foaf:Person;
foaf:homepage ;
foaf:name "DeWitt Clinton" ;
]
foaf:knows
[ a foaf:Person;
foaf:homepage ?homepage ;
foaf:name ?name ] .
}
WHERE
{
[ a vcard:VCard ;
vcard:url ?homepage ;
vcard:fn ?name ]
}

- when applied to DeWitt’s data (as RDF), this will map it across from the vCard vocabulary to FOAF: the ?variables are bound by matching the pattern in the WHERE clause, then inserted into the CONSTRUCT template to produce some new RDF.
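For anyone who finds the match-then-construct step easier to read as ordinary code, here is a rough Python equivalent working on triples held as plain tuples. It is a sketch of the idea rather than a SPARQL engine; the prefixed names are just strings, and the owner homepage passed in at the end is a placeholder, not DeWitt's real URL.

```python
import itertools

RDF_TYPE = "rdf:type"
_counter = itertools.count(1)


def new_bnode():
    """Mint a fresh blank node label."""
    return "_:b%d" % next(_counter)


def match_vcards(graph):
    """Yield (homepage, name) for each node matching the WHERE pattern."""
    cards = [s for (s, p, o) in graph if p == RDF_TYPE and o == "vcard:VCard"]
    for card in cards:
        urls = [o for (s, p, o) in graph if s == card and p == "vcard:url"]
        names = [o for (s, p, o) in graph if s == card and p == "vcard:fn"]
        for homepage in urls:
            for name in names:
                yield homepage, name


def construct_foaf(graph, owner_name, owner_homepage):
    """Instantiate the CONSTRUCT template once per match."""
    out = []
    for homepage, name in match_vcards(graph):
        # Blank nodes in the template are minted afresh for every match.
        owner, friend = new_bnode(), new_bnode()
        out += [
            (owner, RDF_TYPE, "foaf:Person"),
            (owner, "foaf:homepage", owner_homepage),
            (owner, "foaf:name", owner_name),
            (owner, "foaf:knows", friend),
            (friend, RDF_TYPE, "foaf:Person"),
            (friend, "foaf:homepage", homepage),
            (friend, "foaf:name", name),
        ]
    return out


# Hypothetical sample data: one hCard gleaned from the friends page.
source = [
    ("_:c1", RDF_TYPE, "vcard:VCard"),
    ("_:c1", "vcard:url", "http://example.org/alice"),
    ("_:c1", "vcard:fn", "Alice Example"),
]
result = construct_foaf(source, "DeWitt Clinton", "http://example.org/owner")
```

Note that the owner gets a new blank node for every friend matched, which goes some way to explaining a bnode-heavy serialization.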

I tried this on the Redland SPARQL demo, and I think it’s producing the RDF I wanted. Unfortunately the serialization is really ugly - lots of bnodes, and it’s hard to check visually. It appears to confuse Tabulator too, and the W3C RDF Validator, which is handy for this kind of visualization, appears to be down. (Here’s a copy of the RDF/XML). Still, it was only a workaround - with the right profiles in place it’s not needed.

I’m not sure if there’s a microformat way of expressing that the source data was a subscription/reading list. To get the richest RDF out it might be easier to do what DeWitt did, but to a full RDF serialization rather than microformatted HTML (which is effectively a CustomRdfDialect), producing something like Planet RDF’s blogroll.

More recording studio RDF

Yves Raimond responded to my post about Music/Audio Equipment Lists with Describing a recording session in RDF. I like it - looks useful.

Coincidentally I found myself doing something closely related yesterday. I wanted to better organize the various ‘songs’ we’ve put together over the past few months. Our music room (formerly the cats’ dining room) doesn’t pick up the house wi-fi so I just made things up as I went along. Yves’ session data is more fine-grained than what I was after for this job, but I’m pretty sure with a bit of tweaking something consistent is possible.

Here’s a sample of what I came up with:


@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://purl.org/stuff/studio/> .

[ a :ProgressReport;
  dc:date "2008-03-18";
    :subject
  [ a :WorkInProgress;
      :shortName "gloriaok" ;
      dc:title "Gloria" ;
      :origin [ rdfs:label "Cover" ];
      :style "blues rock";
      :currentState "lots down";
      :nextAction "redo vocals";
      :nextAction "mix bass"
  ]
] .

As well as resolving the Music Ontology overlap, I’d also like to align this with my general-purpose Project Vocabulary. That way it will not only keep things better organised (I had a few out-of-sync variations of the same tune) but, whenever I finally get around to building the GTD tools, it’ll also help me decide what to do next.

Shorter term, sticking the stuff in a store with a SPARQL endpoint would make a handy reference. Right away it seemed there were a couple of opportunities for automation - several of the :nextAction values were “archive”, a lot were “delete”. A simple script should be able to take care of those.
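Such a script might start out something like this. It only sorts the :nextAction values into those a machine could handle and those that need a person; what "archive" and "delete" should actually do to the files is left open. Apart from "gloriaok", the entries are invented for the example.

```python
# Sketch: decide which :nextAction values could be automated.
AUTOMATABLE = {"archive", "delete"}


def plan_actions(works):
    """Split (action, shortName) pairs into automatic and manual lists."""
    automatic, manual = [], []
    for short_name, actions in works.items():
        for action in actions:
            target = automatic if action in AUTOMATABLE else manual
            target.append((action, short_name))
    return automatic, manual


works = {
    "gloriaok": ["redo vocals", "mix bass"],
    "oldtake1": ["archive"],  # hypothetical entry
    "scratch2": ["delete"],   # hypothetical entry
}

automatic, manual = plan_actions(works)
for action, short_name in automatic:
    # Dry run only: report what would be done rather than touching files.
    print("would %s %s" % (action, short_name))
```

In practice the `works` mapping would come from a SPARQL query over the store rather than being typed in by hand.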

Noodling with Atom/RDF

Now that GRDDL’s a Recommendation, it’s about time we started using it. One particular bit of (potentially) low-hanging fruit is Atom (RFC 4287) - cleanly specified XML, well deployed for bloggish content syndication and increasingly having interesting extensions shoehorned in.

Anyhow, more on that some other time. I finally got around to trying a long-standing item on my to-do list: RDFize the wonderful Planet Venus aggregator. I reckon a persistent, queryable store of interesting subscriptions is a must-have part of any respectable personal knowledgebase. I haven’t time right now to go into detail on how it works (and in its current form you probably wouldn’t want to know), but basically these minimal Python scripts transform Venus’s Atom cache into RDF/XML and post the result to a Talis Platform store (after first checking the entry isn’t already in the store). So far I’ve got it working enough to make some data available for SPARQLing.
If you go to this SPARQL Query form, select the “twitcrit reviews” endpoint with the dropdown and enter a query like this:

PREFIX ar: <http://djpowell.net/schemas/atomrdf/0.3/>

SELECT DISTINCT ?entry ?tp ?title ?cp ?content
WHERE {
[
a ar:EntryInstance ;
ar:entry ?entry;
ar:title [ ?tp ?title ] ;
ar:content [ ?cp ?content ]
]
}
LIMIT 10


- you should see some results.


Next steps are to set up some local caching (thinking of just keeping a list of cache filenames) and turning it over to use the Changeset Protocol rather than the basic unversioned model posts it’s doing now. Once those are in place I’ll make a cron job for it.

There are quite a few different atom2rdf XSLTs in circulation, the current best-bet frontrunner being atom2rdf-18.xsl from David Powell, so I used that. Here’s the Venus install; I just pulled out a bunch of the semweb-related feeds from my Bloglines subscriptions (note that I cleared the cache earlier today, as there was way too much stuff in it for testing).

Drupal and the opportunity of RDF

At the start of this week, Dries Buytaert gave the keynote presentation at DrupalCon 2008. The most exciting revelation came at the end: Drupal’s future is in the semantic web.

While Dries talks about the semantic web, and RDF, you don’t hear much reaction from the crowd; but then he says “Let me show you a video of the future” and proceeds to demonstrate SPARQLing on linked data from sources like dbpedia, dbtunes, geodata, events, friends lists, and google spreadsheets, mashed-up in Exhibit.

This gets a lot of applause :)

In the keynote, he puts emphasis on data interoperability, decentralisation, remote querying, and how having a lot of data is great fun :)

It’s a really great talk, with a lot of excellent quotes about the value of RDF for Drupal, here are some of my favourites:

Web 3.0 (much as I hate to use the term) is all about infinite interoperability

We have the opportunity to be mentioned in the history books of the web … This is where the web is going. And this right time, and the right place, to make it happen.

Using RDF you can connect all these different parts of data, that live in different parts of the web.

RDF turns the web into a database

The real opportunity we have here is to start sprinkling this map [of linked open data sources] with Drupal. Every single Drupal site can be an RDF repository that people can query

Google are trying to build a world social graph, connecting people … but what we are doing with RDF is connecting not just people, but everything

With RDF, the import/export problem we have in Drupal just goes away. It just works, without having to describe database schemas… It just works. It’s a problem that is already solved.

You can listen to the audio of the presentation at archive.org (~45MB - the RDF stuff starts at around 53 minutes), and view a video of the RDF demonstration.

You can also read more about Drupal and RDF here.

Styles of Web Application - FlowPHP

Ian blogged a while back about why MVC is a rubbish pattern for web development because it doesn’t describe the problem in a way that helps you understand it better. I completely agree, and it’s surprising how much “received wisdom” there is about MVC being the right way to do things, but the natural response is, Well, what isn’t a rubbish pattern then?

Someone asks that in the comments on the blog post, and Ian replies:

Doesn’t REST define the pattern you need: resource/representation? Your application uses the URI to locate the appropriate resource and asks it to produce the appropriate representation.

I’m not completely happy with that as an answer though. To me, REST defines the interface to your application, and while it helps define at least that part of the problem, it doesn’t really give you enough of a solution. It doesn’t help you decide how to structure your code in the same way that MVC does (even if that decision is ultimately suboptimal).

I’ve been writing web apps in a similar style to that used by RESTful frameworks like Tonic and web.py, which I guess could be described as what rsinger called “_VC” on #talis the other day. Basically you have different ‘Resource’ classes that map to your application’s url design and return representations when, eg, a GET, or a POST method is called on them. A great boon of developing with RDF is that, because all data is the same shape, you can do things pretty generically, and write less domain-specific code. So I tried to keep my resource classes as generic as possible, and have different url routes set up the classes with different parameters as need be.

However, I’ve been growing pretty dissatisfied with this way of doing it, because it still seemed to be obscuring too much of the problem for me conceptually. There was still a problem of, ‘OK, where is the best place to put this’, and a constant tension between whether to try to extend a generic class to cope with another situation, or to write a new one to do what you want. So I’d end up with a lot of classes that did a lot of pretty similar things (retrieving SPARQL queries, parsing them, passing data to the template), but not similar enough to be able just to do it with one class. I also found that class inheritance was a slightly messy way to share functionality. It could be annoying to try to remember which class was used for which url space and look it up in the routing configuration, and the approach wasn’t very amenable to serving representations derived from a combination of data sources.

So the other day I had an idea for a different style, which I’m pretentiously code-naming ‘FlowPHP’ (pronounced floaf - the P is silent ;) ).

The motivation is to try to model the process of receiving a request and returning a response as a chain of modular bits of code that create a response from the incoming request, and filter it until it is served. I’ve been trying this idea out, and so far, it looks like this:


try {
    $KwijiboDev1 = new Store('http://api.talis.com/stores/kwijibo-dev1');
    $R = new Request(array('SERVER' => $_SERVER, 'GET' => $_GET));
    switch (true):
        case $R->is('GET', '/posts'): // method is GET and url is /posts
            $R->response()->
                checkCache()->
                RDFList($KwijiboDev1, SIOC.'Post')->
                SmushGraph()->
                serve('posts', 'main');
            break;
        case $R->is('GET', '/post', array('uri')):
            $R->response()->
                checkCache()->
                CBD($KwijiboDev1, $R->GET['uri'])->
                serve('post', 'main');
            break;
        default:
            throw new HTTP_404("Page could not be found");
    endswitch;
} catch (Exception $e) {
    echo $e->serve('error', 'main');
}

So what this is doing, is:

  • Building a Request object with data from the $_SERVER and $_GET variables.
  • Checking the HTTP REQUEST METHOD, the REQUEST URI, and (optionally) the existence of any required parameters.
  • Processing the Request and serving a Response by:
    1. Checking for a cached version we could serve first
    2. Retrieving the data (eg: CBD)
    3. Processing the data (eg: SmushGraph)
    4. Serving it in templates (serve() takes a variable length list of templates as parameters, rendering each inside the next template in the list)
  • Responding with an appropriate error if necessary (eg, HTTP 404, 405, 406, 500 - I pinched the idea of modelling 4xx and 5xx as Exceptions from Konstrukt)

Each ‘method’ in the chain, up until ‘serve()’, is returning the altered response object for the next method to manipulate. The methods that deal with adding data to the response, doing stuff with data, etc, aren’t really methods at all, but dynamically-called functions from a separate file. The reason I did it like this is I think it might be more modular and extensible, whilst not necessitating the creation of lots of different subclasses of Response.
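To make the mechanics a bit more concrete, here is the same shape sketched in Python rather than PHP. It is not the FlowPHP code: the step names, the registry and the templates are all invented for the illustration. It shows the two ideas described above, steps that are plain functions looked up by name and wrapped so they return the response, and a serve() that renders each template inside the next one in the list.

```python
from string import Template

# Hypothetical templates standing in for real template files.
TEMPLATES = {
    "posts": Template("<ul>$content</ul>"),
    "main": Template("<html><body>$content</body></html>"),
}

# Step functions live outside the Response class, as if in a separate file.
STEPS = {}


def step(func):
    """Register a plain function so it can be called as a chain step."""
    STEPS[func.__name__] = func
    return func


@step
def add_items(response, items):
    # Stand-in for a data-retrieval step.
    response.data.extend(items)


@step
def dedupe(response):
    # Stand-in for a processing step.
    response.data = sorted(set(response.data))


class Response:
    def __init__(self):
        self.data = []

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name not in STEPS:
            raise AttributeError(name)
        func = STEPS[name]

        def chained(*args):
            func(self, *args)
            return self  # hand the same response to the next step

        return chained

    def serve(self, *template_names):
        """End the chain: render each template inside the next in the list."""
        content = "".join("<li>%s</li>" % item for item in self.data)
        for name in template_names:
            content = TEMPLATES[name].substitute(content=content)
        return content


page = Response().add_items(["b", "a", "b"]).dedupe().serve("posts", "main")
```

Adding a new step here means writing one more decorated function, with no new Response subclass, which is the modularity argument made above.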

This is still all evolving of course, and some/all of the ideas might turn out to be rubbish, but the thing I’m liking so far is the transparency: I think it’s relatively easy to see what’s going on with the code - what happens where, and when. The thing I’m experimenting with, I suppose, is the level of abstraction - my previous approach was perhaps too high-level and inflexible, which resulted in either lots of code, or lots of configuration, and the routing was kept too separate from the logic of returning the response.

The particular tension I’m finding with trying to develop FlowPHP at the moment, is to find a good idiom for setting variables midway through the chain of events - I’m loath to have to break out of the chained methods, but maybe that’s only for aesthetic reasons.

Editing the Web of Data

Another feature of the experimental Convert service is the ability to pull RDF (extracted from RDF/XML, Turtle, RDF/JSON, and HTML) into an editing interface, either as form fields or as free text in Turtle. You can then edit the data, transform it (options include describing the edit as a ChangeSet, reifying, or dereifying the data), convert it, and POST or PUT the results to any URI.

This might not seem so exciting if you are already quite happy doing this on the commandline with Vim and cURL, but what is potentially interesting about it is the syntax used in the name attributes of the form widgets to roundtrip the data from the web, through the HTML forms (you can read about the forms syntax on the n2 wiki).

Javascript and JSONP

What this means is that the Convert service can be used as a proxy for purely client-side JavaScript applications. You can retrieve RDF data from anywhere on the web by requesting a JSONP conversion of it from the service.

JSONP

The JSONP service allows you to specify a callback function, and it will return the data wrapped in a call to that function.

So you define a function called, eg, my_callback which accepts the JSON data object as the first parameter, and then create a script tag pointing to the JSON data, with a url parameter of callback=my_callback. The browser will then load the remote JavaScript into your page, which will call your my_callback function, passing it the data you requested.
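On the server side, the wrapping itself is a very small step. The sketch below shows the general JSONP technique in Python; it is not the Convert service's actual code (which is PHP), and the callback-name check is an assumption about what a careful implementation would want.

```python
import json
import re

# Conservative check: accept only plain JavaScript identifiers as callbacks.
_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def to_jsonp(data, callback):
    """Serialise data as JSON and wrap it in a call to the named function."""
    if not _IDENTIFIER.fullmatch(callback):
        raise ValueError("callback must be a plain JavaScript identifier")
    return "%s(%s);" % (callback, json.dumps(data))


# What the browser would receive for ...?callback=my_callback
body = to_jsonp({"name": "Gloria"}, "my_callback")
print(body)  # my_callback({"name": "Gloria"});
```

Rejecting anything that is not a simple identifier stops the callback parameter being used to inject arbitrary script into the response.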

RDF in HTML forms

You can then load the data into HTML form widgets, using the same forms syntax as the Convert service’s editor page for the @name attribute, and point the @action attribute of the form at the Convert service. Pass in the appropriate form values describing how you want the data transformed and converted, and where you want to POST or PUT it to. Then when the form is submitted, the POST array will be transformed and converted into the format you chose, and forwarded on to the URI you chose.

The upshot of which, I think, is that you can write pure client-side applications that read, write, and edit data across the web.

It’s kind of like that formmail.pl script, for RDF ;)

Experimental Convert Service

Lately I’ve been working on an experimental Convert service. The idea is much like dajobe’s triplr or Simile’s babel - accept a variety of semantic formats as input, and make them available in other flavours as output.

RDF -> RDF

The service accepts HTML (preferably with eRDF, RDF, or microformats), RDF/XML, Turtle, or RDF/JSON as input, outputting to a variety of RDF serialisations. For the parsing of most of these RDF formats, the service uses Benjamin Nowack’s excellent ARC library for PHP.

SPARQL/XML and Facet/XML

The service also accepts SPARQL/XML and the XML from the Talis Platform Facet service, transforming to either JSON, JSONP, or HTML.

The conversions are done by a PHP library, available in the n2 SVN repository.