Author Archive

Store Admin Interface

If you have a Talis store, or even if you’re just interested in browsing around existing Talis stores, you might be interested in an admin interface I’ve been working on.

Once you have selected a store, you can browse resources by type (rdf:type), search across the contentbox index, edit resources, view pending jobs and send new ones, import data, and configure the field-predicate mapping for your stores.

Please send bug reports and feature requests to keith dot alexander at talis.com

If you do want a Talis store, just ask in #talis on irc.freenode.net, or email danny dot ayers at talis.com

Batch Changesets ARC Plugin

Platform Release 12 included a very useful new feature: the ability to send more than one changeset in a single POST to your store.

To generate a batch changeset from two versions of an RDF graph, you can use an ARC plugin called Talis_ChangeSetBuilderPlugin.

To use it:


$args = array(
	'before' => $before, // can be rdf/xml, turtle, or an ARC simpleIndex array
	'after' => $after,   // can be rdf/xml, turtle, or an ARC simpleIndex array
);
$cs = ARC2::getComponent('Talis_ChangeSetBuilderPlugin', $args);
$cs_response = $store->get_metabox()->apply_versioned_changeset($cs);

The plugin also relies upon the IndexUtils Plugin. The easiest way to get them all set up is to change to your arc directory and do:


svn co http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/ plugins
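To give a feel for what a changeset builder has to work out from the two versions of a graph, here is a rough JavaScript sketch that compares two RDF/JSON-shaped indexes and returns the triples to remove and the triples to add. It is not the plugin's code: the function names (sameObject, difference, changes) and the example data are made up for illustration, and a real changeset also carries metadata that is left out here.

```javascript
// Two object nodes are equal when type, value, lang and datatype all match.
function sameObject(a, b) {
  return a.type === b.type &&
    a.value === b.value &&
    (a.lang || '') === (b.lang || '') &&
    (a.datatype || '') === (b.datatype || '');
}

// Return the triples of `left` that do not occur in `right`.
function difference(left, right) {
  const out = {};
  for (const s of Object.keys(left)) {
    for (const p of Object.keys(left[s])) {
      const others = (right[s] && right[s][p]) || [];
      const kept = left[s][p].filter(o => !others.some(x => sameObject(o, x)));
      if (kept.length > 0) {
        if (!out[s]) out[s] = {};
        out[s][p] = kept;
      }
    }
  }
  return out;
}

// Hypothetical helper: removals are what only `before` has,
// additions are what only `after` has.
function changes(before, after) {
  return {
    removals: difference(before, after),
    additions: difference(after, before)
  };
}

const DOC = 'http://example.org/doc';
const TITLE = 'http://purl.org/dc/elements/1.1/title';
const before = { [DOC]: { [TITLE]: [{ type: 'literal', value: 'Draft' }] } };
const after = { [DOC]: { [TITLE]: [{ type: 'literal', value: 'Final' }] } };

const result = changes(before, after);
console.log(JSON.stringify(result, null, 2));
```

Triples that appear in both versions end up in neither list, so an unchanged graph produces an empty set of changes.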

Rollbacks in Moriarty

Editing resources in the metabox of Talis Platform stores is done with Changesets. If you choose to use the versioned changesets API, your changesets will be stored as data in the metabox as well.

The great practical benefit of doing this is you can then reverse previous ChangeSets to return a resource to its previous state. You can read about one way to reverse changesets on the wiki. You can also now create rollback changesets from Moriarty with the new Rollback class.

To use it:


define('MORIARTY_ARC_DIR', 'arc/');
require 'moriarty/store.class.php';
require 'moriarty/rollback.class.php';  

//create a store object
$store = new Store('http://api.talis.com/stores/my_store');  

//Instantiate the Rollback class with a sparql service object:
$sparql = $store->get_sparql_service();
$rollback = new Rollback($sparql);  

//Call the to_changeset method, with a changeset's uri as the argument
$HTTP_Response = $rollback->to_changeset('http://api.talis.com/stores/my_store/items/1200302910905#self');  

// the body of the response is the changeset you need to revert back to the
// state of the resource before the changeset that you have given the URI of  

if ($HTTP_Response->is_success()) {

	// submit the rollback changeset to the store's metabox
	$rollbackResponse = $store->get_metabox()->apply_versioned_changeset($HTTP_Response->body);

	if ($rollbackResponse->is_success()) {
		// relax!
	} else {
		// throw an error
	}

}
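The idea behind a rollback can be sketched in a few lines of JavaScript: a changeset that undoes another one simply swaps its additions and removals. This is only an illustration of the principle, not how Moriarty's Rollback class is written. Triples are modelled as plain strings, and the names apply and rollbackOf are invented for the example.

```javascript
// Apply a changeset to a graph (an array of triples written as strings).
function apply(graph, changeset) {
  const kept = graph.filter(t => !changeset.removals.includes(t));
  const added = changeset.additions.filter(t => !kept.includes(t));
  return kept.concat(added);
}

// The rollback of a changeset removes what it added and adds what it removed.
function rollbackOf(changeset) {
  return {
    removals: changeset.additions.slice(),
    additions: changeset.removals.slice()
  };
}

const original = ['<http://example.org/a> <dc:title> "Old title"'];
const edit = {
  removals: ['<http://example.org/a> <dc:title> "Old title"'],
  additions: ['<http://example.org/a> <dc:title> "New title"']
};

const edited = apply(original, edit);
const restored = apply(edited, rollbackOf(edit));
console.log(edited, restored);
```

This only works cleanly when nothing else has touched the same triples since the original changeset was applied.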

Streaming Parsing of RDF/XML with ARC2

A common problem when parsing RDF is running out of memory because the document is too large. ARC2 solves this problem (for RDF/XML) by being able to stream it.

If you want to take advantage of the streaming, you just need to extend the ARC2_RDFXMLParser class and override the addT method:

<?php
require 'arc/ARC2.php';
require 'arc/parsers/ARC2_RDFXMLParser.php'; 

class Streamer extends ARC2_RDFXMLParser { 

	function addT($s, $p, $o, $s_type, $o_type, $o_dt = '', $o_lang = ''){
		var_dump($s, $p, $o, $s_type, $o_type, $o_dt, $o_lang);
	} 

} 

$p = new Streamer(); 

$p->parse('big-data.rdf'); 

?>

In this simple example, I’m just var_dumping out the triples as they come in, but of course you should do whatever it is you want to do instead to the triple in that method.
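The same pattern, sketched in JavaScript for comparison: the handler is called once per triple and only keeps a running tally, so memory use does not grow with the size of the document. The generator fakeSource and the function streamTriples are hypothetical stand-ins for a real streaming parser.

```javascript
// Hypothetical source that yields one triple at a time, like a streaming parser.
function* fakeSource() {
  yield ['http://example.org/a', 'http://xmlns.com/foaf/0.1/name', 'Alice'];
  yield ['http://example.org/b', 'http://xmlns.com/foaf/0.1/name', 'Bob'];
  yield ['http://example.org/a', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://xmlns.com/foaf/0.1/Person'];
}

// Pass each triple to the handler without collecting the triples anywhere.
function streamTriples(source, onTriple) {
  for (const [s, p, o] of source) {
    onTriple(s, p, o);
  }
}

// The handler keeps a count per predicate and then forgets the triple.
const counts = new Map();
function countPredicate(s, p, o) {
  counts.set(p, (counts.get(p) || 0) + 1);
}

streamTriples(fakeSource(), countPredicate);
console.log(counts);
```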

European Semantic Web Conference 2008 @ Tenerife

Last week Tom and I were in Tenerife for the European Semantic Web Conference, where he was chairing a session, and I was presenting a short paper on RDF/JSON, both at the Semantic Scripting Workshop.

Scripting Workshop

The scripting workshop itself was excellent - I enjoyed all of the others’ papers a lot, and look forward to playing with the code, ideas, and applications they presented.

The paper just before me was about using RDFa and javascript to allow in-page editing of resources. One particularly nice thing about it was that they implemented our RDF/JSON specification in their API (cheers guys! ;) ).

The scripting challenge was won by the highly deserving but sadly absent Benjamin Nowack with SPARQLBot - his IRC bot that can read and answer questions from RDF data sources. The second prize was won by Alexandre Passant and co for their Semantic Microblogging system, which looks very interesting indeed.

SPARQL

There were quite a few interesting papers on SPARQL - extending it in various ways, or extending SPARQL into other technologies. There were two papers on using SPARQL to bridge the XML and RDF divide: one on embedding SPARQL in XSLT extension functions (‘enabling the developer to combine XSLT and RDF in a way that doesn’t suck’); another on combining XQuery and SPARQL.

One paper I thought was especially interesting was about extending SPARQL to work on streams of data.

The best paper award was on SPARQL, and won by Christoph Kiefer, Abraham Bernstein, and André Locher for “Adding Data Mining Support to SPARQL via Statistical Relational Learning Methods”.

Vocabularies and Ontologies

Richard Cyganiak presented Neologism - an open source drupal-based web application for creating and publishing vocabularies. What is great about this application is not its features, but its philosophy. While desktop applications like Protégé may provide lots of features for designing, creating and reasoning with ontologies, they don’t help with the publishing of them - which can be a rather tricky issue. The idea behind Neologism is to make it easy for vocabulary authors to do the right thing, and author and publish their vocabularies according to best practice - an aim I really applaud.

I also attended a tutorial on developing ontologies with the use of patterns - starting by reusing some basic modeling patterns, which could be used as a mold and later discarded when the design was complete. The tutorial incorporated this into an XP system of development, involving test-driven ontology design - which I thought was an interesting idea.

voiD

Michael Hausenblas presenting voiD

One thing from the conference I particularly enjoyed was Michael Hausenblas, Richard Cyganiak, Jun Zhao, and I developing Michael’s idea of metaLOD into a vocabulary for describing datasets like those in Richard’s famous LOD diagram (see the slide in the photo above).

The idea was to come up with a light-weight vocabulary that would enable RDF descriptions of interlinked datasets; these descriptions (and so the various access points to the datasets) can be made discoverable via the Semantic Sitemaps extension, and aggregated via services like Sindice. I have a practical interest in this as well, since with the Platform (and our work on the Open Data License), we want to enable people to publish lots of datasets on the web. We already have lots of interesting datasets in the platform (which I have been doing some work on describing in the silkworm-dev store), and we are really keen to make our publicly available datasets discoverable by machines and humans, and available for reuse.

We spent about an hour discussing the weighty issues of scope, vocabulary reuse, and ontology modeling patterns - then at least a further hour trying to come up with a suitable acronym. Finally Laurian advised us not to think of an acronym, but of a cool-sounding word that encapsulated what the vocabulary would give to the world. So we came up with voiD: a Vocabulary Of Interlinked Datasets.

(see also Orri’s “VOID, Or will the LOD Cloud Bring Rain”)

Semantic Games

I saw two papers on games and the semantic web at ESWC. The first was Knud Möller’s highly entertaining talk on World of WebCraft - Mashing up World of Warcraft and the Web at the Semantic Scripting Workshop; where he showed how he gleaned semantics from the game by scripting addons, mashing it up with data from dbpedia, and screenshots from flickr.

OntoGame presentation
The second was Katharina Siorpaes’ presentation of her work on OntoGame, an application of Luis von Ahn’s Games With a Purpose concept to using online multiplayer games for getting people to perform what might otherwise be rather dull tasks in ontology creation and alignment, and data annotation. The idea centers around using blind collaborative game-play to achieve consensus and accuracy on what is common knowledge to humans, yet opaque to machines. I wondered if such game-play would be compelling to mainstream users, but according to the paper’s authors, the social aspect of these games can provide plenty of interest and incentive to keep playing. It’s a thought-provoking concept anyway, and it will be interesting to see what develops in this area, and in which niches these techniques will work best.

Demos + Posters

The demos were really good. From asking the other attendees, I think the favourites were QuiKey, a Quicksilver-like interface for entering and searching through triples (which won best poster); xOperator (a really intriguing combination of jabber, SPARQL, and Agents to bring you trusted answers to questions via Instant Messaging); OntoGame (described above); and Konduit, a Semantic Pipes-like application for visual programming for the Semantic Desktop (which made me reconsider my position in an irc discussion with iand about whether to describe application flow in code or data; it also won best demo).

Lightning Talks

The lightning talks were, as ever, a popular and light-hearted, yet thought-provoking event. The format was a tight 2 minutes, 1 slide, which was strictly adhered to: Andraž Tori of Zemanta gave a very good presentation that was roundly and deservedly booed when he tried to slip in 3 slides. At first I thought 1 slide would be a bit limited, but it was actually pretty good - giving each speaker a chance to present only one single idea at a time. All the talks were entertaining, but some that stood out for me were:

  • Jenny Green from the Ordnance Survey explaining that, in one database, they currently held enough data to overwhelm any triple-store in existence, and would need a large server-farm to store and serve it all.
  • Laurian Gridinoc advocating the use of RDFa.
  • Andrew Green explaining How He Learned to Relax and Love the Bnodes: use them when you only need a ‘glue’ node that isn’t a ‘thing’ in its own right, and doesn’t deserve an identifier (I’m still not convinced: bnodes, bah!).

The only great pity was that the lightning talks were run in parallel with other tracks, so I missed out on the start of the Applications track. Next year, hopefully they will run the lightning talks separately from the rest, and record them for posterity (the other talks had video-cameras in attendance).

Industry

ESWC is a pretty academic conference, but it was really interesting to meet people from other companies making great use of semantic technologies to their competitive advantage, like ProKarriere, an Austrian online recruitment service that uses tools like Crowbar and Solvent to scrape semantics from partner web-sites, together with Natural Language Processing, and ontologies they have developed, to intelligently match graduates up with appropriate vacancies. Or like Net7, who have built Talia, a semantically backed digital library for Philosophy Scholars (described in their Scripting Workshop paper). Or like Garlik, who had a whole keynote about them.

Interdisciplinary

A theme that appealed to me was using semantic technologies outside of Computer Science departments to aid scholars in other fields - there was quite a strong presence from Finland with their work in the cultural heritage sector, visualising time and space. I also really enjoyed seeing Jun Zhao’s presentation about using off-the-shelf semantic software like Exhibit to help zoologists navigate a repository of research images, the Net7 guy’s demonstration of their Digital Philosophy library Talia, and hearing about Norman Gray’s work using RDF with astronomers.

While the quality of the presentations was really high, the best bits, as usual, were the socializing and informal discussions in between, meeting names I’d long been familiar with from the various semweb mailing lists, blogs and irc channels (#swig, #swhack, #sioc etc), and new people besides.

SWIG-Scotland

It was also nice to meet a few other people living in Scotland doing semweb stuff - there don’t seem to be that many of us. So I set up http://groups.google.com/group/swig-scotland in the hope that we can all arrange to meet up some time and talk triples (please join if it’s of any interest).

Looking over my copy of the proceedings, I realise that there’s so much stuff I didn’t see that I would have liked to (the tragedy of parallel sessions), and so much stuff I did see that I haven’t done justice to here - all the Nepomuk semantic desktop stuff for instance, or DERI’s research into sensors connected to the web, or the Vapour tool for testing HTTP conneg, or … but I have to stop now :). Suffice to say, it was great, and I’ve got a lot to think about and try out over the coming weeks.

Tutorial: jQuery and the Talis platform

We will use the jQuery.Talis plugin to create a simple html+js interface to a Talis store.

The Talis plugin is a small wrapper that simplifies retrieving json from the platform remotely (via jsonp). It allows you to query the platform, and specify callback functions for dealing with the retrieved data.

We’ll have a text box to type a search string into; this will retrieve results (of matching resource descriptions) from the platform store, and display them in a list of links. Clicking on the links will display the resource description.

1. The HTML:

We are going to need three elements for this:

  1. A text input for typing the search strings into:
    <label for="search">Search<input type="text" name="search" id="search"/></label>
  2. A list to insert the search results into.
        <ol id="results"></ol>

    and:

  3. A div to display the resource descriptions in:
        <div id="description"></div>

2. The Javascript:

At the command line, switch to the directory you saved your HTML file in, and do:

    svn co http://n2.talis.com/svn/playground/kwijibo/js/Talis.jQuery.plugin/trunk/ js/

Now we link to the javascript files from the bottom of the <body> of our html page:

<script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
<script type="text/javascript" charset="utf-8" src="js/Talis.jQuery.js"></script>
<script type="text/javascript" charset="utf-8" src="js/jsRDF.jquery.js"></script>

(jsRDF.jQuery.js is just a small, nascent library for manipulating RDF/JSON )

Now open another script tag, and we’ll write some javascript to connect our html with the platform:

First, we’ll declare some variables we’ll want to use:

var RSS_ITEM = 'http://purl.org/rss/1.0/item';
var RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
var MY_STORE = 'schema-cache';

For this tutorial, I’m using the schema-cache store, which contains many RDF and OWL vocabularies.

Now, what we want is to query the platform when we type in the text box, so:

$("#search").keyup(function(){
    var query = $("#search").val();
    $.Talis.Store(MY_STORE).items(query, function(json){
        /*  we do something with the json data from the platform in here ... */
    });
});

What’s happening here, is we are taking the text that has been typed in the textbox (#search), and querying the items service of our store with it. The second parameter of the Store.items method is a callback function, in which you can specify what to do with the data when it is retrieved.

The platform items service returns the results as an RSS feed, which the jQuery.Talis plugin fetches for us as rdf/json. For this, however, we only want the items of the feed, not the RSS feed resource itself, so we need to keep only the resources that have rss:item as a value of their rdf:type property:

    var RDF = $.jsRDF(json);
    var rss_items = RDF.filter({p:RDF_TYPE, o:{value:RSS_ITEM}});

Here, we are loading the data into a jsRDF object, which has methods for manipulating it. We’re using the filter method to select the resources that have an rdf:type of rss:item. Now we want to render them in the page inside our #results list:

$.each(rss_items, function(uri, properties){
    $("#results").append('<li><a href="'+uri+'">'+RDF.get_label(uri)+'</a></li>');
});
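For readers who want to see what the filtering step amounts to, here is a rough plain-JavaScript equivalent working directly on RDF/JSON. It is a sketch under assumptions: resourcesOfType is a made-up name, the sample feed data is invented, and jsRDF's own filter method may well return its results in a different shape.

```javascript
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const RSS_ITEM = 'http://purl.org/rss/1.0/item';
const RSS_CHANNEL = 'http://purl.org/rss/1.0/channel';

// Keep only the resources whose rdf:type values include typeUri.
function resourcesOfType(index, typeUri) {
  const out = {};
  for (const uri of Object.keys(index)) {
    const types = index[uri][RDF_TYPE] || [];
    if (types.some(o => o.value === typeUri)) {
      out[uri] = index[uri];
    }
  }
  return out;
}

// Invented sample: one feed resource and one item, in RDF/JSON shape.
const feed = {
  'http://example.org/feed': {
    [RDF_TYPE]: [{ type: 'uri', value: RSS_CHANNEL }]
  },
  'http://example.org/item/1': {
    [RDF_TYPE]: [{ type: 'uri', value: RSS_ITEM }],
    'http://purl.org/rss/1.0/title': [{ type: 'literal', value: 'First result' }]
  }
};

const items = resourcesOfType(feed, RSS_ITEM);
console.log(Object.keys(items));
```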

OK. We want clicking on those links to show the resource description, so we’ll define a function for retrieving that description from the store, and rendering it, then we’ll bind it to the click event of the links in the results list:

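function browse(){
    // when used as a click handler, `this` is the link that was clicked
    var uri = this.href;
    $.Talis.Store(MY_STORE).lcbd(uri, function(data){

        var RDF = new $.jsRDF(data);
        $('#description').html( RDF.to_html(uri) );
        $('#description dd a').click(browse);
    });
    return false;
}

// bind to the search result links (run this after the results have been rendered)
$('#results a').click(browse);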

We get the URI of the resource from the @href attribute (which we set when we were rendering the search results), then we call the lcbd method on our store (LCBD is short for labelled concise bounded description, and returns the properties of the resource, and labels for all the resources our description references). Again, we use the $.jsRDF object to render the description as html (it uses a definition list for rendering the properties of the resource).

After we’ve rendered the description in the #description div, we also bind the click event on the links to the resource’s properties to the browse function, so that clicking on those links will retrieve and render the resources being linked to.

And that’s pretty much it.
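If you are curious what rendering a description as a definition list involves, the following is a minimal sketch in plain JavaScript. It is not the jsRDF implementation, whose actual markup may differ. The names escapeHtml and descriptionToHtml are invented, and property URIs are shown raw rather than replaced with labels.

```javascript
// Escape text before putting it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Render the properties of one resource from an RDF/JSON index as a <dl>.
function descriptionToHtml(index, uri) {
  const props = index[uri] || {};
  let html = '<dl>';
  for (const p of Object.keys(props)) {
    html += '<dt>' + escapeHtml(p) + '</dt>';
    for (const o of props[p]) {
      if (o.type === 'uri') {
        // URI values become links, so they can be browsed in turn
        html += '<dd><a href="' + escapeHtml(o.value) + '">' + escapeHtml(o.value) + '</a></dd>';
      } else {
        html += '<dd>' + escapeHtml(o.value) + '</dd>';
      }
    }
  }
  return html + '</dl>';
}

const sample = {
  'http://example.org/a': {
    'http://www.w3.org/2000/01/rdf-schema#label': [{ type: 'literal', value: 'A & B' }],
    'http://www.w3.org/2000/01/rdf-schema#seeAlso': [{ type: 'uri', value: 'http://example.org/b' }]
  }
};

const rendered = descriptionToHtml(sample, 'http://example.org/a');
console.log(rendered);
```

Putting URI values inside links within dd elements is what lets a selector like '#description dd a' pick them up for further browsing.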

Talis Store Plugin for ARC

The PHP coders amongst you may be interested in a Talis Store Plugin. To install it:

cd arc/plugins # your ARC plugins directory

svn co http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/ talis
svn export http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/ARC2_SPARQLSerializerPlugin/ARC2_SPARQLSerializerPlugin.php ARC2_SPARQLSerializerPlugin.php

Then to use it:

require_once '../ARC2.php';   

/* configuration */
$talis_config = array(
  // 'db_user' => 'your_username',
  // 'db_pwd' => 'your_password',
  'store_name' => 'kwijibo-dev3', // your store name
  'fetch_graphs' => false, // If set to true, using FROM will fetch the graph as a datasource over the web, and store it in /meta
);
$store = ARC2::getComponent('Talis_StorePlugin', $talis_config);
$store->query("LOAD ");

What this does is let you use a Talis store instead of the ARC MySQL store. It supports a subset of ARC’s SPARQL+ functionality. Specifically, it supports INSERT and DELETE (which I could translate to Changesets thanks to Benji’s SPARQL parser), but not the aggregate functions (which I don’t see a way to support in a client-layer at this point).

Some differences:

Named Graphs are currently a bit different in Talis stores - you can’t (yet) create your own on the fly as you can with ARC, so LOAD will put the data into the public graph by default.

The Talis platform transforms bnodes into URIs.

I also added a few methods to the api:

$store->import($arc_store);
$store->export($arc_store);

(The idea is that you can move data between an ARC store and a Talis store).

I also added a $store->change($before_rdf, $after_rdf) method for submitting changes to an RDF graph.

It’s quite interesting comparing the two different ways of making changes (changesets and SPARQL+). I think that changesets (especially with the coming Batch Changeset support) are maybe a bit more amenable to programmatic resource updates from forms and the like. However, changesets are a bit verbose to hand-write for making quick edits and testing stuff, or pattern-based changes, and I’m finding SPARQL+ really handy for stuff like this.

What I’ve been thinking would be pretty neat would be if the SPARQL parser could be a bit more user extensible, and pre-query hooks could be set up (like ARC’s triggers, which happen post-query), so that plugin/hook writers could extend the SPARQL functionality, or just do stuff pre-query. Use cases might include:

  • rewriting SPARQL for performance improvement, or access control
  • pre-fetching data from FROM graphs over the web and adding it to the store (you can set a ‘fetch_graphs’=> true parameter in the config array you set up the talis store with, and it will do this)
  • adding versioned changesets to the ARC store
  • inventing new keywords - eg: ABOUT <http://example.org/foo> could be rewritten to DESCRIBE ?s WHERE {{ ?s rdf:subject <http://example.org/foo> } UNION {?s cs:subjectOfChange <http://example.org/foo> } } - Similarly you could add syntactic support for rollbacks, transactions, updates
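To make the pre-query hook idea a little more concrete, here is a small JavaScript sketch of a hook chain that rewrites the ABOUT keyword from the last use case into a DESCRIBE query. Everything here is hypothetical: neither ARC nor the Talis store plugin offers addPreQueryHook or runPreQueryHooks, and a real implementation would hook into the SPARQL parser rather than use a regular expression.

```javascript
// Registered hooks; each one takes a query string and returns a query string.
const hooks = [];

function addPreQueryHook(fn) {
  hooks.push(fn);
}

// Run every hook in order, feeding each the output of the previous one.
function runPreQueryHooks(query) {
  return hooks.reduce((q, fn) => fn(q), query);
}

// Example hook: expand "ABOUT <uri>" into a DESCRIBE over statements and changes.
addPreQueryHook(function expandAbout(query) {
  const match = /^\s*ABOUT\s+<([^>]+)>\s*$/i.exec(query);
  if (!match) {
    return query; // not an ABOUT query, leave it alone
  }
  const uri = '<' + match[1] + '>';
  return 'DESCRIBE ?s WHERE { { ?s rdf:subject ' + uri +
    ' } UNION { ?s cs:subjectOfChange ' + uri + ' } }';
});

const rewritten = runPreQueryHooks('ABOUT <http://example.org/foo>');
const untouched = runPreQueryHooks('SELECT * WHERE { ?s ?p ?o }');
console.log(rewritten);
```

Because every hook takes a string and returns a string, hooks for access control or performance rewriting could be chained in the same way.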

You can see more usage examples at: http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/Talis_StorePlugin.demo.php

Drupal and the opportunity of RDF

At the start of this week, Dries Buytaert presented the keynote at DrupalCon 2008. The most exciting revelation came at the end: Drupal’s future is in the semantic web.

While Dries talks about the semantic web, and RDF, you don’t hear much reaction from the crowd; but then he says “Let me show you a video of the future” and proceeds to demonstrate SPARQLing on linked data from sources like dbpedia, dbtunes, geodata, events, friends lists, and google spreadsheets, mashed-up in Exhibit.

This gets a lot of applause :)

In the keynote, he puts emphasis on data interoperability, decentralisation, remote querying, and how having a lot of data is great fun :)

It’s a really great talk, with a lot of excellent quotes about the value of RDF for Drupal. Here are some of my favourites:

Web 3.0 (much as I hate to use the term) is all about infinite interoperability

We have the opportunity to be mentioned in the history books of the web … This is where the web is going. And this right time, and the right place, to make it happen.

Using RDF you can connect all these different parts of data, that live in different parts of the web.

RDF turns the web into a database

The real opportunity we have here is to start sprinkling this map [of linked open data sources] with Drupal. Every single Drupal site can be an RDF repository that people can query

Google are trying to build a world social graph, connecting people … but what we are doing with RDF is connecting not just people, but everything

With RDF, the import/export problem we have in Drupal just goes away. It just works, without having to describe database schemas… It just works. It’s a problem that is already solved.

You can listen to the audio of the presentation at archive.org (~45MB - the RDF stuff starts at around 53 minutes), and view a video of the RDF demonstration.

You can also read more about Drupal and RDF here

ARC2_IndexUtils plugin

ARC2_IndexUtils is a plugin for ARC providing a few simple functions for processing rdf/json-shaped data:

  • ARC2_IndexUtils::filter() takes 2 parameters: a data array, and an associative array of filters. You might use it like this:
    ARC2_IndexUtils::filter($data, array('property' => create_function('$u,$p,$os', 'return $p=="http://xmlns.com/foaf/0.1/name";')))

    Which would return a data array with only those statements having http://xmlns.com/foaf/0.1/name in the property position.

  • ARC2_IndexUtils::merge takes a variable length list of parameters, where each parameter is an rdf/json style data array, and merges them into one data array.
  • ARC2_IndexUtils::diff takes a variable length list of parameters, returning a data array consisting only of statements from the first array that didn’t exist in any of the subsequent arrays
  • ARC2_IndexUtils::intersect takes a variable length list of parameters, returning a data array consisting only of statements from the first array that also exist in all of the subsequent arrays
  • ARC2_IndexUtils::reify reifies an rdf/php data array (you might use this for creating a changeset, or for making provenance statements about your triples)
  • ARC2_IndexUtils::dereify dereifies an rdf/php data array (reified statements can be hard to read, you might want to dereify them to see what they say more easily)
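To illustrate two of these operations on rdf/json-shaped data, here is a rough JavaScript sketch of merging and reifying. It is written independently of the plugin: the function names mirror the list above only for readability, the blank node labels (_:statement1 and so on) are an arbitrary choice, and the PHP implementation may differ in its details.

```javascript
const RDF = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

function sameObject(a, b) {
  return a.type === b.type && a.value === b.value &&
    (a.lang || '') === (b.lang || '') &&
    (a.datatype || '') === (b.datatype || '');
}

// Merge any number of indexes into one, skipping duplicate statements.
function merge(...indexes) {
  const out = {};
  for (const index of indexes) {
    for (const s of Object.keys(index)) {
      for (const p of Object.keys(index[s])) {
        if (!out[s]) out[s] = {};
        if (!out[s][p]) out[s][p] = [];
        for (const o of index[s][p]) {
          if (!out[s][p].some(x => sameObject(x, o))) out[s][p].push(o);
        }
      }
    }
  }
  return out;
}

// Turn every statement into an rdf:Statement resource describing it.
function reify(index) {
  const out = {};
  let n = 0;
  for (const s of Object.keys(index)) {
    for (const p of Object.keys(index[s])) {
      for (const o of index[s][p]) {
        n += 1;
        out['_:statement' + n] = {
          [RDF + 'type']: [{ type: 'uri', value: RDF + 'Statement' }],
          [RDF + 'subject']: [{ type: s.startsWith('_:') ? 'bnode' : 'uri', value: s }],
          [RDF + 'predicate']: [{ type: 'uri', value: p }],
          [RDF + 'object']: [o]
        };
      }
    }
  }
  return out;
}

const NAME = 'http://xmlns.com/foaf/0.1/name';
const a = { 'http://example.org/me': { [NAME]: [{ type: 'literal', value: 'Keith' }] } };
const b = { 'http://example.org/me': { [NAME]: [{ type: 'literal', value: 'Keith' }, { type: 'literal', value: 'kwijibo' }] } };

const merged = merge(a, b);
const reified = reify(merged);
console.log(JSON.stringify(reified, null, 2));
```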

jQuery.Talis

jQuery.Talis is a plugin for the popular javascript library jQuery. It acts as a wrapper around the talis convert service, for retrieving json, through jsonp, from the Platform.

You can read about it on the n2 wiki and download it from the n2 svn.

You can use it like this:


$.Talis.Store('schema-cache').sparql('DESCRIBE <http://xmlns.com/foaf/0.1/Person>',
    function(data){
        $("#Person h1").html(data['http://xmlns.com/foaf/0.1/Person']['http://www.w3.org/2000/01/rdf-schema#label'][0].value);
    });

(This would fetch a description of the foaf:Person class from the http://api.talis.com/stores/schema-cache store, and insert the rdfs:label into the DOM.)
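Indexing straight into the returned data will throw an error if the resource or property is missing from the response. A small helper can guard against that. The sketch below assumes the callback receives plain RDF/JSON, as in the example above; firstValue is a made-up name and not part of jQuery.Talis.

```javascript
// Return the first value of a property, or undefined if it is not there.
function firstValue(data, subject, predicate) {
  const objects = data && data[subject] && data[subject][predicate];
  return objects && objects.length > 0 ? objects[0].value : undefined;
}

const PERSON = 'http://xmlns.com/foaf/0.1/Person';
const LABEL = 'http://www.w3.org/2000/01/rdf-schema#label';

// Invented sample of what a DESCRIBE result might look like as RDF/JSON.
const data = {
  [PERSON]: {
    [LABEL]: [{ type: 'literal', value: 'Person' }]
  }
};

const label = firstValue(data, PERSON, LABEL);
const missing = firstValue(data, 'http://example.org/nothing', LABEL);
console.log(label, missing);
```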

I don’t want to declare this stable yet, but it is usable in its current form (I use it in the SIOC Comments Widget). The size currently comes in at ~4k without any compression or minification. So far, it’s only been tested in Firefox, Safari and Opera, so reports of cross-browser problems, and any other bugs, would be appreciated.