Archive for the 'Tips and Tricks' Category

SPARQL AJAX Client Library and Example

Over the past few years I’ve tinkered with a number of different implementations of an AJAX client library for SPARQL. Before a standard format for SPARQL JSON results was created, this involved having to jump through the extra hoops of parsing the XML format. But things are much easier now, especially when the JSON support is extended to include the results of CONSTRUCT and DESCRIBE queries.

My personal favourite SPARQL client library though is the one produced by Lee Feigenbaum, Elias Torres, and Wing Yung as part of their work on the SPARQL Calendar Demo.

While the sparql.js library only supports JSON it does have a few convenience features which I like, including global PREFIX bindings and some functions for automatically processing the JSON results to produce some simpler javascript objects (e.g. arrays and hashes) that simplify some scripting tasks and make code more readable.
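The same kind of simplification is easy to sketch in Ruby as well. The snippet below is not sparql.js and not part of any Platform library; it is a minimal illustration, using only the standard `json` library, of flattening the standard SPARQL JSON results structure (`results` / `bindings`) into an array of plain hashes. The function name and the sample data are made up for the example.

```ruby
require 'json'

# Flatten a SPARQL JSON results document into an array of plain hashes,
# one per solution, mapping each bound variable name to its value string.
# Unbound variables are simply absent from the hash.
def simplify_bindings(json_text)
  doc = JSON.parse(json_text)
  doc.fetch('results').fetch('bindings').map do |solution|
    solution.each_with_object({}) do |(var, term), row|
      row[var] = term['value']
    end
  end
end

# Illustrative sample data only.
sample = <<~JSON
  {
    "head": { "vars": ["name", "homepage"] },
    "results": {
      "bindings": [
        { "name":     { "type": "literal", "value": "Craft A" },
          "homepage": { "type": "uri", "value": "http://example.org/a" } },
        { "name":     { "type": "literal", "value": "Craft B" } }
      ]
    }
  }
JSON

simplify_bindings(sample).each do |row|
  puts "#{row['name']}: #{row['homepage'] || 'no homepage'}"
end
```

Dropping the type information keeps scripting code short, at the cost of no longer being able to tell a URI from a literal; keep the full term if that distinction matters.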

Using this on the Platform is quite straightforward, as you can upload this library, and any other related JavaScript files, directly into the Contentbox of your store. This not only avoids any cross-domain issues, but also means that you can deploy simple AJAX applications directly from a store.

I’ve put together a super simple demo that uses the NASA spaceflight data. The source code is here, and I’ve uploaded the two files into the n2-examples store contentbox, so you can play with the running application.

The demo simply fetches the name, homepage, description and launch date for every spacecraft launched in a particular year, also retrieving a link to a photo if there’s one available. The results are dropped into an HTML table for viewing.

The code is well commented so rather than repeat that here, you can look through the Javascript file that does the actual interaction. I’ve used JQuery to help with the DOM manipulation, etc. This is delivered through the Google JQuery CDN rather than the Platform. But the rest of the application is served directly from the Platform.

A rather easy and trivial example, but sometimes it’s useful to reiterate the basics. And if you want to incorporate the NASA spaceflight data in your own mashups, then you can do so easily by simply using the version of sparql.js in the space data store.

In my view, SPARQL + JSON + scripting languages like JS and Ruby hit a nice sweet spot for working with RDF, especially with the ability to bring together data from multiple sources using a single standard API.

Note: Keith Alexander has written up some of his own experiments with playing with JQuery against the platform here and here. His JQuery plugin provides some additional Platform specific functionality.

Quick OpenCalais Hack

I’ve been doing some more work on the Ruby client for the Platform recently, and one of my main goals is to provide functionality that makes it easier to copy, merge, interlink and relate together datasets. So far I’ve been concentrating on providing some framework code to make it easier to mash-up data across SPARQL endpoints, but there are many more services that one might want to use when enriching a dataset.

One of those services is OpenCalais. I’ve played with the service on and off, and have previously built a Java client to the service to explore similar functionality using Java and Jena. But as I’m primarily working with Ruby at the moment, I thought I’d look for a Ruby client for Calais. Happily there is one on GitHub and it’s available as a Ruby gem.

Documentation is a bit light, and I had to jump through a few hoops to get it working, needing to manually install the curb gem and some native libraries. The following worked for me:

sudo apt-get install libcurl3-dev
sudo gem install curb
sudo gem install calais

With that installed it was a breeze to run a document through the OpenCalais service, and then store the resulting RDF in the Platform:


# Use OpenCalais to find entities in a document specified on the command-line, then store the results
# in a Platform store
#
# Set the following environment variables:
#
# TALIS_USER:: username on Platform
# TALIS_PASS:: password
# TALIS_STORE:: store in which data will be stored
# CALAIS_KEY:: Calais license key
require 'rubygems'
require 'pho'
require 'calais'

store = Pho::Store.new(ENV["TALIS_STORE"], ENV["TALIS_USER"], ENV["TALIS_PASS"])
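# Read the whole document (File.read closes the file handle for us)
content = File.read(ARGV[0])
# Calais returns the extracted entities as RDF
rdf = Calais.enlighten( :content => content, :content_type => :text, :license_id => ENV["CALAIS_KEY"])
resp = store.store_data(rdf)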
puts resp.status

The code is here, and here’s some sample input and sample output.

The code is pretty trivial and error handling is non-existent, but I was pleased with how easy it was to get some data out of OpenCalais and pushed into the Platform. A bit of SPARQL can then be used to do some analysis or further processing of the results.

So how do I plan to use this?

As a personal project I’m building out a dataset of NASA space-flight data, which will also include some metadata about astronauts and their roles on each mission. What I want to do is take some documents from the web and then store additional data to state relationships like “Buzz Aldrin is the foaf:primaryTopic of this document”.

The workflow I’m considering is using a Google custom search to give me a high-level index of content, e.g. selecting only the NASA websites. I can then run some representative searches to find documents and use OpenCalais to do entity extraction on each result. I can then store the OpenCalais RDF data in the store in a private graph: I don’t want the raw data in the main dataset, as I want to assert triples using my own ids and preferred vocabularies.

If the data is in a private graph then I can use the store’s multisparql service to do some SPARQL queries to match up the resources and CONSTRUCT new triples to store in the public graph.

I’ll post again with some more details on this as I progress, but I thought I’d start out by showing just how simple it is to mashup OpenCalais and the Talis Platform.

Don’t forget, if you want a Platform store to play with for development purposes then drop us a line.

A MalBestPractice with RDF: Making Assumptions

Michael Hausenblas has a new blog post listing some common malpractices when working with RDF.

RDF is a model, not a format

I especially agree with his point about “Thinking of RDF on the serialisation level” (as a malpractice) – grabbing values from RDF/XML or RDFa with XPath or regexes is not wise. It is making an unsafe assumption about the stability of the serialisation. In fact, if you are writing a Linked Data application, there are very few assumptions you can safely make, about either the serialisation, or the model.

RDF isn’t SQL, XML, OO …

So maybe my favourite MalBestPractising is: trying to treat RDF too much like some other software paradigm – too much like a relational database, too much like OO, too much like XML. It’s enticing to try to write software that treats RDF as if it was something that the mainstream of software development is more familiar with, to try to use the same kind of techniques and shortcuts. But these shortcuts often rely on assumptions that can’t be made about RDF data (at least, not proper, organic, free-range RDF from the web). You can’t assume that the same RDF graph will be serialised the same way as last time. You can’t assume that the http://xmlns.com/foaf/0.1/ namespace will always be bound to the foaf prefix. You can’t assume that a resource will, or won’t, have a particular property, just because it has another property, or a particular type. If you don’t know that a statement exists, you can’t assume it doesn’t, only that you don’t know about it. Et cetera.
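To make the prefix point concrete, here is a small Ruby sketch. It is an illustration only: the helper name is made up, and in real code an RDF parser would do this expansion for you. The idea is to compare terms by their full URI, expanded using whatever prefix bindings the document itself declared, rather than matching on the literal string foaf:name.

```ruby
# Expand a prefixed name such as "foaf:name" to a full URI using the prefix
# bindings declared by the document it came from. Returns nil when the
# prefix is not bound in that document, rather than guessing.
def expand_prefixed_name(pname, prefixes)
  prefix, local = pname.split(':', 2)
  namespace = prefixes[prefix]
  return nil if namespace.nil? || local.nil?

  namespace + local
end

foaf_name = 'http://xmlns.com/foaf/0.1/name'

# Two documents that bind the same namespace to different prefixes.
doc_a = { 'foaf' => 'http://xmlns.com/foaf/0.1/' }
doc_b = { 'f'    => 'http://xmlns.com/foaf/0.1/' }

# Both terms identify the same property once expanded.
puts expand_prefixed_name('foaf:name', doc_a) == foaf_name
puts expand_prefixed_name('f:name', doc_b) == foaf_name

# The "usual" prefix is simply not bound in the second document.
puts expand_prefixed_name('foaf:name', doc_b).inspect
```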

Not making these assumptions can be tedious, and at times problematic, but ultimately, the fewer assumptions you write into your code, the more interesting, open, and ‘webby’ your application can be.

Less assumption, less code, more data, more web

The huge game-changing thing about web development with the Web of Data though, is not the set of assumptions you can’t make, but the assumptions you don’t have to make. Thanks to the Follow Your Nose principle espoused by Linked Data, you don’t need to write assumptions about your data into your code; you can instead let the application “follow its nose” to find out more about the data.

You can follow vocabulary term URIs to find out how they can be used, how they can be labeled, and what inferences can be drawn from their use. You can follow owl:sameAs and rdfs:seeAlso links to find out more about a resource. You can use semantic index services like Sindice to find occurrences of a URI or keyword across the Web of Data. You can follow dcterms:isPartOf links from RDF documents back to voiD Datasets, which will often have links you can follow to licenses that tell you how the data can be used, and to other services (such as SPARQL endpoints).
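As a rough sketch of the first step of following your nose, the Ruby below picks out the URIs worth dereferencing next for a given resource. It assumes the triples have already been parsed into plain `[subject, predicate, object]` arrays of URI strings; the function name and the example.org data are invented for illustration, and fetching and parsing the linked documents is left out entirely.

```ruby
require 'set'

# Given triples as [subject, predicate, object] arrays of URI strings,
# return the objects of owl:sameAs and rdfs:seeAlso statements about
# the resource: candidate URIs to dereference next.
def links_to_follow(triples, resource)
  follow = Set[
    'http://www.w3.org/2002/07/owl#sameAs',
    'http://www.w3.org/2000/01/rdf-schema#seeAlso'
  ]
  triples.select { |s, p, _o| s == resource && follow.include?(p) }
         .map { |_s, _p, o| o }
         .uniq
end

# Invented example data.
triples = [
  ['http://example.org/buzz', 'http://www.w3.org/2002/07/owl#sameAs', 'http://example.com/people/aldrin'],
  ['http://example.org/buzz', 'http://www.w3.org/2000/01/rdf-schema#seeAlso', 'http://example.org/buzz.rdf'],
  ['http://example.org/buzz', 'http://xmlns.com/foaf/0.1/name', 'Buzz Aldrin']
]

puts links_to_follow(triples, 'http://example.org/buzz')
```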

The more data is published, not just within datasets, but about datasets, and about services, the more we can write applications that open up to the web, and the fewer lines of code we will need to do it!

Building a Custom Search Index

The Platform is more than just a triple store with a SPARQL interface. It provides a number of other services which are useful for application developers. The most useful of these is the built-in search engine. Each Platform Store has its own search engine that can be used to perform queries over the hosted metadata. So as well as having the option to query your data using the SPARQL query language, you also have the ability to do simple queries over the data with results being returned as RSS 1.0 (with the OpenSearch extensions). This is a nice feature as sometimes you don’t need the full power of SPARQL and for some use cases a more specialized text indexing system is a better option.

The Platform API allows you to configure the system to build a full-text index over any or all of the RDF literals in your stored data. The exception to this is the RDF type predicate: this is the only predicate that will have resource values indexed, making it possible for you to construct a search index and queries that can be used to find matches in specific types of RDF resource.

The remainder of this post shows how to configure the Platform to build a custom search index, with example Ruby code using Pho.

It’s common in search engine syntax to use a simple friendly name to identify a specific field that you want to search. For example in a Google search you can use “intitle:Blah” to search for the text “blah” only in the HTML title element of indexed pages. The Platform uses a similar mechanism to allow you to map any RDF property URI to a short friendly name suitable for submitting in a search query.

The complete set of these mappings is referred to as a FieldPredicateMap. The mapping is specific to each store, allowing different stores to have their own mappings. The Platform API exposes these mappings, allowing you to retrieve and update them yourself.

It is the presence of a mapping of a property URI to a friendly name that triggers the Platform to start indexing the literal values associated with that property. To put this another way: all you have to do to start indexing your literals is define a mapping. It’s as simple as that.

Once a mapping is in place, whenever you submit some RDF/XML to your store, the Platform will automatically index all of the mapped triples. The indexing is done asynchronously so there might be a short delay between the deposit of new content and the indexes being updated. Standard stuff.

The Pho Ruby API for the Platform provides programmatic access to this functionality, allowing you to script up the management and creation of the FieldPredicateMap. See the rdocs for the FieldPredicateMap class for details.

Here’s an example Ruby script that illustrates how to manage mappings.

To run the script you’ll first need to fill in the name of your store and your admin username and password. You’ll also need to make sure you’ve installed Pho: gem install pho should do the necessary.

The script does several things. Once a store object has been created, the script creates two new mappings. One for the FOAF name predicate, and one for RDF type:


#create the mappings we want
name = Pho::FieldPredicateMap.create_mapping(store, "http://xmlns.com/foaf/0.1/name", "name")

type = Pho::FieldPredicateMap.create_mapping(store, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "type")

The create_mapping method allows you to quickly generate a mapping suitable for adding to a specific store. In order to fetch the current list of mappings the script then does:


#read the existing mappings
mappings = Pho::FieldPredicateMap.read_from_store(store)

#remove anything for this uri
mappings.remove_by_uri("http://xmlns.com/foaf/0.1/name")
mappings.remove_by_uri("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")

#append the new field-name mappings
mappings << name
mappings << type

The read_from_store method does the actual work, doing a GET request to the Platform to retrieve the mappings as JSON, which are then parsed into some useful Ruby objects. The remaining lines then add the newly created mappings to the current collection, after first ensuring that any previous mappings for those URIs have been removed. At this stage we’ve updated our local copy of the mappings but have not yet saved them back to the Platform.

Storing the updated mappings in the Platform is then just a matter of calling the upload method on the mapping object. This serializes the list of mappings as RDF/XML and then PUTs them back to the store. This will overwrite any of the current configuration with the updated copy we’ve got locally: this is one reason why we fetch the current copy before making the changes, to ensure the rest of the configuration is preserved.


resp = mappings.upload(store)
if resp.status_code != 200
  abort("Failed to upload mappings!")
end

The upload method, like many of the lower-level method calls in the Pho library, returns an HTTP::Message object that you can inspect to determine if the Platform request was successful.

The remaining lines in the sample script simply upload some test data to your store: astronauts.rdf contains a short list of a few astronauts, modelled as simple foaf:Person instances with a foaf:name property. This allows you to test out your newly created search index.

You can now construct item searches with syntax like “name:Buzz” to search for the name Buzz in any foaf:name predicate. Or you can find all foaf:Person instances by performing a search for:

type:"http://xmlns.com/foaf/0.1/Person"

Note that you have to quote the URI. And you can obviously combine those to find only foaf:Person resources with a specific foaf:name.
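Here is a rough Ruby sketch of building such a combined search from a script, using only the standard library. Two things in it are assumptions rather than anything stated above: that terms can be combined with a Lucene-style AND, and that the item search lives at `/items` under the store URI and takes a `query` parameter. The function names are made up, so check the Platform documentation before relying on any of this.

```ruby
require 'uri'

# Combine fielded terms into a single query string, quoting any value that
# contains characters such as ':' or '/' (as URIs do) or whitespace.
# ASSUMPTION: terms are combined with a Lucene-style AND.
def fielded_query(terms)
  terms.map do |field, value|
    value = %("#{value}") if value =~ /[:\/\s]/
    "#{field}:#{value}"
  end.join(' AND ')
end

# Build the search URI for a store.
# ASSUMPTION: the search service is at <store>/items with a "query" parameter.
def item_search_uri(store_uri, terms)
  uri = URI.parse("#{store_uri}/items")
  uri.query = URI.encode_www_form('query' => fielded_query(terms))
  uri
end

terms = { 'type' => 'http://xmlns.com/foaf/0.1/Person', 'name' => 'Buzz' }
puts fielded_query(terms)
puts item_search_uri('http://api.talis.com/stores/xyz', terms)
```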

I’ve run the script against the n2-examples store, so you can use the item search form to test it out. Or just click here to list all the people.

If you peek at the source of the returned RSS feed you’ll find that the essential metadata for each result — in this case the foaf:name property and the rdf:type — is automatically included. Incidentally if you have a FieldPredicateMapping defined with a property name of title then this will automatically be used as the title for the RSS item, allowing you some minor degree of control over the feed structure if you wanted to make it more human-readable.

The Platform provides you with a few more options for managing your search indexes than I’ve covered here. For example the FieldPredicateMap can also be used to associate an Analyzer with the field allowing you to control the indexing rules. You can also control the relevance ranking of the search results through the use of a Query Profile (which is also exposed through an API, and is manageable using Pho). The query profile lets you associate a weighting with a field, so that when a user performs a search without indicating which field they want to search (thereby searching all fields), then the Platform will alter the relevance ranking of the results to suit your preferences.

That concludes our look at the basic steps involved in building a custom search index over the Platform. While the Pho library provides some useful support, it’s worth remembering that it’s simply a thin veneer over several HTTP operations, so achieving the same effects in another language (or even from the console using plain old curl) should be easy enough. Hopefully the examples have also illustrated the simplicity of working with the Platform to create some quite powerful features and, importantly, that developing against the Platform doesn’t require you to be a SPARQL wizard: there are other ways to get data out of the system, but the power of SPARQL is there when you need it.
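To illustrate the "thin veneer" point, this is roughly what the upload step looks like with nothing but Ruby’s standard Net::HTTP. It is a sketch, not what Pho does internally: the configuration path in the example URI is an assumption, the function names are made up, and authentication is deliberately left out, so add whatever scheme your store requires before sending anything.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a PUT request carrying an RDF/XML document.
# Authentication is omitted here and must be added for a real store.
def build_put_request(uri, rdf_xml)
  req = Net::HTTP::Put.new(uri)
  req['Content-Type'] = 'application/rdf+xml'
  req.body = rdf_xml
  req
end

# Send a prepared request and return the Net::HTTPResponse.
def send_request(uri, req)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
end

# ASSUMPTION: the path below is illustrative, not a documented location.
uri = URI.parse('http://api.talis.com/stores/xyz/config/fpmaps/1')
req = build_put_request(uri, '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>')
puts "#{req.method} #{req.path} (#{req['Content-Type']})"

# To actually perform the request:
#   resp = send_request(uri, req)
#   abort("Failed to upload mappings!") unless resp.code == '200'
```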

Any questions, then leave a comment and I’ll try to answer them.

Vocabify: Instance Data -> Vocab

One thing about writing RDF vocabularies that occurred to me listening to people talk at VoCamps (Oxford and Galway), is that typically what you are trying to do isn’t defining new terms, it’s modeling data, and at some stage in the modeling you discover you need to write a new vocabulary. Vocabulary authors often want to describe how their terms can best be used with existing complementary vocabularies, like FOAF and Dublin Core, but the only commonly practiced way of doing so is to put it in human-readable form in the documentation annotations. In voiD, we wrote a guide, principally because we wanted to describe how the terms ought to be used together with existing vocabulary terms.

In tandem with this thought, when sketching out vocabularies myself, I tend not to start out by defining Classes and Properties, which is both tediously repetitive, and a step removed from the data-modeling (which is what I’m actually trying to do in the first place). Instead, I define a prefix for a new namespace, and pretend a vocabulary already exists at it. Probably quite a lot of people do this. I think of them as “pretend schemas”; I’ve heard ldodds call them “just in time schemas” (only bother to write it when someone actually asks to see it).

So last night I coded up Vocabify, which you can feed some instance data that uses your “just in time vocabulary”, tell it which namespace URI is the pretend one, and it will generate a schema from the instance data, which you can then edit and publish.

The classes and properties are also linked to the instances they are generated from with ov:exampleResource, so it is clear to readers how they can be used together with other properties.

Tip: Mirroring a directory of RDF/XML into a Platform Store

When converting data to RDF, whether as a result of scraping it from the web, locally analysing a dataset, or simply dumping data from a database, I very often collect the data into a number of RDF/XML files before loading it into a Platform store. Loading the store is de-coupled from the data munging process, which typically goes through rapid cycles of development. So I only occasionally load the data into the Platform, either to publish it for others to use or to test it within an application.

Whilst writing Pho one of my motivations was to make it easier to support this kind of workflow. For example I’ve made sure that it’s easy to submit jobs to the Platform to allow a store to be reset before being re-populated. This is as simple as:


store = Pho::Store.new("http://api.talis.com/stores/xyz", "user", "pass")
store.reset()

In the next release I’ll add the necessary support for polling the returned job metadata so that client code can wait until the job is completed before continuing.
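Until that support lands, a generic polling helper is easy enough to sketch in plain Ruby. This is not Pho code: `wait_until` is a made-up helper, and how you decide that a job has finished (the block you pass in) depends on the job metadata the Platform returns, which isn’t covered here. The demo simulates the status checks with a simple array.

```ruby
# Repeatedly evaluate the given block until it returns a truthy value or the
# timeout (in seconds) expires. Returns true on success, false on timeout.
def wait_until(timeout: 60, interval: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end

# Simulated job statuses standing in for repeated requests to the Platform.
statuses = %w[running running done]
finished = wait_until(timeout: 10, interval: 0) { statuses.shift == 'done' }
puts finished ? 'job completed' : 'gave up waiting'
```

Using a monotonic clock for the deadline means the wait isn’t thrown off if the system clock is adjusted while the script is running.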

But there’s already one useful chunk of code in the form of the RDFCollection class. This provides a simple utility for easily mirroring a directory full of RDF/XML documents into a Platform store. It handles and captures errors; has some support for allowing a load to be resumed if it has to be killed; and checks for new files so it only has to load new content.

Here’s a trivial example of how it works. Let’s assume that the directory /example/rdf contains two RDF files: good.rdf and bad.rdf. As the name suggests the second of these files is malformed, so will be rejected by the Platform. If I want to load the contents of the directory into the store I can write:


require 'rubygems'
require 'pho'

#use your own store name and credentials
store = Pho::Store.new("http://api.talis.com/stores/xyz", "user", "pass")

collection = Pho::RDFCollection.new(store, "/example/rdf")
collection.store()
puts collection.summary()

This will POST all the RDF/XML found in the directory and print a simple summary at the end, something like:


/example/rdf contains 2 files: 1 stored, 1 failed, 0 new

The summary indicates that, as expected, one of the files was stored and one was rejected. If you were to look in the (imaginary!) directory you’d now see a couple of extra files: good.ok and bad.fail. The RDFCollection code uses “.ok” files to note which files have been stored and “.fail” files to indicate rejections. The fail files contain a dump of the platform HTTP response. Any files that aren’t accompanied by either of these are considered to be new, i.e. ripe for submission.
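The marker-file convention is simple enough to inspect from your own scripts. The sketch below is not the RDFCollection implementation, just an illustration of the convention described above using the standard library; `load_status` is a made-up name, and it assumes the data files all end in .rdf.

```ruby
require 'fileutils'
require 'tmpdir'

# Classify the .rdf files in a directory by their marker files:
# "<name>.ok" means stored, "<name>.fail" means rejected, neither means new.
def load_status(dir)
  status = { stored: [], failed: [], new: [] }
  Dir.glob(File.join(dir, '*.rdf')).sort.each do |path|
    base = path.sub(/\.rdf\z/, '')
    if File.exist?("#{base}.ok")
      status[:stored] << path
    elsif File.exist?("#{base}.fail")
      status[:failed] << path
    else
      status[:new] << path
    end
  end
  status
end

# Demo against a throwaway directory with one file in each state.
Dir.mktmpdir do |dir|
  names = %w[good.rdf good.ok bad.rdf bad.fail fresh.rdf]
  FileUtils.touch(names.map { |name| File.join(dir, name) })
  status = load_status(dir)
  puts "#{status[:stored].size} stored, #{status[:failed].size} failed, #{status[:new].size} new"
end
```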

If you were to re-run the above script only new files would be resubmitted. If you wanted to re-try failures then you could use:


collection.retry_failures()

While the summary is useful, it’s more likely that you want to script up a load and then iterate over the list of failures in the directory, perhaps to generate a more complete report. It’s easy to do that using:


collection.failures().each do |failed|
...
end

If you want to clear out all of the status tracking files and attempt to resubmit all of the data, then you can just do:


collection.reset()
collection.store()

Obviously there are much slicker ways that the submission status of the files can be tracked and I may well integrate these into later iterations, but I found this to be good enough for a first pass as it supports my initial use case. The technique should also be useful for managing test data when developing against the Platform.

Using Twinkle to SPARQL the Platform

A few years ago I wrote Twinkle, a simple GUI interface for working with SPARQL. While it’s not the most polished of user interfaces and it’s in sore need of an update, it’s still serviceable and has been successfully used as a development tool by teams of engineers I’ve worked with in the past.

I gave a short talk on Twinkle at an Oxford SWIG meeting, so you can flick through the slides if you want a quick overview of the functionality. I also moved the code to a Google Code project to start the process of updating it.

Twinkle has the capability to work with a range of different data sources and includes a full SPARQL client, so you can use it to work with any SPARQL endpoint that is accessible from your desktop. Out of the box Twinkle is already configured to work with the GovTrack and DBpedia endpoints, but you can easily add more by changing the configuration.

If you download and unzip the distribution into a directory you should end up with an etc/config.n3 file. This file contains all of the configuration that drives the user interface, including a section that configures remote SPARQL endpoints, e.g.:


<http://dbpedia.org/sparql> a sources:Endpoint
    ; sources:defaultGraph "http://dbpedia.org"
    ; rdfs:label "DBpedia.org".

<http://www.rdfabout.com/sparql> a sources:Endpoint
    ; rdfs:label "GovTrack.us".

The above snippet configures two remote endpoints, and applies labels to them so that they appear in the Twinkle UI, under the “Remote Services” section on the left-hand menu. Because some endpoints, such as DBpedia, require you to specify a default graph in the SPARQL protocol request, you can also specify that in the configuration if necessary.
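For the curious, this is what the default graph amounts to at the protocol level. The SPARQL protocol carries the query in a `query` parameter and the default graph in a `default-graph-uri` parameter; the Ruby sketch below just builds the request URI with the standard library. It is not Twinkle’s code, and the function name is made up.

```ruby
require 'uri'

# Build a SPARQL protocol GET URI for an endpoint, optionally naming the
# default graph to query via the default-graph-uri parameter.
def sparql_request_uri(endpoint, query, default_graph = nil)
  params = { 'query' => query }
  params['default-graph-uri'] = default_graph if default_graph
  uri = URI.parse(endpoint)
  uri.query = URI.encode_www_form(params)
  uri
end

query = 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 5'
puts sparql_request_uri('http://dbpedia.org/sparql', query, 'http://dbpedia.org')
puts sparql_request_uri('http://www.rdfabout.com/sparql', query)
```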

If you have a Platform Store, or just want to access some data held in the Platform, then you can use Twinkle to perform your SPARQL queries. For example I have a store containing NASA space flight data. The SPARQL endpoint for this store is at:

http://api.talis.com/stores/space/services/sparql

So to register this in Twinkle, I can edit the configuration file and include the following snippet:


<http://api.talis.com/stores/space/services/sparql> a sources:Endpoint
    ; rdfs:label "NASA Space Data".

Once you’ve restarted the UI you should now be able to click on the Remote “NASA Space Data” service and open up a window into which you can start executing SPARQL queries.

If you’re new to SPARQL, or are interested in playing with the above space data, then you can look over the following slides from a recent SPARQL training session that I ran:



The slides contain a number of sample queries that should help get you started. Unfortunately some of the diagrams don’t look great in slideshare, but you should be able to download them for a closer look.

Metamorph Open Source project for Semantic Converter Web Service

I’ve published the code behind the Talis Convert Service (production release at stable URL coming soon) as an open source project on Google Code, called Metamorph.

Metamorph is a service aimed at semantic web developers. It is much like triplr, babel, swignition and any23 (please leave a comment pointing to any other similar services).

You give it a(n http) URI, an (optional) input format, and an output format, and it will fetch the document from the web, and convert it into the output format.

Understood input values include:

  • Semantic HTML (RDFa, eRDF, microformats, POSH)
  • RDF (XML, Turtle, JSON)
  • SPARQL-XML
  • Facet XML (the response format of the facets service available on all platform stores)

Output for all input formats can be:

  • JSON
  • JSONP
  • HTML

If the input is some form of RDF, you can also ask for:

  • RDF (XML, Turtle, JSON; the default HTML output is rendered as RDFa)
  • RSS 1.0
  • TriX
  • Exhibit (web page, JSON, JSONP)

In addition, if the input is an RDF format, you can specify multiple data URIs, and the results will be merged in the output document. For instance, this conversion merges data from two of my homepages, and a Turtle file.

I’m thinking about removing the TriX output, as I’m not sure it would be used by anyone – the reason I didn’t bother to write a parser for it was because I haven’t seen any data published as TriX in the first place.

I welcome any input on what else would be useful from this web service. I suspect that more output options, while fairly easy to add, would not be very useful. More input options may be useful, but perhaps not significantly so.

I suspect what might be more useful, and more likely to distinguish this from similar RDF converter services, are graph transformation services, which might include:

  • Diffs
  • Intersects
  • Smushing
  • Augmenting on property and class type URIs with labels and comments, perhaps retrieved from SchemaCache
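To give a feel for what the first two of these would involve, here is a toy Ruby sketch that treats a graph as a set of `[subject, predicate, object]` triples, so that merge, intersect and diff become plain set operations. It is not Metamorph code (which is PHP); the names and data are invented, and it sidesteps the hard part: blank nodes have no stable identity across documents, so real graph diffing needs more than set arithmetic. Smushing and augmenting aren’t covered at all.

```ruby
require 'set'

# Report which triples were added to and removed from a graph,
# where each graph is a Set of [subject, predicate, object] arrays.
def graph_diff(old_graph, new_graph)
  { added: new_graph - old_graph, removed: old_graph - new_graph }
end

# Invented example data using example.org URIs.
a = Set[
  ['http://example.org/me', 'http://xmlns.com/foaf/0.1/name', 'Keith'],
  ['http://example.org/me', 'http://xmlns.com/foaf/0.1/nick', 'kwijibo']
]
b = Set[
  ['http://example.org/me', 'http://xmlns.com/foaf/0.1/name', 'Keith'],
  ['http://example.org/me', 'http://xmlns.com/foaf/0.1/homepage', 'http://example.org/']
]

merged = a | b   # merge: union of the two graphs
common = a & b   # intersect: triples present in both
changes = graph_diff(a, b)

puts "merged: #{merged.size} triples, common: #{common.size}"
puts "added: #{changes[:added].size}, removed: #{changes[:removed].size}"
```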

Metamorph is coded in PHP, and uses ARC for parsing RDF and HTML, and serialising RDF/XML and Turtle.

Please use the issue tracker for raising any bugs or feature requests.

(Semantic) Web Agents and OSGi

A little fyi/progress report.

For a couple of years now I’ve been mooching around refactoring the intelligent agent paradigm to cover (RESTful) Web services. The kind of intelligence I have in mind is potentially, well, non-existent: a regular Web site could be considered an agent. The motivation is mostly that developing spec-compliant systems on the Web is in general a lot of work, and that this leads to either cutting corners/breaking specs or using frameworks that limit one’s opportunities for innovation. When we introduce Semantic Web technologies into the mix, things get even more difficult.

So what I was after was a simple abstraction of (Semantic) Web systems/services that would allow a lot of the gruesome details of implementation to be hidden away, without breaking the Web. What I came up with looks like this:

An archetypal agent would feature (access from) a HTTP server and a local HTTP client for input & output, a local RDF model for its working memory along with some kind of business logic (behaviour) that would determine what it actually did. (I’m putting on hold one of the usual features of intelligent agents – mobility – though a story on this would be nice for issues like scalability). Agents are effectively self-contained, event-driven components with a common interface (HTTP).

A regular Web site could fit this abstraction in a degenerate form: no HTTP client, content is held in a persistent model, the behaviour is just to deliver that content to any other agents that make appropriate requests (in this case those other agents would typically be browsers, well-known degenerates).

In the past I must confess I’ve tried to express this stuff via MVC, which was a bit of a stretch – I agree with Ian’s view that this isn’t really appropriate for the Web. RMR, ROA or WOA (take your pick!) is a much better fit. Having said that, I’m not sure how much the developer should be operating on the level of resources and representations, they seem more like bricks and cement than architecture – e.g. conneg and httpRange-14 303s should Just Work.

So now (or rather, quite a while ago) I needed a proof of concept system that would allow easy construction of this kind of agent, and I spent a good many free-time hours putting together a little framework. The way I was approaching it (in Java) was for the framework to provide a container for agents, and those agents being aware whether or not they were in the same container. If they were, they could address each other directly, while still supporting HTTP I/O for communications otherwise.

I got quite a long way, despite hitting numerous snags (incorporating asynchronous eventing into the HTTP request/response cycle was a good one). But then as of a few months ago I didn’t have much opportunity to look at this stuff.

Fast forward to a few weeks ago. In my todo queue was getting down deep with OpenID and OAuth (which I’m familiar with but haven’t really stress-tested), and it was hard not to imagine using the agent approach to play with these components. Coincidentally I went up to visit Reto in Switzerland and the company he now works for – Trialox who are (amongst other things) building a Semantic Web CMS. While I was up there, Reto gave me an intro to OSGi (formerly the Open Services Gateway initiative) which is essentially a set of specs for a Java-based service platform – it’s used in Eclipse, for example. Somewhat bizarrely I think I missed out on learning about this previously because I must have glazed over when seeing the acronym, confusing it with OGSI (the Open Grid Services Infrastructure).

To cut a long story marginally shorter, I’ve now ditched my own agent framework code (I can no doubt recycle bits) in favour of OSGi, and am currently noodling with creating the appropriate bundles – as OSGi calls its components – for the agent stuff, using Apache Felix as the host framework. I’ve still a good way to go before I get to my proof of concept, but after only a couple of days learning/coding I’m already making much more rapid progress than I was with my own ad hoc stuff. With a bit of luck I’ll have testbed stuff together for OpenID & OAuth (and related setups like FOAF+SSL) within the next week or so. I’m obviously also going to be looking at hooks into the Talis Platform. I can’t remember offhand whether it was Ian, Leigh or Sam, but someone’s already put together a load of Java client code to wrap HTTP interactions with the Platform, so most of the work there’s already been done.

Oh yeah, and I reckon OSGi might well give me a neat approach to the Semantic Web in a Box.

[Work in progress is currently in my personal svn://hyperdata.org/svn/ but I'll move it into the n² svn once I've got something more functional].

Authoring RDF data with SPARQL

Yesterday Yves Raimond and I presented a tutorial at WOD-PD where we created some Turtle data and used my online semantic converter tool to convert the data to RDF/XML and POST it to the platform store we set up for the tutorial (wod-pd-sandbox).

In fact though, every SPARQL endpoint that supports CONSTRUCT is already a Turtle-to-RDF/XML converter. You can write Turtle with no variables in the CONSTRUCT graph, leave the WHERE graph pattern empty, and you will get back RDF/XML.

e.g.:

PREFIX ex: <http://example.org/>
CONSTRUCT {
  ex:Jimmy ex:eat ex:World .
}
 WHERE {}

returns

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/" >
  <rdf:Description rdf:about="http://example.org/Jimmy">
    <ex:eat rdf:resource="http://example.org/World"/>
  </rdf:Description>
</rdf:RDF>
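If you wanted to script this trick, wrapping the triples in a query is just string assembly. The Ruby below is a minimal sketch with a made-up function name. One wrinkle it handles: Turtle’s @prefix lines aren’t valid inside a SPARQL query, so the prefixes are passed separately and emitted as PREFIX declarations. Sending the query to an endpoint is left out.

```ruby
# Wrap variable-free Turtle triples in a CONSTRUCT query with an empty WHERE
# clause. Prefixes are given as a hash of prefix => namespace URI and turned
# into SPARQL PREFIX declarations.
def construct_query(prefixes, triples)
  header = prefixes.map { |prefix, namespace| "PREFIX #{prefix}: <#{namespace}>" }
  (header + ['CONSTRUCT {', "  #{triples.strip}", '}', 'WHERE {}']).join("\n")
end

puts construct_query({ 'ex' => 'http://example.org/' }, 'ex:Jimmy ex:eat ex:World .')
```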

You can also use CONSTRUCT to create new data inferred from existing data. For instance, I wanted to add some triples about the conference, and I knew that everyone in the store with a URI in the store’s own namespace had been following the tutorial, and so was also attending the conference. So I made this query, and then POSTed the results into the store:

PREFIX schema: <http://api.talis.com/stores/wod-pd-sandbox/items/Schema/>
PREFIX sandbox: <http://api.talis.com/stores/wod-pd-sandbox/items/Things/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

CONSTRUCT {
  schema:Conference a rdfs:Class ;
    rdfs:isDefinedBy schema: ;
    rdfs:label "Conference" .

  schema:startDate a rdf:Property ;
    rdfs:isDefinedBy schema: ;
    rdfs:label "start date" .

  schema:endDate a rdf:Property ;
    rdfs:isDefinedBy schema: ;
    rdfs:label "end date" .

  schema:attendee a rdf:Property ;
    rdfs:isDefinedBy schema: ;
    rdfs:label "attendee" ;
    owl:inverseOf schema:attended .

  schema:attended a rdf:Property ;
    rdfs:isDefinedBy schema: ;
    rdfs:label "attended" ;
    owl:inverseOf schema:attendee .

  sandbox:WOD-PD a schema:Conference ;
    rdfs:label "Web of Data" ;
    schema:startDate "2008-10-22" ;
    schema:endDate "2008-10-23" ;
    schema:attendee ?person .

  ?person schema:attended sandbox:WOD-PD .
}
WHERE {
  ?person a <http://xmlns.com/foaf/0.1/Person> .

  FILTER(REGEX(STR(?person), "sandbox/items/People/"))
}

I used PREFIX to declare a prefix for a couple of namespaces with the store’s contentbox URIs – this meant that these URIs would dereference and work as Linked Data – 303ing to their RDF descriptions. This is a really nice feature of the platform, and makes it easy to mint new URIs that will play nice on the semantic web.

You might also have noticed that there are some new properties and classes defined there in the CONSTRUCT. This isn’t absolutely ideal – there is no documentation, and the terms are unlikely to be used again – but on the other hand, the descriptions are dereferenceable according to the principles of linked data, and just as persistent as the data they describe. Moreover, as Richard Cyganiak said today – if you worry about doing RDF ‘right’ to the extent that it stops you doing RDF, you’re not doing it right.