
Archive for the 'Community News' Category

voiD, datasets, graphs, documents, and dcterms:isPartOf backlinks

One question I have heard several times now about voiD is how to say that data is part of a dataset.

Frédérick Giasson asked about this recently in #swig, and wondered why the voiD guide recommended using dcterms:isPartOf. I thought, since this is something that has been asked about a few times, I would blog about it and explain the reasoning behind this.

So, it wouldn’t be right to say something like:

<http://lastfm.rdfize.com/artists/Black+Sabbath> dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset> .

… because we don’t want to say that “Black Sabbath is part of the lastfm.rdfize.com dataset”.
We want to say “a description of Black Sabbath (composed of triples) is part of the lastfm.rdfize.com dataset”.

One approach to encapsulating this meaning would be to reify each individual triple and state that the triple is part of the dataset … but we felt that this would be neither practical nor popular.

So, in the voiD guide, we advocate that when you publish Linked Data, and you want to say that the data you are publishing is part of a voiD Dataset, you add a triple linking the document in which the data is published, to the dataset. eg:

<http://lastfm.rdfize.com/?artistName=Black+Sabbath> dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset> .

(where <http://lastfm.rdfize.com/?artistName=Black+Sabbath> is a document containing a description of <http://lastfm.rdfize.com/artists/Black+Sabbath>)

This way, when a Linked Data client dereferences <http://lastfm.rdfize.com/artists/Black+Sabbath> they get redirected to a document, and can follow the dcterms:isPartOf link from the document URI to the voiD Dataset.
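To make that discovery path concrete, here is a minimal Python sketch of what a client does. It is only an illustration: the redirect and the document's triples are simulated with in-memory tables (both are assumptions, no HTTP request is made), and the function names `dereference` and `find_datasets` are hypothetical.

```python
# Toy model of the discovery path described above; no network access is used.
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

RESOURCE = "http://lastfm.rdfize.com/artists/Black+Sabbath"
DOCUMENT = "http://lastfm.rdfize.com/?artistName=Black+Sabbath"
DATASET = "http://lastfm.rdfize.com/meta.n3#Dataset"

# Assumption: the server answers a request for the resource URI with a
# redirect to the describing document (simulated here as a lookup table).
REDIRECTS = {RESOURCE: DOCUMENT}

# Assumption: the triples a client would parse out of that document.
DOCUMENT_TRIPLES = {
    DOCUMENT: [
        (DOCUMENT, DCTERMS_IS_PART_OF, DATASET),
    ],
}


def dereference(uri):
    """Follow simulated redirects and return (document_uri, triples)."""
    seen = set()
    while uri in REDIRECTS and uri not in seen:
        seen.add(uri)
        uri = REDIRECTS[uri]
    return uri, DOCUMENT_TRIPLES.get(uri, [])


def find_datasets(resource_uri):
    """Return the datasets that the describing document says it is part of."""
    document_uri, triples = dereference(resource_uri)
    return [
        obj
        for subj, pred, obj in triples
        if subj == document_uri and pred == DCTERMS_IS_PART_OF
    ]


print(find_datasets(RESOURCE))
```

Note that the lookup matches on the document URI, not the resource URI: the client never needs a triple saying that Black Sabbath is part of anything.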

What some people don’t like so much is the implication that their dataset consists of documents, when what they really want to say is that their dataset consists of descriptions of resources.

The conceptual problem, if there is one, is that here the document URI is identifying an RDF/XML document, not the graph of RDF data encoded in that document. So, if you wanted to explicitly state that the graph, rather than the document, is part of the dataset, it could perhaps be done like this:

[ a <http://www.w3.org/2004/03/trix/rdfg-1/Graph> ;
  <http://purl.org/vocab/frbr/core#embodiment> <http://lastfm.rdfize.com/?artistName=Black+Sabbath&output=rdf> ;
  dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset>
] .

But I’m really not too sure if that is either semantically correct, or in any way a more practically useful description than simply saying the document is part of the dataset.

We (the voiD guide authors) think that the <document> dcterms:isPartOf <dataset> pattern is the most pragmatic approach to making a dataset discoverable from a LOD document.
But we are also open to suggestions for improvement as we evolve the vocabulary and guide in line with popular usage and the requirements of LOD publishers.

What do you think?

voiD: a Vocabulary of Interlinked Datasets

As technological advances allow the production and dissemination of information to scale out, old methods for navigating the information become inadequate, and we need new means to cope with the greater scale of information available.

With the rise of printing in the 16th century, library collections flourished, making more ideas and information available to more scholars than ever before. Yet to know what books a library contained, scholars had to either physically visit the library (and browse the shelves, or consult a manuscript catalogue), or make enquiries by letter.

Frontispiece of the first printed library catalogue

In 1595, Leiden University innovated by becoming the first institution to make its library’s catalogue available in print. Just as printing had made the editions within a library far more widely available, printing a book about the library’s collection brought awareness of the library and its contents to a greater audience. Now, scholars all across Europe could tell if Leiden University’s library had the information they needed. Scholars had more information about what books were available, and Leiden’s international reputation was bolstered. Other libraries followed suit by printing their own catalogues, and those library catalogues could be collected. Scholars could compare the strengths and purposes of multiple libraries from a single location.

When the Linked Open Data movement began gaining ground in 2007, there were relatively few large RDF datasets available on the web. If you followed the right blogs and mailing lists, you knew which datasets were available. As the LOD Cloud grows (and manually drawing it becomes less and less practical), it becomes apparent that the number of datasets is outgrowing our methods for discovering them. Just as it made sense for libraries in the 16th century to use the technology of print to publish descriptions of their collections, it is natural to use RDF to publish descriptions of datasets available on the web. Just as printed catalogues brought library collections to new audiences, and enabled new uses, RDF descriptions will bring datasets to new audiences (machines!), making them more findable, and enabling new uses. All you need is the vocabulary to describe datasets with.

voiD interlinking dataset diagram

voiD is a vocabulary dataset publishers can use to describe their datasets: their subject areas, their access mechanisms (eg: APIs, SPARQL endpoints, data dumps), their licensing, their provenance, how they link to other datasets, which vocabularies are used within them, and statistics relating to their contents.
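As a rough illustration of what such a description contains, the Python sketch below builds a handful of triples for a made-up dataset and prints them as N-Triples. Everything under example.org, the subject and licence URIs, and the helper names `describe_dataset` and `to_ntriples` are assumptions chosen for the example; the property names are taken from the voiD and Dublin Core namespaces, and the voiD guide remains the place to check the exact terms.

```python
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_dataset(dataset, subject, sparql_endpoint, data_dump,
                     license_uri, vocabularies):
    """Build (subject, predicate, object) triples describing one dataset."""
    triples = [
        (dataset, RDF_TYPE, VOID + "Dataset"),
        (dataset, DCTERMS + "subject", subject),            # subject area
        (dataset, VOID + "sparqlEndpoint", sparql_endpoint),  # access mechanism
        (dataset, VOID + "dataDump", data_dump),            # access mechanism
        (dataset, DCTERMS + "license", license_uri),        # licensing
    ]
    for vocabulary in vocabularies:
        # One triple per vocabulary used within the dataset.
        triples.append((dataset, VOID + "vocabulary", vocabulary))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join("<%s> <%s> <%s> ." % triple for triple in triples)


# Hypothetical dataset; none of these URIs refer to a real service.
example = describe_dataset(
    dataset="http://example.org/void#Dataset",
    subject="http://dbpedia.org/resource/Music",
    sparql_endpoint="http://example.org/sparql",
    data_dump="http://example.org/dump.nt",
    license_uri="http://creativecommons.org/licenses/by-sa/3.0/",
    vocabularies=["http://xmlns.com/foaf/0.1/"],
)
print(to_ntriples(example))
```

This sketch leaves out linksets and statistics, which need more structure than a flat list of properties on the dataset.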

As well as the vocabulary, there is the voiD guide, where the authors of voiD (Jun Zhao, Michael Hausenblas, Richard Cyganiak, and myself [Keith Alexander]) explain how to create voiD descriptions combining terms from voiD with other useful vocabularies, publish voiD, and query voiD.

Feedback on both the vocabulary, and the Guide, will be gratefully received at void-rdfs-internals@googlegroups.com.

Getting Started With the Talis Platform Presentation

We ran a training workshop at the Talis offices last week with a small group of developers looking at using the Talis Platform for a community information project with which we’re involved. I thought it would be useful to share the slides from the session I ran.

The session was intended to provide a walkthrough of the main concepts, technologies and features of the Platform, with the goal of filling the gap between the “What is the Talis Platform” presentations we’ve given in the past and the detailed API documentation.

The slides can be found up on slideshare.


By rob

There is a consistent set of examples used throughout the presentation. These draw on some data I’ve been compiling about spaceflights. You can find the Platform store here, including the SPARQL endpoint (for testing the example queries), or look at some of the below URLs:

The data and schema are very much a work in progress and are likely to change. However, there’s sufficient data there if you want to follow along with the presentation and explore some of the Platform features.

I plan to keep the presentation up to date with the data as it evolves, and I also hope to use the Slideshare “slidecast” feature to add a voiceover that fills in the missing context.

paggr wins at ISWC

Benjamin Nowack, Semantic Web developer and innovator par excellence, and author of the ARC RDF library for PHP (which we have mentioned on this blog more than once), deservedly won the ISWC2008 Semantic Web Challenge for his application, paggr.

Paggr uses “SPARQL SCRIPT”, Benji’s scripting language extension to SPARQL, to define widgets which can pull in semantic data from sources across the web, mesh it up, and render it on the page.

Well done Benji! We’re all looking forward to the public beta :)

paggr wins the semantic web challenge 2008

Drupal and the opportunity of RDF

At the start of this week, Dries Buytaert gave the keynote at DrupalCon 2008. The most exciting revelation came at the end: Drupal’s future is in the semantic web.

While Dries talks about the semantic web and RDF, you don’t hear much reaction from the crowd; but then he says “Let me show you a video of the future”, and proceeds to demonstrate SPARQLing on linked data from sources like dbpedia, dbtunes, geodata, events, friends lists, and google spreadsheets, mashed up in Exhibit.

This gets a lot of applause :)

In the keynote, he puts emphasis on data interoperability, decentralisation, remote querying, and how having a lot of data is great fun :)

It’s a really great talk, with a lot of excellent quotes about the value of RDF for Drupal. Here are some of my favourites:

Web 3.0 (much as I hate to use the term) is all about infinite interoperability

We have the opportunity to be mentioned in the history books of the web … This is where the web is going. And this right time, and the right place, to make it happen.

Using RDF you can connect all these different parts of data, that live in different parts of the web.

RDF turns the web into a database

The real opportunity we have here is to start sprinkling this map [of linked open data sources] with Drupal. Every single Drupal site can be an RDF repository that people can query

Google are trying to build a world social graph, connecting people … but what we are doing with RDF is connecting not just people, but everything

With RDF, the import/export problem we have in Drupal just goes away. It just works, without having to describe database schemas… It just works. It’s a problem that is already solved.

You can listen to the audio of the presentation at archive.org (~45MB; the RDF stuff starts at around 53 minutes), and view a video of the RDF demonstration.

You can also read more about Drupal and RDF here.