voiD, datasets, graphs, documents, and dcterms:isPartOf backlinks
One question I have heard several times now regarding voiD is how to say that data is part of a dataset.
Frédérick Giasson asked about this recently in #swig, and wondered why the voiD guide recommended using dcterms:isPartOf. I thought, since this is something that has been asked about a few times, I would blog about it and explain the reasoning behind this.
So, it wouldn’t be right to say something like:
<http://lastfm.rdfize.com/artists/Black+Sabbath> dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset> .
… because we don’t want to say that “Black Sabbath is part of the lastfm.rdfize.com dataset”.
We want to say “a description of Black Sabbath (composed of triples) is part of the lastfm.rdfize.com dataset”.
One approach to encapsulating this meaning would be to reify each individual triple and state that the triple is part of the dataset … but we felt that this would be neither practical nor popular.
So, in the voiD guide, we advocate that when you publish Linked Data and want to say that the data you are publishing is part of a voiD Dataset, you add a triple linking the document in which the data is published to the dataset, e.g.:
<http://lastfm.rdfize.com/?artistName=Black+Sabbath> dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset> .
(where <http://lastfm.rdfize.com/?artistName=Black+Sabbath> is a document containing a description of <http://lastfm.rdfize.com/artists/Black+Sabbath>)
This way, when a Linked Data client dereferences <http://lastfm.rdfize.com/artists/Black+Sabbath> they get redirected to a document, and can follow the dcterms:isPartOf link from the document URI to the voiD Dataset.
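To make that lookup concrete, here is a minimal Python sketch of the two hops a client takes. It is only a toy model: the redirect table and the triple set are in-memory stand-ins for real HTTP dereferencing and RDF parsing, the function name find_datasets is hypothetical, and I am assuming the redirect goes from the resource URI to the document URI used in the example above.

```python
# Toy model of the lookup described above; no network access is involved.
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

# Assumption: dereferencing the resource URI redirects to this document URI.
REDIRECTS = {
    "http://lastfm.rdfize.com/artists/Black+Sabbath":
        "http://lastfm.rdfize.com/?artistName=Black+Sabbath",
}

# The backlink triple a client would find after parsing the document.
TRIPLES = {
    ("http://lastfm.rdfize.com/?artistName=Black+Sabbath",
     DCTERMS_IS_PART_OF,
     "http://lastfm.rdfize.com/meta.n3#Dataset"),
}


def find_datasets(resource_uri, redirects=REDIRECTS, triples=TRIPLES):
    """Return the datasets that the document describing resource_uri is part of."""
    # Hop 1: resource URI -> document URI (falls back to the URI itself).
    document_uri = redirects.get(resource_uri, resource_uri)
    # Hop 2: document URI -> dataset, via dcterms:isPartOf.
    return sorted(
        o for (s, p, o) in triples
        if s == document_uri and p == DCTERMS_IS_PART_OF
    )
```

Calling find_datasets with the Black Sabbath resource URI yields the voiD Dataset URI; a URI with no backlink yields an empty list.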
What some people don’t like so much is the implication that their dataset consists of documents, when what they really want to say is that their dataset consists of descriptions of resources.
The conceptual problem, if there is one, is that here the document URI is identifying an RDF/XML document, not the graph of RDF data encoded in that document. So, if you wanted to explicitly state that the graph, rather than the document, is part of the dataset, it could perhaps be done like this:
[ a <http://www.w3.org/2004/03/trix/rdfg-1/Graph> ;
  <http://purl.org/vocab/frbr/core#embodiment> <http://lastfm.rdfize.com/?artistName=Black+Sabbath&output=rdf> ;
  dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset>
] .
But I’m really not too sure if that is either semantically correct, or in any way a more practically useful description than simply saying the document is part of the dataset.
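One practical difference is the extra hop a consumer has to make. The Python sketch below illustrates that under some loud assumptions: the triples are held in a plain in-memory set rather than parsed from RDF, the label "_:g1" is just a stand-in for the blank node, and the function name datasets_via_graph is hypothetical.

```python
# Toy model of the graph-based description; "_:g1" stands in for the blank node.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFG_GRAPH = "http://www.w3.org/2004/03/trix/rdfg-1/Graph"
FRBR_EMBODIMENT = "http://purl.org/vocab/frbr/core#embodiment"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

DOCUMENT = "http://lastfm.rdfize.com/?artistName=Black+Sabbath&output=rdf"
DATASET = "http://lastfm.rdfize.com/meta.n3#Dataset"

TRIPLES = {
    ("_:g1", RDF_TYPE, RDFG_GRAPH),
    ("_:g1", FRBR_EMBODIMENT, DOCUMENT),
    ("_:g1", DCTERMS_IS_PART_OF, DATASET),
}


def datasets_via_graph(document_uri, triples=TRIPLES):
    """Find datasets by going document -> graph node -> dataset."""
    # Extra hop: find the graph node(s) embodied by this document.
    graphs = {
        s for (s, p, o) in triples
        if p == FRBR_EMBODIMENT and o == document_uri
    }
    # Then follow dcterms:isPartOf from each graph node to its dataset.
    return sorted(
        o for (s, p, o) in triples
        if s in graphs and p == DCTERMS_IS_PART_OF
    )
```

Compared with the plain document backlink, the consumer has to match the intermediate node before it can reach the dataset, which is the added indirection in question.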
We (the voiD guide authors) think that the <document> dcterms:isPartOf <dataset> pattern is the most pragmatic approach to making a dataset discoverable from a LOD document.
But we are also open to suggestions for improvement as we evolve the vocabulary and guide in line with popular usage and the requirements of LOD publishers.
What do you think?


May 28th, 2009 at 9:09 pm
@prefix log: <http://www.w3.org/2000/10/swap/log#>.
For the record, there is log:semantics which cwm understands as a builtin function
and is exactly the relation between the document and the graph.
So you might say
[ is log:semantics of <http://lastfm.rdfize.com/?artistName=Black+Sabbath&output=rdf> ]
dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset>.
or as a path
<http://lastfm.rdfize.com/?artistName=Black+Sabbath&output=rdf>!log:semantics
dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset>.
timbl
May 28th, 2009 at 9:27 pm
I’m not very convinced of the usage of FRBR here (partly maybe because I’m less and less convinced FRBR is a suitable model for designing ontologies but that’s another topic).
So in the log vocabulary (used by Cwm) there is . Look at the description, I think this might be a good property for linking from an information resource to a graph.
Regards,
Simon
May 29th, 2009 at 8:53 am
Hi Simon, Tim,
Yes, I wasn’t all that sure about using that frbr property; it was the nearest I could find at the time. log:semantics would be far better. I was also thinking perhaps the proposed awww:representation property in http://esw.w3.org/topic/AwwswVocabulary might be suitable, but log:semantics is better.
I would be interested in hearing whether LOD publishers would prefer this slightly more complicated (because it introduces another node and another level of indirection) but more precise modeling.
May 29th, 2009 at 11:17 am
Hmpf, the URI got filtered out. This was supposed to be “So in the log vocabulary (used by Cwm) there is http://www.w3.org/2000/10/swap/log#semantics”.