voiD, datasets, graphs, documents, and dcterms:isPartOf backlinks
One thing that I have heard people asking several times now regarding voiD is to do with how to say that data is part of a dataset.
Frédérick Giasson asked about this recently in #swig, and wondered why the voiD guide recommended using dcterms:isPartOf. I thought, since this is something that has been asked about a few times, I would blog about it and explain the reasoning behind this.
So, it wouldn’t be right to say something like:
<http://lastfm.rdfize.com/artists/Black+Sabbath> dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset> .
… because we don’t want to say that “Black Sabbath is part of the lastfm.rdfize.com dataset”.
We want to say “a description of Black Sabbath (composed of triples) is part of the lastfm.rdfize.com dataset“.
One approach to encapsulating this meaning would be to reify each individual triple and state that the triple is part of the dataset … but we felt that this would be neither practical nor popular.
So, in the voiD guide, we advocate that when you publish Linked Data, and you want to say that the data you are publishing is part of a voiD Dataset, you add a triple linking the document in which the data is published, to the dataset. eg:
<http://lastfm.rdfize.com/?artistName=Black+Sabbath> terms:partOf <http://lastfm.rdfize.com/meta.n3#Dataset> .
(where <http://lastfm.rdfize.com/?artistName=Black+Sabbath> is a document containing a description of <http://lastfm.rdfize.com/artists/Black+Sabbath>)
This way, when a Linked Data client dereferences <http://lastfm.rdfize.com/artists/Black+Sabbath> they get redirected to a document, and can follow the dcterms:isPartOf link from the document URI to the voiD Dataset.
What some people don’t like so much, is the implication that their dataset consists of documents, when what they really want to say is that their dataset consists of descriptions of resources.
The conceptual problem, if there is one, is that here the document URI is identifying an RDF/XML document, not the graph of RDF data encoded in that document. So, if you wanted to explicitly state that the graph, rather than the document, is part of the dataset, it could perhaps be done like this:
[ a <http://www.w3.org/2004/03/trix/rdfg-1/Graph> ;
<http://purl.org/vocab/frbr/core#embodiment> <http://lastfm.rdfize.com/?artistName=Black+Sabbath&output=rdf> ;
dcterms:isPartOf <http://lastfm.rdfize.com/meta.n3#Dataset> .
]
But I’m really not too sure if that is either semantically correct, or in any way a more practically useful description than simply saying the document is part of the dataset.
We (the voiD guide authors) think that the <document> dcterms:isPartOf <dataset> pattern is the most pragmatic approach to making a dataset discoverable from a LOD document.
But we are also open to suggestions for improvement as we evolve the vocabulary and guide in line with popular usage and the requirements of LOD publishers.
What do you think?







