voiD: a Vocabulary of Interlinked Datasets
As technological advances allow the production and dissemination of information to scale out, old methods for navigating the information become inadequate, and we need new means to cope with the greater scale of information available.
With the rise of printing in the 16th century, library collections flourished, making more ideas and information available to more scholars than ever before. Yet to know what books a library contained, scholars had to either physically visit the library (and browse the shelves, or consult a manuscript catalogue), or make enquiries by letter.
In 1595, Leiden University innovated by becoming the first institution to make its library’s catalogue available in print. Just as printing had made the editions within a library far more widely available, printing a book about the library’s collection brought awareness of the library and its contents to a greater audience. Now, scholars all across Europe could tell if Leiden University’s library had the information they needed. Scholars had more information about what books were available, and Leiden’s international reputation was bolstered. Other libraries followed suit by printing their own catalogues, and those library catalogues could be collected. Scholars could compare the strengths and purposes of multiple libraries from a single location.
When the Linked Open Data movement began gaining ground in 2007, there were relatively few large RDF datasets available on the web. If you followed the right blogs and mailing lists, you knew which datasets were available. As the LOD Cloud grows (and manually drawing it becomes less and less practical), it becomes apparent that the number of datasets is outgrowing our methods for discovering them. Just as it made sense for libraries in the 16th century to use the technology of print to publish descriptions of their collections, it is natural to use RDF to publish descriptions of datasets available on the web. Just as printed catalogues brought library collections to new audiences, and enabled new uses, RDF descriptions will bring datasets to new audiences (machines!), making them more findable, and enabling new uses. All you need is the vocabulary to describe datasets with.

voiD is a vocabulary that dataset publishers can use to describe their datasets: their subject areas, their access mechanisms (e.g. APIs, SPARQL endpoints, data dumps), their licensing, their provenance, how they link to other datasets, which vocabularies are used within them, and statistics relating to their contents.
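To make this concrete, here is a minimal sketch of what a voiD description might look like, built as a Turtle string in plain Python. The dataset URI, title, endpoint, dump location, and triple count are all made up for illustration, and the helper `void_description` is hypothetical; only the `void:` and `dcterms:` terms come from the real vocabularies. The voiD guide remains the authoritative source for how the terms should be used.

```python
# Namespaces of the two vocabularies used in this sketch.
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
}


def void_description(dataset_uri, title, license_uri, sparql_endpoint,
                     data_dump, vocabularies, triples):
    """Return a Turtle document describing one dataset with voiD terms.

    Hypothetical helper for illustration only; it is not part of voiD.
    """
    # Escape backslashes and double quotes so the title is a valid Turtle literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')

    statements = [
        "a void:Dataset",
        f'dcterms:title "{safe_title}"',
        f"dcterms:license <{license_uri}>",          # licensing
        f"void:sparqlEndpoint <{sparql_endpoint}>",  # access mechanism
        f"void:dataDump <{data_dump}>",              # access mechanism
    ]
    # One void:vocabulary statement per vocabulary used in the dataset.
    statements += [f"void:vocabulary <{v}>" for v in vocabularies]
    # A simple statistic: the number of triples in the dataset.
    statements.append(f"void:triples {int(triples)}")

    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{dataset_uri}>")
    lines.append(" ;\n".join("    " + s for s in statements) + " .")
    return "\n".join(lines)


# Example values below are invented; example.org is a placeholder domain.
turtle = void_description(
    dataset_uri="http://example.org/dataset/books",
    title="Example Books Dataset",
    license_uri="http://creativecommons.org/publicdomain/zero/1.0/",
    sparql_endpoint="http://example.org/sparql",
    data_dump="http://example.org/dumps/books.nt",
    vocabularies=["http://purl.org/dc/terms/", "http://xmlns.com/foaf/0.1/"],
    triples=120000,
)
print(turtle)
```

The resulting Turtle could be published alongside the dataset so that both people and machines can discover how to access it and what it contains. Links to other datasets (described in voiD with `void:Linkset`) are left out of this sketch for brevity.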
As well as the vocabulary, there is the voiD Guide, where the authors of voiD (Jun Zhao, Michael Hausenblas, Richard Cyganiak, and me, Keith Alexander) explain how to create voiD descriptions that combine terms from voiD with other useful vocabularies, and how to publish and query voiD.
Feedback on both the vocabulary and the Guide will be gratefully received at void-rdfs-internals@googlegroups.com.

