Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

LIBRIS – Linked Library Data

This post will feature in Nodalities Magazine, Issue 5
By Anders Söderbäck and Martin Malmsten

LIBRIS is the Swedish National Union Catalogue, in other words the main gateway for bibliographic data in Sweden. LIBRIS consists of roughly six million bibliographic records, 20 million library holdings records, and two hundred thousand authority records on authors, titles and subject headings (“Svenska ämnesord”). The LIBRIS system is used for cataloguing by about 170 library organisations. The participating libraries come mainly from the academic sector (i.e. university libraries), but museums, archives and some public libraries also take part. Last but not least, LIBRIS is also the home of the Swedish National Bibliography. LIBRIS is created co-operatively, but is hosted and maintained by the National Library of Sweden.

Earlier this year, LIBRIS was published as Linked Data on the web, exposing the entire state of the library system with all its records, links and relations. As far as we know, this is the first union catalogue or national library catalogue to be published in its entirety as linked data. Not counting lcsh.info (which is great, but contains no bibliographic information), it is the first effort by a national library to actually be part of the semantic web. We made this effort using a “data first” strategy, focusing on availability rather than on a perfect representation of the database. The reason for this strategy was simple: we did not know the perfect structure for a bibliographic web of data, and we did not want to spend our time just thinking about it. Libraries are, in our opinion, very good at thinking and rethinking bibliographic data. Actual reworking and restructuring of library data, unfortunately, seems less common. For us, the best way to get to know a technology, whether MARC records or the Semantic Web, is to get our hands dirty and work with it. Progress comes, we firmly believe, from learning from mistakes and adapting to new environments, not from staying safely at home making up the perfect travel plan.

Learning also comes from discussion with other people, which is why we wanted to provide something to talk about as quickly as possible. Trying to move beyond the library community, we wanted to use ontologies that were not library-specific. For this reason we used Dublin Core, SKOS, FOAF and Bibliontology. Where no existing ontology seemed applicable (holdings, FRBR relations), we made up our own. Using the identifiers from our database (MARC field 001, for all you library people out there), we created cool HTTP URIs for bibliographic and authority records. Following the four rules of Linked Data specified by Sir Tim Berners-Lee, we tried to provide anyone who looks up those URIs with as much useful information, and as many links to other URIs, as possible. For the most part, this meant links to other resources within the LIBRIS dataset. Since we already had a good bespoke mapping of our Swedish subject headings to the Library of Congress Subject Headings (LCSH), we generated approximately twelve thousand links to lcsh.info. In keeping with our desire to move beyond the library community, we also included a handful of links to Wikipedia and DBpedia. In the not too distant future, we plan to extend the number of these links automatically to about thirty thousand.
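As a rough illustration of how a database identifier can become an HTTP URI described with non-library vocabularies, here is a minimal Python sketch that emits N-Triples for a simplified record. The base URIs (`libris.example`), the sample identifiers and the choice of `skos:closeMatch` for the LCSH links are assumptions made for this example; this post does not say which URI patterns or linking property LIBRIS actually uses. Only the Dublin Core terms and SKOS namespace URIs are real.

```python
from __future__ import annotations

# Minimal sketch: emit N-Triples for a simplified bibliographic record.
# The base URIs are hypothetical placeholders, not LIBRIS's real URI patterns.
BIB_BASE = "http://libris.example/resource/bib/"
AUTH_BASE = "http://libris.example/resource/auth/"
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def iri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def record_to_ntriples(control_number: str, title: str, subject_ids: list[str]) -> list[str]:
    """Describe one record; control_number plays the role of MARC field 001."""
    record = iri(BIB_BASE + control_number)
    lines = [f"{record} {iri(DCTERMS + 'title')} {literal(title)} ."]
    for subject_id in subject_ids:
        # Link the record to a local subject heading (authority record) URI.
        lines.append(f"{record} {iri(DCTERMS + 'subject')} {iri(AUTH_BASE + subject_id)} .")
    return lines


def heading_to_lcsh(subject_id: str, lcsh_uri: str) -> str:
    """Link a local heading to an LCSH concept; skos:closeMatch is an assumption."""
    return f"{iri(AUTH_BASE + subject_id)} {iri(SKOS + 'closeMatch')} {iri(lcsh_uri)} ."


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for line in record_to_ntriples("0000001", "An example title", ["42"]):
        print(line)
    print(heading_to_lcsh("42", "http://lcsh.example/concept/1"))
```

Because the record URI is derived directly from the stable 001 identifier, every other dataset can link to the same URI without knowing anything about MARC.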

All of the above is described in more detail in Making a Library Catalogue Part of the Semantic Web (Malmsten 2008). What that paper does not describe, however, is why we decided to venture into semantic web territory. One could of course attribute this move to inborn curiosity: as librarians and developers, we naturally want to explore new landscapes in the ecology of knowledge. But the reason we started looking at linked data might just as well be more prosaic. In 2007 we made a major revision of our web OPAC. We did this using an open and collaborative methodology, focusing on user-centred design as well as on communication with just about anyone who had an interest in what we were doing. During this process, we discovered that there was a huge interest in getting access to the data in the LIBRIS database in a machine-readable way. This interest was not limited to the library community; it was also expressed by many people working outside the library space with no experience of library-specific protocols and formats such as Z39.50 and MARC.

In 2008, having our data available in a machine-readable way feels just as natural to us as making it available through a human-readable interface. In building a library catalogue, our purpose and responsibility is to provide access to library resources. Limiting this access to paths that can only be walked by humans seems an unnecessary restriction, and certainly not in the best interest of our users. The library catalogue might also be considered a resource in itself, and there is no reason why we should not make this resource available to the public just as we do with our books and other resources. Of course, this can be achieved without creating linked data on the semantic web. Why not, for example, use Z39.50, SRU/W or OAI-PMH? The answer is that we make our data available in those ways as well. We have also put up an HTTP-based web service which can output LIBRIS data as MARC-XML, Dublin Core, JSON, RIS or MODS. If, for some strange reason, we ran out of other things to do, we could probably spend the rest of our lives just building APIs. This situation called for a more rational approach.
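How a single HTTP service decides which of several output formats to return is not described in this post. One common mechanism is content negotiation on the HTTP `Accept` header, and the simplified Python sketch below picks the best supported format from such a header. The media-type mapping, the `rdf` default and the function names are assumptions for illustration; Dublin Core is left out for brevity, and wildcards other than `*/*` are deliberately ignored.

```python
from __future__ import annotations

# Hypothetical mapping from media types to output formats (Dublin Core omitted).
SUPPORTED = {
    "application/marcxml+xml": "marcxml",
    "application/mods+xml": "mods",
    "application/json": "json",
    "application/x-research-info-systems": "ris",
    "application/rdf+xml": "rdf",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1."""
    ranges = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = min(max(float(value), 0.0), 1.0)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media, q))
    return ranges


def choose_format(accept: str, default: str = "rdf") -> str | None:
    """Return the supported format with the highest q, or None if none is acceptable.

    Simplification: only exact media types and */* are matched."""
    if not accept.strip():
        return default  # no preference expressed: the client takes anything
    best, best_q = None, 0.0
    for media, q in parse_accept(accept):
        fmt = default if media == "*/*" else SUPPORTED.get(media)
        # Strict ">" keeps the earliest range when q values tie.
        if fmt is not None and q > best_q:
            best, best_q = fmt, q
    return best


if __name__ == "__main__":
    print(choose_format("application/json;q=0.5, application/marcxml+xml"))
```

With `application/json;q=0.5, application/marcxml+xml` the sketch returns `marcxml`, since the MARC-XML range carries the default q of 1. A `None` result would typically be answered with 406 Not Acceptable.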

One problem with the above-mentioned methods of data access is that they only provide information that is present in the original MARC records. No information is given about other linked resources. When looking up a certain resource in our OPAC, a human viewer is given a lot of information that is not available to a machine accessing individual records through a search/retrieve protocol. From this perspective, working with RDF and linked data is a lot more rational, since it lets us expose the entire state of our library system in a web-friendly way. Ed Summers, who created lcsh.info at the Library of Congress, describes this very eloquently in a blog post from the beginning of 2008. APIs, Ed writes, “…all differ in their implementation details and require you to digest their API documentation before you can do anything useful. Contrast this with the Web of Data which uses the ubiquitous technologies of URIs and HTTP plus the secret sauce of the RDF triple.” As an alternative to making open APIs for just about every aspect of our data, we hope that allowing our data to be crawled by humans and machines alike will enable new ways of discovery, as well as uses of our data not even imagined by us. Among the objectives the Swedish Government has set for us as the National Library is that the LIBRIS systems shall be used as a broad information resource for improving information management within research and higher education. Putting as few restrictions as possible on how our data can be used is, we feel, probably the best way to achieve this objective.
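The difference between retrieving isolated records and crawling linked data can be sketched as a breadth-first "follow your nose" traversal: start at one URI, look it up, and queue every URI that appears as an object. In this minimal Python sketch the `fetch` argument is a hypothetical stand-in for an HTTP lookup, and an in-memory dictionary with made-up URIs plays the role of the web.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def crawl(start: str, fetch: Callable[[str], List[Triple]], max_resources: int = 100) -> List[Triple]:
    """Breadth-first "follow your nose" traversal of linked resources.

    fetch(uri) stands in for dereferencing uri over HTTP and returns the
    triples describing it (an empty list when nothing can be retrieved)."""
    seen = {start}
    queue = deque([start])
    collected: List[Triple] = []
    fetched = 0
    while queue and fetched < max_resources:
        uri = queue.popleft()
        triples = fetch(uri)
        fetched += 1
        collected.extend(triples)
        for _subject, _predicate, obj in triples:
            # Naive check: any http(s) object is treated as a link to follow.
            if obj.startswith(("http://", "https://")) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return collected


# Toy in-memory "web" with made-up URIs; plain strings are literals.
TOY_WEB: Dict[str, List[Triple]] = {
    "http://libris.example/bib/1": [
        ("http://libris.example/bib/1", "http://purl.org/dc/terms/title", "An example title"),
        ("http://libris.example/bib/1", "http://purl.org/dc/terms/subject", "http://libris.example/auth/42"),
    ],
    "http://libris.example/auth/42": [
        ("http://libris.example/auth/42", "http://www.w3.org/2004/02/skos/core#closeMatch",
         "http://lcsh.example/concept/1"),
    ],
}

if __name__ == "__main__":
    for triple in crawl("http://libris.example/bib/1", lambda uri: TOY_WEB.get(uri, [])):
        print(triple)
```

Starting from the record, the crawl reaches the subject heading and then the external concept, collecting three triples, even though the record itself says nothing about the external concept. A search/retrieve protocol returning only the record would stop after the first two.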

When it comes to improving information management, the linking between datasets made possible by linked data shows a lot of promise for cooperation and interoperability between libraries, as well as for bridging the gap between libraries and other knowledge organizations. This is something we have only begun to explore. The aforementioned links to lcsh.info are useful for inferring relations that are missing from our system of subject headings from relations present in LCSH. The value of these kinds of inferences will increase exponentially as the cloud of linked library data grows. While we currently have only a few links to DBpedia, we hope that the upcoming addition of a substantial number of DBpedia links will provide interesting possibilities for us as well as for others. Libraries have, over the last few hundred years, collected huge amounts of what is usually very good data. Locked up in individual databases, this data is good for retrieving resources belonging to individual libraries. This, it needs to be stated, is not a bad thing! Publishing library databases in RDF, even without links to other datasets, might be good for individual libraries, if only for the opportunity to use SPARQL, a query language we have fallen in love with and which we imagine might fulfill any librarian's desire for good, exhaustive database querying.
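The kind of inference mentioned above can be sketched in a few lines: if two local headings are linked to LCSH headings, and LCSH says one of those is broader than the other, the same relation becomes a candidate for the local system. The Python sketch below assumes each local heading is linked to at most one LCSH heading, uses only direct (non-transitive) broader relations, and treats its output as suggestions for a cataloguer rather than facts. The function name and the abstract placeholders A, B, C and X, Y, Z are hypothetical.

```python
from __future__ import annotations


def infer_broader(
    local_to_lcsh: dict[str, str],
    lcsh_broader: set[tuple[str, str]],
    local_broader: set[tuple[str, str]],
) -> set[tuple[str, str]]:
    """Suggest local (narrower, broader) pairs implied by LCSH but missing locally.

    local_to_lcsh maps each local heading to the LCSH heading it is linked to;
    the *_broader sets hold (narrower, broader) pairs."""
    # Invert the mapping; several local headings may share one LCSH heading.
    lcsh_to_local: dict[str, list[str]] = {}
    for local, lcsh in local_to_lcsh.items():
        lcsh_to_local.setdefault(lcsh, []).append(local)

    suggestions: set[tuple[str, str]] = set()
    for narrower_lc, broader_lc in lcsh_broader:
        for narrower in lcsh_to_local.get(narrower_lc, []):
            for broader in lcsh_to_local.get(broader_lc, []):
                pair = (narrower, broader)
                if narrower != broader and pair not in local_broader:
                    suggestions.add(pair)
    return suggestions


if __name__ == "__main__":
    # Abstract placeholders: A, B, C are local headings; X, Y, Z are LCSH headings.
    mapping = {"A": "X", "B": "Y", "C": "Z"}
    print(infer_broader(mapping, {("X", "Y"), ("Y", "Z")}, {("B", "C")}))
```

Here the sketch suggests only (A, B): LCSH relates X and Y, both are mapped locally, and the local system lacks that link, while (B, C) is already recorded. Every new dataset linked into the same cloud adds more relations of this kind to draw on.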

Linked library data provides amazing opportunities for cooperation on, for example, authority data. Making use of each other's intellectual efforts in this way seems to us an effective way of improving the quality of the individual catalogue while at the same time improving the quality of the web at large. Authority databases are amazing resources, probably useful for purposes other than just improving search in the individual OPAC. Exposing library data might also be a good way to get feedback on our data from outside the library community. Visibility and open communication are good ways to improve quality, as has been shown again and again in science as well as in society. Libraries might be good data providers, and a cloud of linked open library data might give rise to interesting perspectives, exciting new applications and better competition between developers, be they commercial system vendors, ad-funded search engines or in-house library development teams. Such competition would probably end up benefiting our users, because in the end libraries are not about catalogues, bibliographic formats, databases or OPACs. Libraries are about openness, about dissemination of and access to research and cultural heritage, about science, memory and democracy. For this reason, libraries need to stop worrying and embrace web standards. A web of data without library participation is a bad thing, not only for libraries but also for the web.