If Sir Tim Berners-Lee can equate Linked Data with a packet of crisps/potato chips, I thought I would take a stab at another food metaphor for this post.
Linked Open Data (LOD) is a concept that many believe they understand. Take yourself to most any conference that has a connection with data, or the web, or the Internet at the moment, and it will not be long before you see a slide of the Linked Open Data cloud diagram, or of Sir Tim imploring us to give him our raw data now, or, if you are very lucky, a shot of him doing his imploring whilst stood in front of a shot of the LOD cloud.
Simple really: just publish your data as Linked Open Data and all will be wonderful as we move towards the sunlit Semantic Web uplands. Unfortunately life is never that simple – LOD is not a single identifiable thing. As Paul Walk eloquently puts it:
- data can be open, while not being linked
- data can be linked, while not being open
- data which is both open and linked is increasingly viable
- the Semantic Web can only function with data which is both open and linked
As with any recipe for success, the majority concentrate on the final result, praising or criticising it as a whole without identifying the benefits, or otherwise, of the individual ingredients.
Take a strawberry pavlova for instance. If you are into that kind of thing, it is a delightful culmination of the culinary arts designed to send your taste buds into raptures. Unless, that is, you don’t like cream, or you don’t like strawberries, or can’t abide meringue, in which case the whole thing seems a little pointless.
What has this got to do with Linked Open Data (LOD), I hear you ask. Well, I am increasingly seeing LOD being presented as the goal for those wishing to publish their data online. My position is that the eventual goal, from which will spring a Semantic Web, is a global web of linked and open data. However, there are many steps from where we are now to achieving that goal. Within audiences that I present to, and/or sit amongst, I see people who for whatever reasons do not ‘get’ one or more of the components of LOD – they cannot envisage opening up any of their data, or think that using a web address for an identifier is overly complex, or have a religious aversion to RDF. As a result they dismiss the whole recipe as not for them, or worse still, as something impractical that will become nothing more than the plaything of a few passionate enthusiasts.
When someone who is still struggling with the concept of opening up their organisation’s data, or with why RDF might be a more useful format than csv, is shown the ubiquitous Linked Open Data cloud diagram with encouragement to join in, it is hardly surprising they remain a little unconvinced. This isn’t a criticism of presenters either. In only 20 minutes on a stage, it is difficult to go into underlying detail.
Let me try, in a few paragraphs, to break the LOD pavlova into its ingredients:
- Data – In the context of this post, by data I mean machine-readable information, produced in a format that can be consumed and processed by other machines. Inevitably, this means file formats such as csv, XML, RDF, etc., but not something like pdf, html, or Word, which, although they are transferable formats, are designed for human consumption, not machine analysis.
For some, just this step from their current human-targeted format to a machine-readable one is a significant one.
- Open Data – Data (see above) which is accessible for all to download, view, and consume in a way that is not encumbered by licensing that restricts its use. An example is the licensing used for data.gov.uk data. By definition, data which is restricted to certain uses is not fully open.
In our internet-based world, openness can also be defined in terms of technical accessibility. If data is only available after a login process, or only available to users behind a firewall, it couldn’t be considered open.
- Linked Data – Data (see above) which contains URIs as identifiers for the concepts described in the data, and URIs to identify the relationships between those concepts. The four Linked Data Principles, published as a design note by Tim Berners-Lee, provide a bit more detail on this.
I am in danger of stirring the embers of a religious firefight here, between those that believe that Linked Data must be described in RDF and contain URIs as identifiers, and those that maintain that you can have data linked across the web without those constraints. All I am going to say on that at this time is that the Linked Open Data cloud of data sets has been successful, based on the first of those two views. (If you want to follow that particular debate in more detail, Paul Miller’s post and associated comments would be a good starting point.)
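To make the step from plain Data to Linked Data a little more concrete, here is a minimal sketch of my own, not a definitive recipe. It takes a few rows of csv and emits RDF statements in N-Triples form, using URIs to identify both the things described and the relationships between them. The sample rows, the example.org namespace, and the function names are all invented for illustration. The choice of the FOAF vocabulary and DBpedia URIs is simply an assumption about what such data might reasonably link to.

```python
import csv
import io

# Hypothetical sample data: machine readable csv, but not yet linked.
CSV_TEXT = """id,name,city
1,Alice Example,London
2,Bob Example,Paris
"""

# Assumed URI choices, for illustration only.
BASE = "http://example.org/person/"            # invented namespace for our own records
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"   # FOAF property for a person's name
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"  # FOAF property for a location
DBPEDIA = "http://dbpedia.org/resource/"       # link out to somebody else's data


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Turn one csv row into two statements: subject, relationship, object."""
    subject = "<" + BASE + row["id"] + ">"
    name = '"' + escape_literal(row["name"]) + '"'
    city = "<" + DBPEDIA + row["city"].replace(" ", "_") + ">"
    return [
        f"{subject} <{FOAF_NAME}> {name} .",
        f"{subject} <{FOAF_BASED_NEAR}> {city} .",
    ]


def csv_to_ntriples(csv_text):
    """Convert every row of the csv text into N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_triples(row))
    return lines


if __name__ == "__main__":
    for line in csv_to_ntriples(CSV_TEXT):
        print(line)
```

The point to notice is that the city is no longer just the text "London" but a web address that other data sets can also point at, which is where the linking comes from.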
So, how can data be open, but not linked? – by publishing it in a non-Linked Data form such as a text file or an html page or a pdf. Where would you find this? – all over the web. As Sir Tim encourages with his call to give us your raw data now, and as I detailed in my previous “data publishing three-step” post, this is often the first element of getting your data out there for others to consume.
How can data be Linked but not open? – by publishing it in accordance with the principles, in RDF, with URIs, but restricting its use, either by imposing restrictive licensing conditions or by limiting access to the data. Where would you find this? – again all over the web, but often hiding behind restrictive licensing terms such as “non-commercial use only”. It is also to be found inside organisational firewalls. For example, commercial organisations can realise the benefits of using Linked Data techniques with their internal private data, potentially linking it to publicly visible concepts across the web to add even more value for their employees.
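The two questions above treat open and linked as independent properties, which can be sketched in a few lines of code. This is only an illustration under my own simplifying assumptions. The `Dataset` fields and the example entries are hypothetical. Openness is reduced to an unrestricted licence plus public reachability, and linkedness to the use of URIs as identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical description of a published data set."""
    name: str
    unrestricted_licence: bool   # no terms such as "non-commercial use only"
    publicly_reachable: bool     # no login required, not behind a firewall
    uses_uri_identifiers: bool   # concepts and relationships identified by URIs


def is_open(dataset: Dataset) -> bool:
    """Open means both legally and technically accessible."""
    return dataset.unrestricted_licence and dataset.publicly_reachable


def is_linked(dataset: Dataset) -> bool:
    """Linked, in the simplified sense used here, means identified by URIs."""
    return dataset.uses_uri_identifiers


def classify(dataset: Dataset) -> str:
    """Place a data set in one of the four open/linked combinations."""
    open_, linked = is_open(dataset), is_linked(dataset)
    if open_ and linked:
        return "linked and open"
    if open_:
        return "open, not linked"
    if linked:
        return "linked, not open"
    return "neither open nor linked"


# Invented examples matching the cases discussed in the text.
EXAMPLES = [
    Dataset("pdf on a public web site", unrestricted_licence=True,
            publicly_reachable=True, uses_uri_identifiers=False),
    Dataset("RDF for non-commercial use only", unrestricted_licence=False,
            publicly_reachable=True, uses_uri_identifiers=True),
    Dataset("RDF behind the company firewall", unrestricted_licence=True,
            publicly_reachable=False, uses_uri_identifiers=True),
    Dataset("RDF, open licence, public", unrestricted_licence=True,
            publicly_reachable=True, uses_uri_identifiers=True),
]

if __name__ == "__main__":
    for dataset in EXAMPLES:
        print(f"{dataset.name}: {classify(dataset)}")
```

Keeping the two tests separate mirrors the argument of this post: a data set can pass either one without the other, and each is a worthwhile step on its own.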
Data that is Linked and Open, like that strawberry pavlova, has the power to deliver value beyond the sum of its individual ingredients. Providing data in a form that is linked to other data, and easy for others to link to, without restrictions on who does that linking or how, lays the foundation for a web of linked data built on the same principles that fostered the growth of the web of documents that has so changed our world over the last decade and a half.
The ingredients that formed that World Wide Web of documents – html, http, open publishing of web sites without restrictions on others’ ability to consume and/or link to them – individually were important developments. However, when those elements were blended together, their effects were multiplied manyfold and resulted in the web we experience today.
So [as I stretch my culinary metaphor to its limits] if you are hoping to take people with you in building a Linked Open Data future, you not only have to show them a picture of the final dish, you need to describe the individual ingredients and their relevance to the eventual result.
Pictures from Flickr by PhOtOnQuAnTiQuE and avixyz