Nodalities

From Semantic Web to Web of Data
License

Creative Commons License

The Data Publishing Three-Step

In a conversation with data owners about how they should be publishing their data, it is usually not long before the following question turns up: “So, what do I actually have to do to publish my data?”  Often the conversation then wanders off into a game of buzzword bingo (RDF, RDFa, SPARQL, dereferenceable URIs, triples, content negotiation, open data, Linked Data, end-points, etc.), to be followed by a blank look and the unuttered question “Yes, but what do I actually have to do to publish my data?”

In an attempt to simplify the answer to that oft unuttered question, I break things down into three steps.

Step 1   Get your Data Out – for others to consume
Sounds simple.  Just take the spreadsheet (or similar file) that you use to track information, post it on your web site and link to it from a description posted in an accompanying web page. It can be that simple, but there are things to consider:

  • Licensing – will potential consumers of the data be confident in their ability to use and/or reuse it? (The UK Government are very clear on this)
  • Is it open but opaque? – The terms, codes, identifiers etc. you use may be meaningless, or worse still ambiguous, to those outside your organisation, or even your department.
  • Could your data be made more consistent with other data you, or similar organisations, already publish?

All things to be considered, but not to be put up as excuses for not publishing.

Step 2   Get your Data In – to an open linkable standard format
This is the most powerful step. It consists of identifying the elements in your data (organisations, locations, things, projects, types, etc.), giving them unique identifiers, and then making these identifiers web links.  Fortunately this may not be as onerous as it sounds. There are many publicly visible/usable identifiers that you can use for your data – for example:

For this step to be effective you really need to be modelling your data: your [first class] data elements and the relationships between them, plus possibly relationships with external entities.  The output of this step will be an RDF representation of your data that follows Linked Data principles. You should also identify the process or rules that get you from your source data into this new form, so that you can repeat the transformation for later versions of your data.
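As a rough illustration of what such repeatable rules might look like, here is a minimal Python sketch that turns spreadsheet rows into RDF triples in N-Triples form. Everything specific in it is an assumption made for the example: the example.org base URI and vocabulary, the column names and the sample rows are all hypothetical, and a real publisher would substitute their own identifiers and, where possible, existing public ones.

```python
import csv
import io

# Hypothetical base URI and vocabulary; a real publisher would choose their own.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/def/"

# Hypothetical source data, standing in for the published spreadsheet.
SAMPLE_CSV = """project_id,title,funder_id
p1,Soil Survey,f9
p2,River Mapping,f9
"""

def literal(value):
    """Escape a string as a plain N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def row_to_triples(row):
    """Apply the mapping rules to one spreadsheet row."""
    # Each first-class element gets its own identifier that is also a web link.
    project = f"<{BASE}project/{row['project_id']}>"
    funder = f"<{BASE}organisation/{row['funder_id']}>"
    return [
        f"{project} <{VOCAB}title> {literal(row['title'])} .",
        # A relationship between two elements, not just a text value.
        f"{project} <{VOCAB}fundedBy> {funder} .",
    ]

def csv_to_ntriples(text):
    """Convert the whole file; rerun this for each new version of the data."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_triples(row))
    return "\n".join(lines)

print(csv_to_ntriples(SAMPLE_CSV))
```

Keeping the mapping in one small function is a deliberate choice here: when the next version of the spreadsheet arrives, the same rules can be applied without remodelling anything.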

Having said all that, it is not necessarily only you that will/can do step 2.   It is perfectly possible for a third party, or a central organisation such as data.gov.uk, or even an enthusiast, to carry out this data modelling and transformation step with data that you have openly published.

Next you need to publish your data so that it can become part of the Web of Linked Data, which brings me, with apologies to fans of the traditional party song, to…..

Step 3   Link it all about
Going through step 2 and not making your data available, or providing useful information at the end of the links you embed in your data, would be a bit of a pointless exercise.  How to publish this data is the next question, to which there are at least three equally valid answers.

  • Using an encoding technique called RDFa, you can embed the RDF data within the HTML coding of a web page, so that software can obtain a more structured representation from the page than a human viewing it in a browser would.
  • You could just publish the RDF in RDF files on your web server.  A good example of this is the way the BBC publish the RDF for many of their pages, such as for their Wild Life pages: compare the Lion web page with the RDF for Lion (depending on your browser, you may need to use its view page source option to see the actual RDF encoded in XML).
  • You could store the individual RDF statements (triples) in a triple store behind a SPARQL end-point.  This not only publishes the RDF, but also enables the data and relationships within the data to be queried. This is how data.gov.uk publishes RDF, from Talis Platform Stores.   This interface might look a bit cryptic (the results, formatted in XML in the top box, come from running the SPARQL query shown in the bottom box), but this is a developer's interface demonstrating the code and results an application might use, so you wouldn’t expect much different.

I’ve decided to go through these steps, can you remind me again why?  -  So that your data can be linked with other data to add value to the experience of consumers of your data and services, as well as others using your data to add value elsewhere.  A good example of this in action is the BIS Research Funding Explorer.

16 Responses

  1. Tweets that mention Nodalities » Blog Archive » The Data Publishing Three-Step -- Topsy.com Says:

    [...] This post was mentioned on Twitter by rjw, infopeep and others. infopeep said: Nodalities Talis: The Data Publishing Three-Step http://bit.ly/amnu0k [...]

  2. Lee Feigenbaum Says:

    Richard, I hate to point it out, but your simplified answers to the questions surrounding Linked Data “wander[ed] off into a game of buzzword bingo–RDF, RDFa, SPARQL” somewhere around Step 3. :-)

  3. Richard Wallis Says:

    Fair comment Lee – at least the bingo did not start until step 3!

    What I would say though, is by taking these separate logical steps towards publishing as Linked Data the inevitable buzz-words will become clear as the process rolls through Step 2.

    I note that @gavinwray tweeted that he is “still terrified by step 2.” Hopefully that won’t stop him taking step 1, and getting his data out there in the first place. Then someone else may find it useful and take the 2nd step for him. There again, he could always try one of the free Talis Open Days that would give him a non-terrifying introduction to the Linked Data world, or look for some more in-depth (non-scary) training.

  4. Lee Feigenbaum Says:

    In all seriousness, I do agree & appreciate the work you and everyone else at Talis are doing to encourage open data. I’d like to think that tools like the ones we’re building at Cambridge Semantics should help ease the terror of Step 2 by making it much easier to get the data into Linked Data friendly formats… the tricky step (and one whose value is not always apparent) is how much serious modeling do you need to do to make the open data useful?

  5. François Scharffe Says:

    An interesting small post. We, a consortium of universities, companies and French institutions, are launching the project Datalift on this exact topic: Data publishing, providing tools for facilitating the publication process. We call this process data elevation: the way to get to data paradise :)
    http://datalift.org

  6. Avoimen datan julkaisemisesta « Sorvipenkki Says:

    [...] Petri Questions related to open data are discussed in an interesting way in blog posts by both Richard Wallis (The Data Publishing Three-Step) and Gavin Starks (Data is not binary : Why open data requires credibility [...]

  7. L’opendata dans tous ses états – Juillet I « Says:

    [...] The Data Publishing Three-Step (EN) [...]

  8. Creating linked data « David's MLIS blog Says:

    [...] also the reference to the Nodalities Blog: Leave a [...]

  9. Mr. Gunn Says:

    Speaking from a biological scientist, not data scientist, perspective, you’ve lost 90% of people by step two. What’s modeling data? What’s a “first class” data element?

    As things stand now, there’s a class of people who’d be happy to put data up on the web, but it’s an almost entirely separate class of people who’d come up with the data model, URIs, and RDF.

    The group of people who could be expected to do all three “simple” steps above (that is, generate useful scientific data AND model it) is almost vanishingly small.

  10. Richard Wallis Says:

    Not a surprising comment; it is often a separate group of people that takes on the individual steps. Not everyone is a data modeller, and they shouldn’t have to be. Get your data out there [preferably with an eye on the possibility that just publishing it might not be the end-game] so that others can use and build upon it.

    So if just Step 1. is something you can add to your workflow, fine. There is an increasing number of people out there more than ready to identify useful and interesting data and move it on to be used in often unexpected ways.

  11. Nodalities » Blog Archive » One Step at a Time Says:

    [...] expected some comments to my Data Publishing Three-Step post last week but what I didn’t expect was a virtual pat on [...]

  12. Creating Linked Data | call for papers Says:

    [...] Nodalities blog has the post The Data Publishing Three-Step. Agree completely. Our cataloging already meets common standards, [...]

  13. Egon Willighagen Says:

    @Mr Gunn: I would not so much worry about the language in Step 2, 3; the problem we face in Science is to get the researchers level with Step 1. Some do, even in a non-semantic way:

    http://usefulchem.blogspot.com/2010/07/methanol-solubility-prediction-model-4.html#links

    But with communication, we can get these things semantic, as several have done for this (CC0-licensed) solubility data set:

    http://chem-bla-ics.blogspot.com/2009/11/linking-two-virtuoso-instances-to-one.html (sorry, no Talis, but Virtuoso ;)
    http://friendfeed.com/egonw/176bfcf5/critical-mass-for-open-notebook-science-wikis

  14. Start sharing public sector data online « Observations Says:

    [...] is adapted from Chris Taggart‘s presentation on opening up local government data and the data publishing three-step by Richard [...]

  15. Nodalities » Blog Archive » The Linked Open Data and Pavlova Says:

    [...] the web. As encouraged by Sir Tim to give us your raw data now, and as I detailed in my previous “data publishing three-step” post, this is often the first element of getting your data out there for others to [...]

  16. infomisa.net» Blog Archive » Linked Open Data and Pavlova Says:

    [...] the web. As encouraged by Sir Tim to give us your raw data now, and as I detailed in my previous “data publishing three-step” post, this is often the first element of getting your data out there for others to [...]