Nodalities

From Semantic Web to Web of Data

One Step at a Time

I expected some comments on my Data Publishing Three-Step post last week, but what I didn’t expect was a virtual pat on the head with an accompanying croon of "Who’s the clever boy, then? You are! Yes, you are!" in a reply post, "I’d love to dance with you, but…", from Dorothea Salo.

The problem she identifies in her politely phrased complaint about my reductionist approach is this:

Aside from my friends the open scientists (and not even all of them, to be honest), practically all the data-producing researchers I know are firmly stuck on Step 1. Firmly stuck, not to say "immovably." As for Step 2… trust me, these folks are not data modellers. I sincerely doubt my own capacity to teach RDF to someone who approaches me asking, "Is it okay if I record my data in Excel?"

And I totally agree with her. In an ideal world, data creators would take all three steps and publish in a linkable, queryable form. But as she points out, many folk are not data modellers, and wouldn’t want to be. The three steps I identified are there for motivated people to take as many, or as few, as are compatible with their work and motivations. All anyone could ask is that they at least be aware that others may have sufficient interest and motivation to take their data through the next step.

Getting your data out there, in any form (yes, even Excel, if that is your tool of choice), is the foundation. Without the data in a form that you, and others, can reuse, there is little point.

So, expanding on my Step 1 description:

  • Publish your data.
  • Publish it in a way that others can use – in a known format from which you can easily extract the actual data elements (Excel, CSV, etc., not PDF or Word).
  • Publish it with an explanation of what the data is, and where to get it.
  • Publish it under simple, unambiguous licensing terms, without vague restrictions such as ‘non-commercial only’.
  • If possible, identify things in the data using well-known identifiers – not ‘substance_1234’ where H₂SO₄ could be used, or ‘location_abc’ where Paris, FR would do.
  • If there is not a suitable well known identifier set, create your own but publish that as well.
  • Be consistent, with yourself and others around you – don’t go reinventing wheels.
  • Publish your data!
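
For anyone who would like to see what that list can look like in practice, here is a minimal sketch in Python. It assumes a researcher's table uses private labels and swaps them for well-known identifiers before writing plain CSV. The `WELL_KNOWN` mapping, the `publish_csv` helper, the column names and the sample row are all hypothetical, made up purely for illustration, and the chemical formula is written in plain ASCII.

```python
import csv
import io

# Hypothetical mapping from private labels to well-known identifiers.
# The entries mirror the examples in the list above; a real mapping
# would come from your own data.
WELL_KNOWN = {
    "substance_1234": "H2SO4",
    "location_abc": "Paris, FR",
}


def publish_csv(rows, fieldnames):
    """Return CSV text with private labels replaced by well-known identifiers."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Values with no well-known equivalent are passed through unchanged.
        writer.writerow({key: WELL_KNOWN.get(value, value) for key, value in row.items()})
    return out.getvalue()


# Made-up sample row, for illustration only.
rows = [
    {"substance": "substance_1234", "location": "location_abc", "sample_id": "A1"},
]
csv_text = publish_csv(rows, ["substance", "location", "sample_id"])
print(csv_text)
```

The `csv` module quotes the `Paris, FR` value automatically because it contains a comma, which is one reason to let a library write the file rather than joining strings by hand. Alongside the file you would still publish a short explanation of each column and the licence it is released under.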

Whilst I take Dorothea’s point about the difficulty of just convincing some people of the merits of exposing their data, none of this is rocket science, and none of it mentions Linked Data or offends her longtime RDF scepticism.

Get significant amounts of data out there, and hopefully others will be motivated enough to put it to good use, adding value by linking it to other data – maybe that will help demonstrate the worth of Steps 2 and 3.

Picture published on Flickr by paraflyer.

6 Responses

  1. Bill Roberts Says:

    I understand that people are easily put off by talk of data modelling, but those who might be happy with Step 1 but don’t fancy Step 2 are in fact already doing data modelling in order to create their table or spreadsheet – it’s just that they have done it intuitively rather than explicitly. For one person to communicate their data to another they must in effect share a common data model. If they come from similar backgrounds, then what that data model is may be obvious, but lots of errors arise when people make different assumptions about the details.

    And as Richard explains, the challenge to more effective data sharing on the web is to start making data models explicit so that we can reduce the amount of effort in combining data from different places.

  2. Scott Banwart's Blog » Blog Archive » Distributed Weekly 58 Says:

    [...] One Step at a Time [...]

  3. L’opendata dans tous ses états – Juillet II « Says:

    [...] One Step at a Time [...]

  4. Verpa Says:

    This post and its parent were great, but as someone with both a science (chem) and computery background, I think the best thing to do would be to give us an example of the three-step process.

  5. Education and the Real World… « OUseful.Info, the blog… Says:

    [...] As an afterthought, I added a pointer to a recent blog post from Richard Wallis: One Step at a Time. [...]

  6. Jodi Schneider Says:

    Going from Excel to RDF isn’t unthinkable!

    http://www.mnot.net/blog/2005/08/13/excel_microformats
    http://blogs.sun.com/bblfish/entry/excell_and_rdf