Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Archive for the 'Open Data' Category

Tom Steinberg talks about the Public Sector Transparency Board

Tom Steinberg, of mySociety fame, joins me on this Talking with Talis podcast to discuss the approach to open and linked data in the context of the UK Government.

We talk about his role over the years; the emergence of data.gov.uk as part of the previous administration’s Making Public Data Public initiative; and the subtle change of emphasis accompanying the new administration’s name change to the Transparency Programme.

Finally we move on to the role of the newly formed Public Sector Transparency Board of which he is a member.

One Step at a Time

I expected some comments on my Data Publishing Three-Step post last week, but what I didn’t expect was a virtual pat on the head with an accompanying croon of "Who’s the clever boy, then? You are! Yes, you are!" in a reply post from Dorothea Salo, titled “I’d love to dance with you, but…”.

The problem she identifies in her politely phrased complaint about my reductionist approach is this:

Aside from my friends the open scientists (and not even all of them, to be honest), practically all the data-producing researchers I know are firmly stuck on Step 1. Firmly stuck, not to say "immovably." As for Step 2… trust me, these folks are not data modellers. I sincerely doubt my own capacity to teach RDF to someone who approaches me asking, "Is it okay if I record my data in Excel?"

And I totally agree with her. It would be great, in an ideal world, if data creators could take all three steps to publish in a linkable, queryable form. But as she identifies, many folk are not data modellers, and wouldn’t want to be. The three steps I identified are there for motivated people to take as many, or as few, as are compatible with their work and motivations. All anyone could ask is that they at least have an awareness that others may have sufficient interest and motivation to take their data through the next step.

Starting with getting your data out there, in any form (yes even Excel, if that is your tool of choice), is the foundation.   Without the data in a form that you, and others, could reuse, there is little point.

So, expanding on my Step 1 description:

  • Publish your data.
  • Publish it in a way that others can use – in a known format from which you can easily extract the actual data elements (Excel, CSV, etc., not PDF or Word).
  • Publish it with an explanation of what the data is, and where to get it.
  • Publish it under simple unambiguous licensing terms, without ambiguous restrictions such as ‘non-commercial only’.
  • If possible, identify things in the data using well known identifiers – not ‘substance_1234’ where H₂SO₄ could be used, or ‘location_abc’ where Paris, FR would do.
  • If there is not a suitable well known identifier set, create your own but publish that as well.
  • Be consistent, with yourself and others around you – don’t go reinventing wheels.
  • Publish your data!
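As a rough illustration of the identifier points in that list, here is a minimal Python sketch. It assumes a small table that uses in-house codes. The mapping table, the column names and the sample rows are all invented for the example, and the formula is written as plain H2SO4 to keep the file ASCII-friendly. It swaps in well known identifiers where a mapping exists, and collects the remaining home-grown codes so that they can be published as a code list alongside the data.

```python
import csv
import io

# Hypothetical mapping from in-house codes to well known identifiers.
WELL_KNOWN = {
    "substance_1234": "H2SO4",
    "location_abc": "Paris, FR",
}

def to_publishable_csv(rows, fieldnames):
    """Return (csv_text, unmapped_codes) for a list of row dictionaries.

    In-house codes with a well known equivalent are replaced. Codes without
    one are left alone and collected, so they can be published as a code list.
    """
    unmapped = set()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        clean = {}
        for key in fieldnames:
            value = row[key]
            if value in WELL_KNOWN:
                value = WELL_KNOWN[value]
            elif value.startswith(("substance_", "location_")):
                unmapped.add(value)
            clean[key] = value
        writer.writerow(clean)
    return out.getvalue(), sorted(unmapped)

# Invented sample data, standing in for a spreadsheet export.
rows = [
    {"substance": "substance_1234", "site": "location_abc", "litres": "40"},
    {"substance": "substance_9999", "site": "location_abc", "litres": "15"},
]
csv_text, own_codes = to_publishable_csv(rows, ["substance", "site", "litres"])
print(csv_text)
print("Codes to publish as your own code list:", own_codes)
```

Nothing here goes beyond Step 1: the output is still an ordinary CSV file, just one that an outsider has a better chance of understanding.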

Whilst I take Dorothea’s point about the difficulty of just convincing some people of the merits of exposing their data, none of this is rocket science, and none of it mentions Linked Data or offends her longtime RDF scepticism.

Get significant amounts of data out there, and hopefully others will be motivated enough to add value by linking it to other data – maybe that will help demonstrate the worth of steps 2 and 3.

Picture published on Flickr by paraflyer.

The Data Publishing Three-Step

In a conversation with data owners about how they should be publishing their data, it is usually not long before the following question turns up: “So, what do I actually have to do to publish my data?” Often the conversation then wanders off into a game of buzzword bingo (RDF, RDFa, SPARQL, dereferenceable URIs, triples, content negotiation, open data, Linked Data, end-points, etc.), to be followed by a blank look and the unuttered question: “Yes, but what do I actually have to do to publish my data?”

In an attempt to simplify the answer to that oft unuttered question, I break things down into three steps.

Step 1   Get your Data Out – for others to consume
Sounds simple.  Just take the spreadsheet (or similar file) that you use to track information, post it on your web site and link to it from a description posted in an accompanying web page. It can be that simple, but there are things to consider:

  • Licensing – will potential consumers of the data be confident of their ability to use and/or reuse it? (The UK Government are very clear on this.)
  • Is it open but opaque? – The terms, codes, identifiers etc. you use may be meaningless, or worse still ambiguous, to those outside your organisation, or even your department.
  • Could your data be made more consistent with other data you, or similar organisations, already publish?

All things to be considered, but not to be put up as excuses for not publishing.

Step 2   Get your Data In – to an open linkable standard format
This is the most powerful step. It consists of identifying the elements in your data (organisations, locations, things, projects, types, etc.), giving them unique identifiers, and then making those identifiers web links. Fortunately this may not be as onerous as it sounds. There are many publicly visible/usable identifiers that you can use for your data – for example:

For this step to be effective, you really need to model your data: your [first class] data elements, the relationships between them, and possibly their relationships with external entities. The output of this step will be an RDF representation of your data that follows Linked Data Principles. You should also identify the process or rules to get from your source data into this new form, enabling you to repeat it for later versions of your data.
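To make the idea of a repeatable process a little more concrete, here is a minimal Python sketch that turns a small CSV table into RDF statements in the N-Triples syntax. It is only an illustration under stated assumptions: the example.org identifier space, the fundedBy and locatedIn property names, the column names and the sample rows are all invented. Only the rdfs:label property is a real, widely used term.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/id/"    # hypothetical identifier space
VOCAB = "http://example.org/def/"  # hypothetical vocabulary
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Invented sample data, standing in for a published spreadsheet.
SOURCE = """project,organisation,location
Wind Survey,Energy Dept,Birmingham
Tide Study,Energy Dept,Cardiff
"""

def uri(kind, label):
    """Mint a web link for a thing from its type and its label."""
    slug = quote(label.lower().replace(" ", "-"))
    return "<%s%s/%s>" % (BASE, kind, slug)

def literal(text):
    """Quote a single-line string as a plain literal (simplified escaping)."""
    return '"%s"' % text.replace("\\", "\\\\").replace('"', '\\"')

def rows_to_ntriples(csv_text):
    """Apply the same mapping rules to every row, so the run is repeatable."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        project = uri("project", row["project"])
        org = uri("organisation", row["organisation"])
        place = uri("location", row["location"])
        triples.append((project, RDFS_LABEL, literal(row["project"])))
        triples.append((project, "<%sfundedBy>" % VOCAB, org))
        triples.append((project, "<%slocatedIn>" % VOCAB, place))
    return "\n".join("%s %s %s ." % t for t in triples)

ntriples = rows_to_ntriples(SOURCE)
print(ntriples)
```

Because both rows mention the same organisation, both projects end up pointing at the same organisation identifier, which is where the linking starts. In practice you would prefer existing well known identifiers and vocabularies over minting your own wherever they fit.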

Having said all that, it is not necessarily only you that will/can do step 2.   It is perfectly possible for a third party, or a central organisation such as data.gov.uk, or even an enthusiast, to carry out this data modelling and transformation step with data that you have openly published.

Next you need to publish your data so that it can become part of the Web of Linked Data, which brings me, with apologies to fans of the traditional party song, to…

Step 3   Link it all about
Going through step 2 and not making your data available, or providing useful information at the end of the links you embed in your data, would be a bit of a pointless exercise.  How to publish this data is the next question, to which there are at least three equally valid answers.

  • Using an encoding technique called RDFa, you can embed the RDF data within the HTML coding of a web page, so that software can obtain a more structured representation from a web page than a human viewing it in a browser would.
  • You could just publish the RDF in rdf files on your web server. A good example of this is the way the BBC publish the RDF for many of their pages, such as their Wild Life pages: compare the Lion Web page with the RDF for Lion (depending on your browser, you may need to use its view page source option to see the actual RDF encoded in XML).
  • You could store the individual RDF statements (triples) in a triple store with a SPARQL end-point. This not only publishes the RDF, but also enables the data and relationships within the data to be queried. This is how data.gov.uk publishes RDF, from Talis Platform Stores. This interface might look a bit cryptic – the results, formatted in XML in the top box, from running the SPARQL query shown in the bottom box – but this is a developer’s interface demonstrating the code and results an application might use, so you wouldn’t expect much different.
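For the third option, the following Python sketch shows roughly what an application does when it talks to a SPARQL end-point: it sends the query text as a `query` parameter on an HTTP GET request, which is the pattern the SPARQL Protocol defines. The end-point address and the property in the query are invented for illustration, and the sketch only builds the request URL rather than contacting a real service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical end-point address

# Hypothetical query: which things are funded by which organisations?
QUERY = """
SELECT ?project ?org
WHERE {
  ?project <http://example.org/def/fundedBy> ?org .
}
LIMIT 10
"""

def sparql_get_url(endpoint, query):
    """Build the GET URL that carries a query to a SPARQL end-point."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again shows that the query text survives the round trip.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```

A real client would then fetch that URL, typically asking for results in XML or JSON, and read the variable bindings out of the response. That response is the kind of thing shown in the top box of the developer’s interface.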

I’ve decided to go through these steps, can you remind me again why? So that your data can be linked with other data to add value to the experience of consumers of your data and services, as well as others using your data to add value elsewhere. A good example of this in action is the BIS Research Funding Explorer.

Some Clarity on Transparency

Since the Conservative-Liberal coalition replaced the Labour Party as the UK Government in power, there has been a question about how the Conservatives’ approach to opening up public data would change the "Making Public Data Public" initiative, and its influence upon data.gov.uk.

Advisers, Sir Tim Berners-Lee and Professor Nigel Shadbolt, were retained by the incoming administration, who also introduced Tom Steinberg, founder of mySociety, as their man in this area. They also made some pronouncements about using open standards and openly publishing data, but there was not much initial detail behind this.

Last week saw the first meeting of the Public Sector Transparency Board, chaired by Francis Maude, the Minister for the Cabinet Office. He was joined by these three advisers and Doctor Rufus Pollock, from Cambridge University, at the first meeting.

Their first task was to discuss new public data transparency principles, which have been reproduced in a post on the data.gov.uk blog.  These eleven draft public data principles go a long way to reflect the thinking of this group and how they intend to take forward the initiatives of their predecessors.

Key points that attracted my attention include:

  • Release data quickly, and then republish it in linked data form later on – getting the data out there being the most important step in this process, formats being a secondary consideration.
  • Public data will be available and easy to find through a single easy-to-use online access point–this access point being data.gov.uk.
  • Data will be released under open licenses and in machine readable form, following World Wide Web Consortium recommendations and standards—linked data.

So, on the surface things don’t look that much different to what they did before the government changed—the commitment to publishing data in any format that was useful initially, and then a commitment to move towards making it machine-readable and linkable.

There does seem to be a drive to go further and deeper than their predecessors, both in publishing financial data and, anecdotally, in government departments being asked to discover all the datasets they have that have not yet been published – a bit of a Donald Rumsfeld situation, methinks.

Any concerns that changing the name of the initiative from Making Public Data Public to the Transparency Agenda would affect the progress of these initiatives seem, from these early draft principles, to be unfounded. From my point of view: good for open data, good for Linked Data, good for data.gov.uk, good for UK government, and good for all of us.

Picture from Flickr by liber

Making Public Data Public – A Videocast with Richard Stirling

Following on from our podcast conversation with Richard Stirling, Head of Making Public Data Public and data.gov.uk, based in the Cabinet Office of the UK Government, in which he explained the challenges of launching data.gov.uk, we were asked if it was possible to look behind the scenes at what had been implemented. Zach Beauvais, and the Talis video camera, were invited in to the Cabinet Office so that Richard could do just that.

Richard walks us through the data.gov.uk site, explaining how data is stored and accessed. He explains the [very open] licensing conditions surrounding the data. It has been well publicised that they have made a commitment to delivering an increasing proportion of that data as Linked Data. Richard explains the reasoning and benefits of doing that in addition to publishing it in its initial raw form.

We get a sneak preview of a new Linked Data API, which will soon provide a simple way to query data in that form, without the need to understand the SPARQL query language.

Open… and Mobile?

I know what you’re thinking: “He’s going to say Data!”

Well, I might do at some point, but I was going to say “Days”. Last month, Talis flung open its doors to 30 or so folk who were interested in SPARQL, the Semantic Web and Linked … er, Data. The idea was to host an informal event for folks to learn about much of what we’ve been talking about for the past few years. We planned some talks on what it means to join up your data, what this Platform is about, and a detailed introduction to SPARQL. With the launch of data.gov.uk and many of the stories covered over in the Magazine, it seemed possible that people were starting to get interested in this whole Linked Data scene.

So, we sent out some invites and tweeted a bit, and soon had to cap the registration numbers. We filled up spaces in the January day not long after New Year, and the February day not long after the January one. March is quickly filling up too (hint). I have to admit, I wasn’t expecting this many people to express an interest so soon. Not only did people sign up, they also travelled to Birmingham through adverse weather to come and take part at both Open Days, and we’ve had a lot of fun.

One thing that seemed to be a good idea was to ask for feedback before the event. It sounds wrong, but the point of an Open Day is to cover things that YOU’re interested in learning or exploring. So, when people registered, they were asked for their expectations and what they’d like to take away with them from such an event—aside from a T-shirt and SPARQL mug, obviously. It made it much easier to work out what we should cover, and I hope it meant that we were able to talk about the things most relevant to the people who came along.

I’d like to do it again, but slightly differently. Instead of hosting an Open Day here at Talis HQ, what if we came to you? Would you be interested in attending a Talis Platform Roadshow? What would you want us to cover? More importantly, where would you like us to go?

Comments below, or email me or tweet me.

We’re excited

The Talis offices, for the past few weeks, have been awash with geeky excitement – that kind of near giddy excitement that comes with eager expectation. We’ve all been waiting for something important.

For some, this was no doubt augmented with the announcement of Steve’s new iPad; but that’s not what’s gotten us all worked up.

For months, we’ve been looking forward to the launch of data.gov.uk; and last week, the wraps finally came off. The official press release put it:

A major new website has been launched to the public which gives anyone who wants to use it unprecedented and free access to government data in one place.

This doesn’t quite capture the coolness of the launch, for me. Yes, it’s a major new website, and its point is to publish information. But the exciting thing is that this information is being published as data: data that can be used, reused, remixed and enriched. Sir Tim Berners-Lee’s perspective was more exciting:

Making public data available for re-use is about increasing accountability and transparency and letting people create new, innovative ways of using it. Government data should be a public resource. By releasing it, we can unlock new ideas for delivering public services, help communities and society work better, and let talented entrepreneurs and engineers create new businesses and services.

The point is that this public resource is finally getting a home on the web, and an infrastructure to make it not just available, but useful.

The exceptional team behind data.gov.uk have striven to adhere to web standards in its production: including Linked Data as a priority, as Professor Nigel Shadbolt explained:

We are also going to increase the use of ‘Linked Data’ standards, which allows people to provide data in a way that is as flexible and easy-to-use as possible.

Back in November, Leigh Dodds wrote a post explaining how we’ve been involved, and there’s an official Talis Platform press release too. Basically, we’ve been working with the data.gov.uk team to help with the Linked Data part of the site—hosting the SPARQL endpoints and providing consultancy and training, for example.

I can confidently say that we’re very proud of data.gov.uk, the team behind it, and our involvement with it. We’re excited by the prospect of this data being used as raw material for clever people to make interesting, useful, even world-changing things with it. We’ve seen the beginnings and proof-of-concept projects already.

Now comes the really exciting stuff. What are you going to build?

Image: “Yay for happy days!” by le vent le cri via flickr (CC: By)