Nodalities

From Semantic Web to Web of Data
License

Creative Commons License

data.gov.uk and the Talis Platform

Earlier this year Gordon Brown appointed Tim Berners-Lee as an advisor to the Cabinet Office to help the government begin the process of opening up its data. This was one part of the initiation of a project to begin opening up UK government data in a similar style to the US. A key part of Berners-Lee’s vision for putting government data online has been Linked Data which promises to provide a much richer way for citizens to begin accessing, browsing, and using government data.

Several other governments have begun opening up data assets including Australia and New Zealand. These approaches mirror that of the US data.gov site, providing a browsable directory of datasets and links to raw data downloads in a range of different formats. The preview launch of data.gov.uk which was announced at the end of September also includes a directory of datasets which is powered by the software underlying the Comprehensive Knowledge Archive Network. But the site also aims to fulfill Berners-Lee’s vision and in addition provide access to some datasets as Linked Data through SPARQL endpoints.

We’re very pleased to report that the Talis Platform is currently underpinning the delivery of all of the Linked Data and SPARQL endpoints for the data.gov.uk site.

We’ve been quietly supporting the effort for several months now, helping out with data management, modelling discussions, and training on the core technology. There seems to be a very definite appetite in government not only to open the raw data but also to explore the potential for Linked Data. It’s clear from today’s announcement about opening up additional aspects of the Ordnance Survey data that there’s a real focus on delivering on the open data promise. While there are certainly some high-profile datasets like the Ordnance Survey or postcode data that may require legislative changes to become open, one of the biggest implementation challenges facing government is pulling together an overall directory of datasets and spreadsheets that are already scattered across multiple departmental websites.

Creating a dataset directory provides the required basic level of infrastructure to allow reuse, by enabling developers to find what they need; publishing Linked Data, SPARQL endpoints, and potentially extra APIs provides an additional set of options for ways to access the data. By letting datasets be browsable by anyone, not just developers, Linked Data offers the potential for anyone to find, discover and reuse interesting datasets. As I illustrated in a recent talk, these approaches are not mutually exclusive and the goal should be maximum utility.

Over on the Talis Platform developer blog we’ve begun showing some ways that the initial datasets, covering UK schools and traffic measurements, can be queried in interesting ways. It’s been exciting to see people begin to pick up the technology and create reporting tools to explore the data, but also fantastic to be able to easily view data using only a browser.
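To give a flavour of what querying a SPARQL endpoint involves, here is a minimal sketch in Python. It only builds the request URL, carrying the query in the `query` parameter of a GET request as the SPARQL Protocol describes. The endpoint address, the `ex:` vocabulary and the `build_query_url` helper are assumptions made up for illustration; they are not the actual data.gov.uk endpoints or schemas.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary, used only for illustration.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/schools#>
SELECT ?school ?name
WHERE {
  ?school a ex:School ;
          ex:name ?name .
}
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Because the result is an ordinary URL, it can be pasted into a browser or fetched with any HTTP client, which is part of what makes the data so easy to explore.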

There’s clearly still a great deal of work ahead, but the groundwork has now been completed: there’s infrastructure in place to support data publishing; official guidelines on creating public sector URIs; and some agreement on best practices for modelling statistical data. The next challenge is to start ramping up the conversion of currently open data into RDF, in order to begin expanding the coverage of the Linked Data.

This is a very exciting project and here at Talis it’s something in which we’re very proud to be playing a role.

Tom Steinberg talks about mySociety and public data

In my latest podcast I talk with Tom Steinberg of UK-based mySociety.

We discuss mySociety’s approach to promoting transparency in Government, and consider some of their popular projects before exploring Tom’s views on moves by Government to make more of its data available for use and reuse.

 
Standard Podcast [28:08m]

During the conversation, we refer to the following resources;

This conversation was recorded on Thursday 17 September, 2009.

For other Talis podcasts in this Nodalities series, see here

Kevin Merritt talks about Socrata and Government Data

In my latest podcast I talk with Kevin Merritt, CEO of Seattle-based Socrata.

We discuss the company’s approach to ‘social data discovery,’ and consider ways in which these techniques might be applied to the wealth of data emerging from Government in the United States and elsewhere.

 
Standard Podcast [44:27m]

During the conversation, we refer to the following resources;

This conversation was recorded on Wednesday 16 September, 2009.

For other Talis podcasts in this Nodalities series, see here

David James talks about Government transparency and the work of Sunlight Labs

In my latest podcast I talk with David James of Sunlight Labs, part of the Sunlight Foundation in Washington, DC.

We discuss the Labs’ work to increase Government transparency by making public sector data such as that disseminated via Data.gov more useful.

 
Standard Podcast [43:09m]

During the conversation, we refer to the following resources;

This conversation was recorded on Friday 14 August, 2009.

For other Talis podcasts in this Nodalities series, see here

Jim Hendler and Li Ding talk about work to convert Data.Gov resources to RDF

In my latest podcast I talk with Jim Hendler and Li Ding of the Tetherless World Constellation at Rensselaer Polytechnic Institute in Troy, New York.

We discuss work that they and colleagues have been undertaking to convert chunks of the US Federal Government data released via the data.gov portal to RDF.

 
Standard Podcast [57:25m]

During the conversation, we refer to the following resources;

This conversation was recorded on Friday 7 August, 2009.

For other Talis podcasts in this Nodalities series, see here

Talking with David Eaves about Open Data and Open Government in Vancouver

In my latest podcast I talk with David Eaves about a recent initiative by the Canadian city of Vancouver. The May Motion, of which David was a co-author, calls upon the city to embrace Open Source and Open Standards, and to make much of the city’s data Openly available for use and reuse. We discuss the background to the Motion, and consider some of the uses to which municipal data might usefully be put.

 
Standard Podcast [45:28m]

During the conversation, we refer to the following resources;

This conversation was recorded on Friday 31 July, 2009.

For other Talis podcasts in this Nodalities series, see here

RDFa and Linked Data in UK government web-sites

By Mark Birbeck

This article will feature in Nodalities Magazine, Issue 7

The UK government’s Central Office of Information had a straightforward problem to solve: how could they create a centralised web-site of information that the public could search and access, when the source of that information could be any government department database or any public sector web-site?

For example, different organisations, such as Her Majesty’s Revenue and Customs (HMRC) or the National Health Service (NHS), would each post job vacancies to their own web-sites, but there was no central site that the public could go to, to find all public sector vacancies. This would be a problem at any time, but in the midst of attempts by the government to help people through the recession, it’s crucial to ensure that the public knows what vacancies are available. It might not occur to someone looking for a job as a plumber or an electrician that they should visit the NHS or Army web-sites, so a centralised site could make a big difference.


Similarly, as in most modern democracies, government departments are constantly seeking feedback from the public and interested parties, about specific issues. But as with job vacancies, these consultations are on departmental sites, rather than being available on a central site; from the Department of Energy and Climate Change (DECC) seeking feedback on clean coal, to the Ministry of Justice (MOJ) providing an opportunity for people to comment on prisoners’ voting rights, each department manages its own publication of consultations.

Traditional solutions

Traditional answers to these problems would have been either (a) to require each department to key its data directly into a new central database (which would in turn drive the central web-site), or (b) to create complex communication pipelines that would allow the decentralised databases to communicate with the central system.

Either of these solutions would almost certainly have turned out to be a non-starter.

The first solution was unlikely to ever get off the ground, because it would have required each department to replace their existing technology with something new. Even if there was agreement on what that technology should be—and that in itself could take an age to resolve—there would have been a need for new development work, retraining of users, porting data from older systems, and so on.

The second ‘traditional’ solution at least has the merit of keeping existing systems intact, but would have required additional interfaces to be created to move the data from the departmental servers to the centre; each department would have had to create an interface between their own system and the central one.

Just getting one department into a situation where they could centralise their information would have been a major undertaking—not only were there lots of departments to consider, but each department was using a different technology to publish their vacancies or consultations to the web. For example, some departments with only a small number of job vacancies would likely use static HTML pages. Other departments, perhaps with larger IT departments, might use ASP.NET or a Java-based system.

Enter RDFa

The RDFa answer to this set of problems is simple—both conceptually, and to implement.

RDFa allows HTML publishers to embed RDF in their pages, and so to publish their information over the existing HTTP and HTML infrastructure. Because the data is published this simply, any system can import it just by obtaining (or creating) an RDFa parser.

In short, each department can keep their own data management system, and simply add code to their existing web-page publishing step to augment the HTML with the data as RDFa. The central system in turn only needs one import mechanism—something that understands RDFa.
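As a rough illustration of both halves of this arrangement, the sketch below shows a vacancy page fragment carrying RDFa attributes, followed by a very small importer written with Python's standard `html.parser`. Everything specific here is an assumption for the sake of the example: the `v:` vocabulary, the URIs, the property names and the `SimpleRDFaExtractor` class are invented and are not the COI's actual schema or code. The importer understands only `about`, `property` and `content`, so it is a simplified stand-in for a real RDFa parser rather than a conformant one.

```python
from html.parser import HTMLParser

# Hypothetical vacancy page fragment. The vocabulary URI and property
# names are invented for illustration.
PAGE = """
<div xmlns:v="http://example.org/vacancy#"
     about="http://example.org/jobs/123" typeof="v:Vacancy">
  <h2 property="v:title">Electrician</h2>
  <p>Based in <span property="v:location">Leeds</span>.</p>
  <p>Closing date:
     <span property="v:closes" content="2009-11-30">30 November</span></p>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a small RDFa subset.

    Only @about, @property and @content are handled. Prefix resolution,
    nested subjects, @rel/@href and datatypes are deliberately ignored.
    """

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent @about value seen
        self.pending = None   # property still waiting for its text value
        self.buffer = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # @content overrides the element's visible text.
            self.triples.append((self.subject, prop, attrs["content"]))
        else:
            self.pending = prop
            self.buffer = []

    def handle_data(self, data):
        if self.pending is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: the next end tag closes the pending property.
        if self.pending is not None:
            value = "".join(self.buffer).strip()
            self.triples.append((self.subject, self.pending, value))
            self.pending = None


def extract_triples(html):
    parser = SimpleRDFaExtractor()
    parser.feed(html)
    parser.close()
    return parser.triples


for triple in extract_triples(PAGE):
    print(triple)
```

The point of the sketch is the division of labour: the department only adds a few attributes to markup it already produces, and the central site needs just one import routine regardless of which technology generated the page.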

Adding this facility to an individual department’s publishing system proved to be very quick and straightforward. But it’s not just UK government departments that are finding it straightforward to add RDFa to their pages. It was interesting to hear at SemTech in June that Google’s rich snippet launch partners (such as Yelp) were able to add RDFa support in “roughly a day”.

RDF publishing techniques

Adding data to web-pages might seem quite an obvious technique, but there are two important things to note here.

First, the COI has to be commended for having the vision to publish RDF at all. Of course, now that Gordon Brown has asked for Sir Tim Berners-Lee’s help in making government data publicly available, it seems pretty obvious—indeed it may even become fashionable! But the COI were planning this project at least a year ago, and at that time RDF was by no means a done deal (and you could say it’s still not).

But the second important thing is that even after deciding to publish RDF, it’s still not immediately obvious that the solution should involve RDFa, especially not a year ago.

The usual means of publishing RDF is to provide a distinct source of data in the form of RDF/XML (and perhaps other formats, too, such as N3). If there is an HTML version it usually exists for the purpose of describing the data itself. In other words, the RDF/XML format is primary, which means that anyone who is publishing HTML pages but wants to publish RDF as well, will need to add an extra piece of infrastructure that exists alongside their web-pages.

RDFa turns this on its head, and says that the HTML page is the data. One and the same page can be read as an HTML page, or as an RDF page, which in turn means that the changes required to the existing publication system are minimal. The COI once again showed its far-sightedness by adopting this technique.

Turtles all the way down

But the benefits of RDFa don’t just stop there. Firstly, because the data is being published via HTTP and HTML, it’s possible for anyone to read the same data, not just the centralised web-site that was being planned. This means that third party job vacancy sites, for example, could import vacancies from relevant departments, to add to their databases. In fact, one of the main drivers for the consultations project was to try to help improve the accuracy of an already existing web-site (set up by a member of the public) that used ‘screen-scraping’ to try to keep up with the available consultations; RDFa provides much more accurate information.

In addition, the centralised web-site will not only import RDFa but publish it too. This means that third-party servers are also able to import some or all of the centralised data into their own sites.

And thirdly, by using RDFa the sites could provide information to search applications such as SearchMonkey.

As more servers both consume RDFa from one set of servers, and publish RDFa again to a variety of other servers, we enter the exciting world of Linked Data, and it’s ‘turtles all the way down’.
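The republishing step can be sketched in the same spirit. Assuming a central site already holds vacancy records imported from several departments, it might render them as an HTML list that carries RDFa attributes, so that the next consumer in the chain can read them in turn. The records, the `v:` vocabulary and the `render_rdfa` function below are hypothetical and exist only to illustrate the idea.

```python
from html import escape

# Hypothetical records, as a central site might hold them after importing
# vacancies from two departments. URIs and property names are invented.
VACANCIES = [
    {"uri": "http://example.org/nhs/jobs/1",
     "title": "Electrician", "source": "NHS"},
    {"uri": "http://example.org/hmrc/jobs/7",
     "title": "Tax Adviser <Grade 7>", "source": "HMRC"},
]


def render_rdfa(vacancies):
    """Render records as an HTML list whose items carry RDFa attributes."""
    items = []
    for v in vacancies:
        items.append(
            '  <li about="{uri}" typeof="v:Vacancy">'
            '<span property="v:title">{title}</span> '
            '(<span property="v:publisher">{source}</span>)</li>'.format(
                uri=escape(v["uri"], quote=True),
                title=escape(v["title"]),
                source=escape(v["source"]),
            )
        )
    return ('<ul xmlns:v="http://example.org/vacancy#">\n'
            + "\n".join(items) + "\n</ul>")


html_out = render_rdfa(VACANCIES)
print(html_out)
```

Keeping each record's original URI in the `about` attribute means that a third party importing from the central site can still tell which department's page a vacancy came from.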

Conclusion

By using RDFa to address the challenge of making distributed data available in one place, the COI avoided having to make changes to each department’s systems. But once each department is publishing RDFa, it becomes possible for third parties to consume that information however they see fit. Such a flexible architecture is crucial in the age of open government, and is a cornerstone of linked open data.

Mark is managing director of Backplane Ltd. (http://webBackplane.com/), a London-based company involved in a number of RDFa/linked data projects for UK government departments. He is the original proposer of RDFa.

Talking with John Sheridan about e-Government, Open Data and Linked Data

In my latest podcast I talk with John Sheridan, Head of e-Services at the UK Government’s Office of Public Sector Information (OPSI). John is also co-chair of the World Wide Web Consortium’s e-Government Interest Group, and we discuss both roles in the context of current enthusiasm for making Government data more readily available online.

 
Standard Podcast [37:16m]

During the conversation, we refer to the following resources;

This conversation was recorded on Wednesday 22 July, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

Talking with Phase2 Technology about Drupal, semantic technologies and opportunities in Government

In my latest podcast I talk with Jeff Walpole, Frank Febbraro and Irakli Nadareishvili of Washington-based Phase2 Technology. We discuss the company’s work with open source solutions such as Drupal, and explore their efforts to integrate semantic technologies including Thomson Reuters’ Open Calais web service into the widely deployed content management system. Finally, we discuss the growing opportunity to make Government data more usefully available via these tools.

 
Standard Podcast [35:11m]

During the conversation, we refer to the following resources;

This conversation was recorded on Tuesday 14 July, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.