Nodalities

From Semantic Web to Web of Data

License

Creative Commons License


Linked Data – Coming Together

To quote John ‘Hannibal’ Smith, from that wonderful bit of 1980s TV, “I love it when a plan comes together!” Of course, aficionados of the A-Team will probably remember that ‘the plan’ was often only apparent in retrospect, although its general intention was clear from the start.

The adoption of Linked Data, and the realisation of all its potential benefits, is looking a bit like an A-Team episode – the eventual outcome clear from the start, but with many setbacks, skirmishes to fight, partners to woo, nerves to calm, and teams to lead along the way.

To break the metaphor at this point, I see Linked Data as more of a shared vision than a plan laid out before us. Nevertheless, I think we are beginning to see elements of it ‘starting to come together’.

One very obvious example is what Ordnance Survey is doing by continuing to open up their location data. Now that OS have defined a URI for every UK postcode unit [e.g. ‘SO16 4GU’ = http://data.ordnancesurvey.co.uk/id/postcodeunit/SO164GU], why would anyone [re-]publishing data in the future not use these identifiers to reference their postcode information? By that simple step they will be linked in with a wealth of ancillary information about the location – easting/northing, ward, district, county, country, etc.
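The identifier pattern in that example can be sketched in a few lines of Python. This is a minimal sketch, assuming the URI is always the base shown above followed by the postcode upper-cased with its spaces removed (inferred from the single ‘SO16 4GU’ example, not from any OS specification). The helper name `postcode_unit_uri` is hypothetical, and no check is made that the postcode actually exists.

```python
# Base URI copied from the SO16 4GU example; the normalisation rule below
# (drop whitespace, upper-case) is an assumption inferred from that example.
OS_POSTCODE_UNIT_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"


def postcode_unit_uri(postcode: str) -> str:
    """Build an OS-style postcode unit URI (hypothetical helper)."""
    normalised = "".join(postcode.split()).upper()
    if not normalised:
        raise ValueError("postcode must not be empty")
    return OS_POSTCODE_UNIT_BASE + normalised


if __name__ == "__main__":
    print(postcode_unit_uri("SO16 4GU"))
```

A publisher holding postcodes as plain strings could store this URI alongside, or instead of, each one, so that their records point at the same thing the OS data describes.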

Great, I hear you say, but show me an example of what that could lead to! Being lazy, I’ll let the inimitable John Goodwin of the OS do it for me. In his recent, appropriately named, “So what can I do with the new Ordnance Survey Linked Data?” post, he shows how, by merging in data from a previous Talis project produced for the Department for Business, Innovation and Skills (BIS), he can deliver a very different way of accessing the same data.

The BIS Research Funding Explorer project took data about UK Government research funding from several research councils and the Intellectual Property Office, and brought it together in a Linked Data driven application to display UK centres of research excellence.

John explains how, by mixing the Linked Data published for that project with OS Linked Data, he has been able to develop a different way of accessing the data. In his prototype application you are presented with a map of the UK showing the regions as defined by the European Union. Clicking on one of the EU regions presents a list of the projects within that area. He has also added the ability to browse by county or by District/Unitary Authority. It is a simple but effective demonstration that data in Linked Data form from one source can easily be combined with data from another source to deliver benefit.
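Why such a mash-up is cheap to build can be shown with a toy example. The sketch below is not John's code. It assumes both sources have been loaded as (subject, predicate, object) triples and that the funding data already refers to regions by the same URIs the geographic data uses. Every example.org URI and the `projects_by_region` helper are hypothetical; only the RDFS label property is a real vocabulary term.

```python
from collections import defaultdict

# All example.org URIs are invented placeholders; only the rdfs:label
# property URI is a real, standard identifier.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LOCATED_IN = "http://example.org/vocab/locatedIn"  # hypothetical property
REGION_A = "http://example.org/region/A"
REGION_B = "http://example.org/region/B"

# Source 1: hypothetical funding data, pointing at region URIs.
funding = [
    ("http://example.org/project/1", LOCATED_IN, REGION_A),
    ("http://example.org/project/2", LOCATED_IN, REGION_A),
    ("http://example.org/project/3", LOCATED_IN, REGION_B),
]

# Source 2: hypothetical geographic data describing the same region URIs.
geography = [
    (REGION_A, RDFS_LABEL, "Region A"),
    (REGION_B, RDFS_LABEL, "Region B"),
]


def projects_by_region(funding_triples, geography_triples):
    """Group project URIs under region labels by joining on shared URIs."""
    labels = {s: o for s, p, o in geography_triples if p == RDFS_LABEL}
    grouped = defaultdict(list)
    for project, predicate, region in funding_triples:
        # The join is an exact match on the region URI both sources use.
        if predicate == LOCATED_IN and region in labels:
            grouped[labels[region]].append(project)
    return dict(grouped)


if __name__ == "__main__":
    for label, projects in projects_by_region(funding, geography).items():
        print(label, projects)
```

Because both sides use the same identifier, combining them is an exact dictionary lookup rather than fuzzy matching on place names.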

Of course, even with this example we are seeing the effect of joining just a couple of jigsaw pieces together. With Linked Data, such as this from OS, being published at an ever-increasing rate, it will not be long before a bigger picture starts to form as more and more data pieces are linked together.

I love it when you can see a plan coming together!

Dion Hinchcliffe – Web and Social in the Enterprise

The opening keynote presentation at this year’s Online Information 2010 conference comes from Dion Hinchcliffe, Senior Vice President of Dachis Group. Dion is an internationally recognized business strategist and enterprise architect with an extensive track record of building enterprise solutions and strategies for clients in the Fortune 500, federal government, and Internet start-up community.

In this conversation we explore web and social technologies, and their impact, challenges, and opportunities when applied to the enterprise.

Focus on Local Government Spending

The UK Government Transparency agenda is encouraging Local as well as National Government to publish their data as Open Data and Linked Data, reflecting the world-leading progress that data.gov.uk has made on these fronts over the last year and a bit.

I am sat in the opening session of the Socitm 2010 conference, in sunny Brighton, whilst writing this. Already it is clear that local government spending is a major issue for the sector. In its broad sense, of how much local authorities can [or cannot] spend, it is providing the background for the whole conference. Not all doom and gloom here, though. That IT could be seen as a knight in shining armour, helping the public sector to deliver better services, was the encouraging thought proffered by Louisa Preston as she launched the day. In its narrower sense, the requirement to publish data about all local government spending items over £500 from January 2011 onwards, it gives a focused example of the opportunity for a significant change in thinking and practice by the sector.
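The narrower requirement is, at its simplest, a filter over a finance system export. The Python below is a minimal sketch assuming a csv export with hypothetical `supplier` and `amount` columns and amounts in pounds; it treats “over £500” as strictly greater than £500, which is an assumption to check against the actual guidance.

```python
import csv
import io
from decimal import Decimal

# Threshold from the publication requirement; strict ">" is an assumption.
THRESHOLD = Decimal("500")

# Hypothetical export; real column names will depend on the finance system.
SAMPLE_EXPORT = """supplier,amount
Supplier A,120.00
Supplier B,500.00
Supplier C,1250.50
"""


def items_over_threshold(csv_text, amount_column="amount", threshold=THRESHOLD):
    """Return the rows whose amount exceeds the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Decimal avoids binary floating-point surprises with money values.
    return [row for row in reader if Decimal(row[amount_column]) > threshold]


if __name__ == "__main__":
    for row in items_over_threshold(SAMPLE_EXPORT):
        print(row["supplier"], row["amount"])
```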

As Nodalities readers are well aware, Linked Data tools, techniques, and technologies have massive potential to simplify publishing, linking, and aggregating data, and making it work across a web of data. It is no coincidence that data.gov.uk is making steady, valuable progress publishing key data sets in Linked Data form in the Talis Platform – it is an obvious step. For many in local government, however, Linked Data is something they have never met before. For them, the traditionally unnatural step of openly publishing what in the past would have been a private report out of the back of their finance system is a significant step in itself.

It is the responsibility of those of us who understand the benefits of going beyond publishing a simple csv file, to publishing in Linked Data form, to make it easy for all authorities to understand those benefits and take the combined step of publishing Linked Data from the start.

To that end, we at Talis recently announced a free stores offer for all UK local authorities to publish their spending data as Linked Data.

Traditionally, our approach would be to host a free open day to help those in local government understand Linked Data and its benefits to them. Recognising the broader economic climate, and its influence on local government spending in that broader sense, that doesn’t seem to be a good idea.

Many organisations in the sector, not least Socitm (there is a Linked Data session at the conference today) and the Local Government Group, are looking to promote this approach. We are therefore going to work with the sector to promote this message.

As part of that, we will be participating in the Open Data strand of the free Local by Social online conference, 3–9 November, being hosted by LGID.

As well as checking out what looks to be a quality online event, stay tuned for Talis initiatives in this area.

Linked Open Data and Pavlova

If Sir Tim Berners-Lee can equate Linked Data with a packet of crisps/potato chips, I thought I would take a stab at another food metaphor for this post.

Linked Open Data (LOD) is a concept that many believe they understand. Take yourself to almost any conference that has a connection with data, the web, or the Internet at the moment, and it will not be long before you see a slide of the Linked Open Data cloud diagram, or of Sir Tim imploring us to give him our raw data now, or, if you are very lucky, a shot of him doing his imploring whilst standing in front of the LOD cloud. Simple really: just publish your data as Linked Open Data and all will be wonderful as we move towards the sunlit Semantic Web uplands. Unfortunately life is never that simple – LOD is not a single identifiable thing. As Paul Walk eloquently puts it:

  1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked

As with any recipe for success, the majority concentrate on the final result, praising or criticising it as a whole without identifying the benefits, or otherwise, of the individual ingredients. Take a strawberry pavlova, for instance. If you are into that kind of thing, it is a delightful culmination of the culinary arts designed to send your taste buds into raptures. Unless, that is, you don’t like cream, or you don’t like strawberries, or can’t abide meringue, in which case the whole thing seems a little pointless.

What has this got to do with Linked Open Data (LOD), I hear you ask. Well, I am increasingly seeing LOD being presented as the goal for those wishing to publish their data online. My position is that the eventual goal, from which a Semantic Web will spring, is a global web of linked and open data. However, there are many steps from where we are now to achieving that goal. Within audiences that I present to, and/or sit amongst, I see people who for whatever reasons do not ‘get’ one or more of the components of LOD – they cannot envisage opening up any of their data, or think that using a web address for an identifier is overly complex, or have a religious aversion to RDF. As a result they dismiss the whole recipe as not for them or, worse still, as something impractical that will become nothing more than the plaything of a few passionate enthusiasts.

When someone who is still struggling with the concept of opening up their organisation’s data, or with why RDF might be a more useful format than csv, is shown the ubiquitous Linked Open Data cloud diagram with encouragement to join in, it is hardly surprising they remain a little unconvinced. This isn’t a criticism of presenters either. In only 20 minutes on a stage, it is difficult to go into underlying detail.

Let me try, in a few paragraphs, to break the LOD pavlova into its ingredients:

  • Data – In the context of this post, by data I mean machine-readable information, produced in a format that can be consumed and processed by other machines. In practice, this means file formats such as csv, XML, RDF, etc., but not something like pdf, html, or Word, which, although transferable, are designed for human consumption rather than machine analysis.

    For some, just this step from their current human targeted format, to a machine readable one, is a significant one.

  • Open Data – Data (see above) which is accessible for all to download, view, and consume in a way that is not encumbered by licensing that restricts its use. The licensing used for data.gov.uk data is an example. By definition, data which is restricted for certain uses is not fully open.

    In our internet-based world, openness can also be defined in terms of technical accessibility. If data is only available after a login process, or only to users behind a firewall, it cannot be considered open.

  • Linked Data – Data (see above) which contains URIs as identifiers for concepts described in the data and URIs to identify the relationships between those concepts.  The four Linked Data Principles, as published as a design note by Tim Berners-Lee, provide a bit more detail on this.

    I am in danger of stirring the embers of a religious fire fight here, between those who believe that Linked Data must be described in RDF and contain URIs as identifiers, and those who maintain that you can have data linked across the web without those constraints. All I am going to say on that at this time is that the Linked Open Data cloud of data sets has been successful based on the first of those two views. (If you want to follow that particular debate in more detail, Paul Miller’s post and its associated comments would be a good starting point.)
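To make the Linked Data definition concrete, here is a small Python sketch that writes statements in the N-Triples syntax, where both the thing being described and the relationship are identified by URIs. The subject reuses the OS postcode unit URI from the post on Ordnance Survey data above; the predicate and object URIs are hypothetical placeholders. The escaping covers only backslash, double quote, and line breaks, which is enough for simple literals but is not a full N-Triples serialiser.

```python
def iri(value: str) -> str:
    """N-Triples writes IRIs inside angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One statement: subject, predicate, object, terminated by a full stop."""
    return f"{subject} {predicate} {obj} ."


postcode = iri("http://data.ordnancesurvey.co.uk/id/postcodeunit/SO164GU")
within = iri("http://example.org/vocab/withinDistrict")  # hypothetical relationship
district = iri("http://example.org/id/district/1")  # hypothetical concept
label = iri("http://www.w3.org/2000/01/rdf-schema#label")

if __name__ == "__main__":
    print(ntriple(postcode, within, district))
    print(ntriple(district, label, literal("Example district")))
```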

So, how can data be open, but not linked? – by publishing it in a non-Linked Data form such as a text file, an html page, or a pdf. Where would you find this? – all over the web. As encouraged by Sir Tim’s call to give us your raw data now, and as I detailed in my previous ‘Data Publishing Three-Step’ post, this is often the first step in getting your data out there for others to consume.

How can data be linked but not open? – by publishing it in accordance with the principles, in RDF, with URIs, but restricting its use, either by imposing restrictive licensing conditions or by restricting access to the data. Where would you find this? – again all over the web, but often hiding behind restrictive licensing terms such as “non-commercial use only”. It is also to be found inside organisational firewalls. For example, commercial organisations can realise the benefits of using Linked Data techniques with their internal private data, potentially linking it to publicly visible concepts across the web to add even more value for their employees.

Data that is linked and open, like that strawberry pavlova, has the power to deliver value beyond the sum of its individual ingredients. Providing data in a form that is linked to other data, and easy for others to link to, without restrictions on who links to it or how, provides the foundation for a web of linked data built on the same principles that fostered the growth of the web of documents, which has so changed our world over the last decade and a half.

The ingredients that formed that World Wide Web of documents – html, http, and open publishing of web sites without restrictions on others’ ability to consume and/or link to them – were individually important developments. However, when those elements were blended together their effects were multiplied manyfold, resulting in the web we experience today.

So [as I stretch my culinary metaphor to its limits] if you are hoping to take people with you in building a Linked Open Data future, you not only have to show them a picture of the final dish, you need to describe the individual ingredients and their relevance to the eventual result.

Pictures from Flickr by PhOtOnQuAnTiQuE and avixyz

A conversation about The Interactive Knowledge Stack

My guests on this Talking with Talis podcast are Wernher Behrendt and John Pereira of Salzburg Research. They are part of the team behind IKS (the Interactive Knowledge Stack), an Integrating Project part-funded by the European Commission.

The four-year project started in January 2009 to provide an open source technology platform for semantically enhanced content management systems. The concept behind it is that, once developed, the stack can be bolted on to many different CMS products to add semantic, and Semantic Web, capabilities. Even though the project is open source, and the obvious use of it is with open source CMS tools, its use could be of equal value to commercial products.

Their target is to engage with 40 small to medium organisations for whom developing such capability would not be possible with their limited resources. They are already well on the way, with many joining in via the project web site and participating in the first early adopters workshop in Salzburg in June.


Linked Data and Libraries – almost like being there

The room was almost full at the British Library Conference Centre for the Linked Data and Libraries event on 21st July 2010, and many who wanted to attend couldn’t because of distance, other commitments, etc.

We therefore took along our brand new screen grabber device and a video camera to capture as much of the day as we could. We have now completed the editing, so I am ready to share the videos with those who want to view, or remind themselves of, the day.

Like most of the content we produce at Talis, these videos are licensed under a Creative Commons Attribution License, so share and enjoy.

Introduction: Talis and the world of Linked Data
Zach Beauvais, Talis

Click for presentation video

The data.bnf.fr Project
Romain Wenz, Bibliothèque nationale de France

Presentation not yet available

Linked Data, RDF, and SPARQL
Rob Styles, Talis

Click for presentation video

Linked Data in Action
Richard Wallis, Talis

Click for presentation video

Lightning Talk
Neil Wilson, The British Library

Click for presentation video

Lightning Talk
Sally Chambers, The European Library

Click for presentation video

Lightning Talk
Felix Ostrowski, The North Rhine-Westphalian Library Service

Click for presentation video

Linked Bibliographic Data
Rob Styles, Talis

Click for presentation video

W3C Library Linked Data Incubator Group
Antoine Isaac, Scientific Coordinator, Europeana

Click for presentation video

An overview of the Talis Platform
Richard Wallis, Talis

Click for presentation video

Linked Data in Libraries – Presentations

The Talis Linked Data in Libraries event, held at the British Library in London on Wednesday 21st July, was attended by 50 people with an enthusiastic interest in the topic.

Below you will find presentations from the day.

Introduction: Talis and the world of Linked Data – Zach Beauvais, Talis

The data.bnf.fr Project – Romain Wenz, Bibliothèque nationale de France
        (Presentation not yet available)

Linked Data, RDF, and SPARQL – Rob Styles, Talis

Linked Data in Action – Richard Wallis, Talis

Lightning Talks:
  • Neil Wilson, The British Library
  • Sally Chambers, The European Library
  • Felix Ostrowski, The North Rhine-Westphalian Library Service

Linked Bibliographic Data – Rob Styles, Talis

W3C Library Linked Data Incubator Group – Antoine Isaac, Europeana

An overview of the Talis Platform – Richard Wallis, Talis

Watch this space for videos of some of the sessions.

Tom Steinberg talks about the Public Sector Transparency Board

Tom Steinberg of mySociety fame joins me on this Talking with Talis podcast to discuss the approach to open and linked data in the context of the UK Government.

We talk about his role over the years; the emergence of data.gov.uk as part of the previous administration’s Making Public Data Public initiative; and the subtle change of emphasis accompanying the new administration’s renaming of it as the Transparency Programme.

Finally we move on to the role of the newly formed Public Sector Transparency Board of which he is a member.

One Step at a Time

I expected some comments on my Data Publishing Three-Step post last week, but what I didn’t expect was a virtual pat on the head, with an accompanying croon of "Who’s the clever boy, then? You are! Yes, you are!", in a reply post from Dorothea Salo titled ‘I’d love to dance with you, but…’.

The problem she identifies in her politely phrased complaint about my reductionist approach is this:

Aside from my friends the open scientists (and not even all of them, to be honest), practically all the data-producing researchers I know are firmly stuck on Step 1. Firmly stuck, not to say "immovably." As for Step 2… trust me, these folks are not data modellers. I sincerely doubt my own capacity to teach RDF to someone who approaches me asking, "Is it okay if I record my data in Excel?"

And I totally agree with her. It would be great, in an ideal world, if data creators could take all three steps to publish in a linkable, queryable form. But as she identifies, many folk are not data modellers, and wouldn’t want to be. The three steps I identified are there for motivated people to take as many, or as few, as are compatible with their work and motivations. All anyone could ask is that they at least have an awareness that others may have sufficient interest and motivation to take their data through the next step.

Starting with getting your data out there, in any form (yes, even Excel, if that is your tool of choice), is the foundation. Without the data in a form that you, and others, could reuse, there is little point.

So, expanding on my Step 1 description:

  • Publish your data.
  • Publish it in a way that others can use – in a known format from which you can easily extract the actual data elements (Excel, csv, etc., not pdf or Word).
  • Publish it with an explanation of what the data is, and where to get it.
  • Publish it under simple unambiguous licensing terms, without ambiguous restrictions such as ‘non-commercial only’.
  • If possible, identify things in the data using well-known identifiers – not ‘substance_1234’ where H₂SO₄ could be used, or ‘location_abc’ where Paris, FR would do.
  • If there is not a suitable well-known identifier set, create your own, but publish that as well.
  • Be consistent, with yourself and others around you – don’t go reinventing wheels.
  • Publish your data!
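Several of those bullets (a reusable format, well-known identifiers, and publishing your own identifier list where none exists) can be combined in a small export script. This is a minimal Python sketch; the `WELL_KNOWN` mapping, the column names, and `export_csv` are all hypothetical, with the mapping echoing the examples in the list.

```python
import csv
import io

# Hypothetical mapping from internal codes to well-known identifiers.
WELL_KNOWN = {
    "substance_1234": "H2SO4",
    "location_abc": "Paris, FR",
}


def export_csv(rows, mapping=WELL_KNOWN):
    """Write (item, location, value) rows as CSV using well-known identifiers.

    Codes with no well-known equivalent are kept as-is and returned, so they
    can be documented in an identifier list you publish yourself.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item", "location", "value"])
    unmapped = set()
    for item, location, value in rows:
        cells = []
        for code in (item, location):
            if code not in mapping:
                unmapped.add(code)
            cells.append(mapping.get(code, code))
        writer.writerow(cells + [value])
    return out.getvalue(), sorted(unmapped)


if __name__ == "__main__":
    text, missing = export_csv([("substance_1234", "location_abc", "42")])
    print(text, missing)
```

The csv module quotes “Paris, FR” automatically because it contains the delimiter, which is one less thing for consumers of the file to trip over.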

Whilst I take Dorothea’s point about the difficulty of just convincing some people of the merits of exposing their data, none of this is rocket science; it doesn’t mention Linked Data, nor should it offend her long-standing RDF scepticism.

Get significant amounts of data out there, and hopefully others will be motivated enough to add value to it by linking it to other data – maybe that will help demonstrate the worth of steps 2 and 3.

Picture published on Flickr by paraflyer.

The Data Publishing Three-Step

In a conversation with data owners about how they should be publishing their data, it is usually not long before the following question turns up: “So, what do I actually have to do to publish my data?” Often the conversation then wanders off into a game of buzzword bingo (RDF, RDFa, SPARQL, dereferenceable URIs, triples, content negotiation, open data, Linked Data, end-points, etc.), to be followed by a blank look and the unuttered question: “Yes, but what do I actually have to do to publish my data?”

In an attempt to simplify the answer to that oft-unuttered question, I break things down into three steps.

Step 1   Get your Data Out – for others to consume
Sounds simple.  Just take the spreadsheet (or similar file) that you use to track information, post it on your web site and link to it from a description posted in an accompanying web page. It can be that simple, but there are things to consider:

  • Licensing – will potential consumers of the data be confident of their ability to use and/or reuse it? (The UK Government are very clear on this.)
  • Is it open but opaque? – The terms, codes, identifiers etc. you use may be meaningless, or worse still ambiguous, to those outside your organisation, or even your department.
  • Could your data be made more consistent with other data you, or similar organisations, already publish?

All things to be considered, but not to be put up as excuses for not publishing.

Step 2   Get your Data In – to an open linkable standard format
This is the most powerful step. It consists of identifying the elements in your data (organisations, locations, things, projects, types, etc.), giving them unique identifiers, and then making those identifiers web links. Fortunately, this may not be as onerous as it sounds. There are many publicly visible/usable identifiers that you can use for your data – for example:

For this step to be effective you really need to be modelling your data: your [first class] data elements, the relationships between them, and possibly relationships with external entities. The output of this step will be an RDF representation of your data that follows the Linked Data Principles. You should also capture the process or rules for getting from your source data into this new form, so that you can repeat it for later versions of your data.
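A repeatable step 2 transformation can be as simple as a function that applies fixed rules to every row. The sketch below assumes a csv with hypothetical `project_id`, `org_id`, and `title` columns and mints URIs under an invented example.org base. The `fundedBy` property and `Project` class are placeholders rather than terms from any published vocabulary, while `rdf:type` and `rdfs:label` are the standard W3C terms.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical URI bases; a real publisher would use its own domain.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/vocab/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text):
    """Apply the same rules to every row, so the conversion can be rerun."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        project = f"<{BASE}project/{quote(row['project_id'])}>"
        org = f"<{BASE}organisation/{quote(row['org_id'])}>"
        title = escape_literal(row["title"])
        lines.append(f"{project} <{RDF_TYPE}> <{VOCAB}Project> .")
        lines.append(f'{project} <{RDFS_LABEL}> "{title}" .')
        lines.append(f"{project} <{VOCAB}fundedBy> {org} .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "project_id,org_id,title\nP1,ORG7,Soil survey\n"
    print(rows_to_ntriples(sample))
```

Keeping the rules in code, rather than in someone's head, is what makes it possible to rerun the conversion when the next version of the source data arrives.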

Having said all that, it is not necessarily only you that will/can do step 2.   It is perfectly possible for a third party, or a central organisation such as data.gov.uk, or even an enthusiast, to carry out this data modelling and transformation step with data that you have openly published.

Next you need to publish your data so that it can become part of the Web of Linked Data, which brings me, with apologies to fans of the traditional party song, to…..

Step 3   Link it all about
Going through step 2 and not making your data available, or providing useful information at the end of the links you embed in your data, would be a bit of a pointless exercise.  How to publish this data is the next question, to which there are at least three equally valid answers.

  • Using an encoding technique called RDFa, you can embed the RDF data within the html of a web page, so that software can obtain a more structured representation from the page than a human viewing it in a browser would see.
  • You could just publish the RDF in rdf files on your web server. A good example of this is the way the BBC publish RDF for many of their pages, such as the Lion page on their Wildlife site and the corresponding RDF for Lion (depending on your browser, you may need to use its view page source option to see the actual RDF encoded in XML).
  • You could store the individual RDF statements (triples) in a triple store, or SPARQL end-point. This not only publishes the RDF, but also enables the data, and the relationships within it, to be queried. This is how data.gov.uk publishes RDF, from Talis Platform Stores. This interface might look a bit cryptic (the results, formatted in XML, in the top box, from running the SPARQL query shown in the bottom box), but this is a developer’s interface demonstrating the code and results an application might use, so you wouldn’t expect much different.
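For the third option, an application talks to the end-point over HTTP. The sketch below builds a query request in the style of the W3C SPARQL Protocol (the query passed as the `query` parameter of a GET request, with results requested as SPARQL JSON). The end-point address is a hypothetical placeholder, not the data.gov.uk or Talis Platform address, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical end-point; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

QUERY = """SELECT ?thing ?label WHERE {
  ?thing <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 10"""


def sparql_get_request(endpoint, query):
    """Build (without sending) a SPARQL query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_get_request(ENDPOINT, QUERY)
    print(req.full_url)
```

To run it against a live service, pass the request to `urllib.request.urlopen` and parse the response body with `json.load`.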

I’ve decided to go through these steps; can you remind me again why? So that your data can be linked with other data to add value to the experience of consumers of your data and services, as well as enabling others to use your data to add value elsewhere. A good example of this in action is the BIS Research Funding Explorer.