Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Linked Data and Libraries 2011 – Agenda Finalised

The Linked Data and Libraries 2011 event to be held at the British Library in London on July 14th is to be opened with a Keynote from Dame Lynne Brindley, British Library Chief Executive.

With reports from the LOD-LAM Summit and the W3C Library Linked Data Incubator Group, plus an insight into bibliographic linked data modelling intriguingly called The Record is Dead, this is looking like a not-to-miss event.

For the full agenda, and to register early to guarantee your place, check out the event site.

Lightning talk slots available: I am still taking submissions for the available lightning talks. Drop me a line before June 17th if you would like to propose a talk.

Image from a photo on Flickr by Fuzzyyol

Talis Linked Data Open Day – USA

Whenever we publicise one of the Linked Data events we regularly hold in the UK, I always get a handful of responses wishing that we would run such an event on the American side of the planet.

So guess where our next Open Day is going to be held – the photo might give you a clue.

Check if you are right, and find more details, on our Consulting blog.

Photo Creative Commons licensed from Rob Styles’ Flickr photostream

Talis Sponsor Pan-European Open Data Challenge

We are proud to be a Lead Sponsor for the Open Data Challenge being coordinated by Jonathan Gray from the Open Knowledge Foundation and Paul Meller from the Open Forum Academy, under the auspices of the Share PSI initiative.

This is a significant competition, with prizes totalling €20,000 for ideas, applications, visualisations and datasets, and individual awards of up to €5,000!

As you would expect from a Talis-sponsored competition, Linked Data features in the line-up of attributes that entrants should be considering. Following the 5 Star Data principles espoused by Sir Tim Berners-Lee, the more machine-readable, non-proprietary and linked Open Data can be, the lower the barrier to its innovative use. This is especially true in the area of Public Sector Information, where similar or associated data is being published by several organisations or governments. In recognition of this we are, as part of our sponsorship, backing the Talis Award for Linked Data: €1,000 presented for the best use of Linked Data in any of the competition categories.
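The 5 Star Data scheme is cumulative: each star assumes the ones before it. As a rough way for entrants to check where a dataset sits, here is a minimal Python sketch. The `Dataset` fields and the `star_rating` function are my own illustrative names, not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are illustrative)."""
    on_web_with_open_licence: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: a star only counts if every earlier star is also met."""
    criteria = [
        d.on_web_with_open_licence,  # 1 star: on the Web, any format, under an open licence
        d.machine_readable,          # 2 stars: machine-readable structured data
        d.non_proprietary_format,    # 3 stars: non-proprietary format (e.g. CSV rather than Excel)
        d.uses_uris,                 # 4 stars: open W3C standards (RDF, SPARQL), URIs identify things
        d.linked_to_other_data,      # 5 stars: links to other people's data to provide context
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing lower star caps the rating, whatever comes after
        stars += 1
    return stars


# A CSV of spending data published on a council website under an open licence:
csv_spend = Dataset(True, True, True, False, False)
print(star_rating(csv_spend))  # 3
```

Under these assumptions, a CSV file under an open licence reaches three stars; the Talis Award is aimed at entries that make use of the fourth and fifth.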

The competition will run for 60 days, so get your ideas flowing and developers’ fingers rattling over those keys.

Watch out for a later post, in which I will identify some Linked Open Data that is already available and that you could use to build an entry.

Linked Data and Libraries 2011 – July 14th

After the great success of Linked Data and Libraries 2010, we are doing it again!

Linked Data and Libraries 2011 will be held at The British Library in London on Thursday July 14th.  Again it will be a free event, with limited spaces allocated, so register early.

The agenda is yet to be finalised, but as in 2010 it will be a mixture of general Linked Data overviews and experience, and library Linked Data speakers. We hope to hear from the British Library, the W3C Library Linked Data Incubator Group, the LOD-LAM Summit, and others. We are also hoping to find time for the 10-minute lightning talk slots, which worked so well last time.

Register early and/or if you would like to propose a topic or speaker, email me – richard.wallis@talis.com.

Image from a photo on Flickr by Fuzzyyol

Are We Getting A Right to Data?

Friday night – nothing on the TV – I know! I’ll browse through the Protection of Freedoms Bill, currently passing through the UK Parliament. Sad I know, but interesting.

Let’s scroll back in time a bit to November 19th 2010, and a government press conference introduced by a video from Prime Minister David Cameron. The headline story was about the publishing of government spending and contract data, but towards the end of this 109-second short he said the following:

… the most exciting is a new right to data. Which will let people request streams of government information and use it for social or commercial purposes. Take all this together and we really can make this one of the most open, accountable and transparent governments there is. Let me end by saying this. You are going to have so much information about what we do, how much of your money we spend doing it, and what the outcome is. So use it, exploit it, hold us to account. Together we can set a great example of what a modern democracy ought to look like. (my emphasis)

Obviously, to realise this Right to Data there needs to be some legislation, which brings me to the Protection of Freedoms Bill. This is one of those bills which covers all sorts of issues, from rules for the destruction of fingerprints and DNA profiles, CCTV camera regulations and the detention of terrorist suspects, to freedom of information and data protection. Zooming in on the parts concerning the release and publication of datasets held by public authorities, we find a set of clauses that amend the Freedom of Information Act 2000.

Re-use

After some amendments which allow for datasets and for provision in electronic form, we get this: “the public authority must, so far as reasonably practicable, provide the information to the applicant in an electronic form which is capable of re-use.” Unfortunately there is no definition of the term re-use. It could be argued that a PDF of some tables in an MS Word document could be re-used, whereas I believe the spirit of the legislation should be made more explicit by identifying non-proprietary data formats. I know this would be a tricky job for the parliamentary draftsmen: we would not want to restrict it to specific formats, such as XML and CSV, that could age and be replaced by something better, which then could not be used because it had not been mentioned in the legislation. But I believe that just using the term ‘re-use’ is far too woolly and open to [mis]interpretation.

What is [not] a dataset

This is one of the areas that raises most concern for me. Check out the Bill’s three-part definition, (a) to (c). I am OK with (a) – data collected as part of an authority doing its job – and (c) – don’t change the data you have collected – as publishing that raw data is important. However, (b) specifically excludes data that is the product of analysis. Presumably analysis of collected data is one significant way that an authority measures the outcomes of its efforts. Understanding that analysis will help us understand the subsequent decisions and actions they make and take. I assume that there may be some specific reasons that underpin this blanket exclusion of analysis data. If there are, they should be identified, instead of generally throttling the output of useful data that would go a long way to helping with Mr Cameron’s stated ambition for us to be able to see “what the outcome is” of the spending of public money.

Release of datasets for re-use

This is a whole new section (11A) to be added to the 2000 Act to cover the release of datasets. It covers ownership, copyright and/or database right of the information to be published, and states that it should be published under “the licence specified by the Secretary of State in a code of practice issued under section 45”. Section 45 basically puts into the hands of the Secretary of State the definition of the licence(s) data should be published under. As of today, the Open Government Licence for public sector information is what is wanted to keep the publishing of information open. However, what is there to stop a future Secretary of State with a less open outlook from replacing it with far more restrictive licences? Do we not need some form of presumption of openness attached to the Secretary of State’s powers as part of this change in legislation?

On the topic of presumptions of openness, the wording of this bill contains phrases such as “unless the authority is satisfied that it is not appropriate for the dataset to be published” and “where reasonably practicable”. It is clear that many in the public sector are not as enthusiastic about publishing data as the current government, and vague phrases such as these may well be used unreasonably by some to justify throttling the stream of information. They could easily be used to build in a bureaucratic hurdle that each dataset has to jump, proving its appropriateness and practicality, before publication. I am sure that it would not be beyond a parliamentary draftsman’s skill to produce wording that means that everything will be published, unless a specific objection is raised for an individual dataset on grounds of excessive effort or data protection.

Up-dated data

Data published by an authority should be published under a scheme, and the relevant wording of the Protection of Freedoms Bill (HC Bill 146) applies here. How should we interpret “any up-dated version held by the authority of such a dataset”? My interpretation is that once a dataset has been published, it shall continue to be published as it changes. The precedent for this is spending data: having published authority spending for January 2011, authorities should automatically publish it for February and the following months. But what if, in response to a request, an authority publishes the contents of a spreadsheet used to track the amount of salt applied to roads in its area during winter 2010-11, and then uses a different spreadsheet for the following winter? Does the output of that new spreadsheet constitute a new dataset, or an update to its predecessor? From the wording in the Bill it is not clear.

Who does it cover?

I probably need a bit of help here from those who understand the public sector better than I do, but I am suspicious that the references to the organisations listed in Schedule 1 and to “the wider public sector” do not cast the net wide enough to cover some of the data that is relevant to our daily lives but is delivered on behalf of some authorities by third parties. For example, I am aware that a large city was recently unable to inform citizens of their rubbish collection schedules because that data was considered commercially restricted by its service provider.

So in summary, I welcome the commitment to a right to data, realised by streams of government information about what government does, how much of our money is spent doing it, and what the outcomes are. However, I am sceptical as to how effective the measures in the current Protection of Freedoms Bill will be in delivering them, especially in the light of very recent comments made by the Prime Minister highlighting the "enemies of enterprise" in Whitehall and town halls across the country, attacking what he called the "mad" bureaucracy that holds back entrepreneurs. Those enemies are just the people who might take the wording of this bill as ammunition for their cause.

Whilst being concerned about this topic, I have been wondering why few are commenting on it. Are the majority just taking the press conference statements by David Cameron, and his fellow Ministers, as indications of a battle won, or am I missing something? I promote Sir Tim Berners-Lee’s 5 Star Data as the steps towards a Web of Linked Data – if we don’t get the publishing of public sector data to at least 3 star standard (Available as machine-readable structured data – in non-proprietary format), many of the current ambitions may remain just that, ambitions. That would be a massive missed opportunity.

So are we getting a right to data? – or just some provisions to extend the Freedom of Information Act a bit further in the dataset direction?  I’m not sure.

Personal note: As you may tell from the above, I am no expert on the interpretation of parliamentary legislation, and I have left several unanswered questions hanging in this post.  Any help in clarifying my thinking, confirming or disproving my assumptions, or answering some of those questions, will be gratefully received in comments to this post or your own posted thoughts.

Talis Group completes the sale of its Library division to Capita Group plc

3 March 2011, Birmingham, UK

Talis Information Limited, the library division of Talis Group Ltd, has been acquired by the UK’s leading outsourcing firm, Capita Group plc. The transaction is valued at £18.5m with an additional £2.5m due, based on performance over the next 12 months. Talis Information Ltd has a range of around 100 academic and public library clients based in the UK and employs 42 staff, all of whom are based in Birmingham, UK.

Talis Group’s other portfolio companies including Talis Education Ltd, Talis Systems Ltd and Talis Inc are unaffected by the acquisition of Talis Information Ltd.  Talis Group’s other divisions provide a SaaS-based semantic web platform and related applications including Talis Aspire, a resource list management solution for higher education customers.

Linked Spending Data – How and Why Bother Pt3

As is often the way, events have conspired to prevent me from producing this third and final part of this How & Why of Local Government Spending Data as soon as I wanted. So my apologies to those eagerly awaiting this latest instalment.

To quickly recap, in Part 1 I addressed issues around why pick on spending data as a start point for Linked Data in Local Government, and indeed why go for Linked Data at all.  In Part 2, I used some of the excellent work that Stuart Harrison at Lichfield District Council has done in this area, as examples to demonstrate how you can publish spending data as Linked Data, for both human and programmatic consumption.

I am presuming that you are still with me on my basic assumptions, “…publishing this [local government spending] data is a good thing” and “Publishing Local Authority data, such as local spending data, as ‘Linked Data’ is also a good thing”, and that the technique of using URIs to name things in a globally unique way (which also provides a link to more information) is not giving you mental indigestion. So, I now want to move on to some of the issues causing debate in the community, which come under the headings of ontologies and identifiers.

Ontologies

An ontology, according to Wikipedia, is a formal representation of knowledge as a set of concepts within a domain: an ontology provides a shared vocabulary, which can be used to model a domain, that is, the types of objects and/or concepts that exist, and their properties and relations. So in our quest to publish spending data, what ontology should we use? The Payments Ontology, with the accompanying guide to its application, is what is needed. Using it, it becomes possible to describe individual payments, or expenditure lines, and their relationships with the authority (payment:payer), the supplier (payment:payee), the category (payment:expenditureCategory), etc. The next question is how to identify the things that you are relating together using this ontology.

Let’s take this one step at a time:

  1. Give the expenditure line, or individual payment, an identifier, possibly generated by our accounts system, e.g. 8605670.
  2. Make that identifier unique to our local authority by prefixing it with our internet domain name, e.g. http://spending.lichfielddc.gov.uk/spend/8605670. Note the ‘http://’ prefix: it enables anyone wanting details about this item to follow the link to our site to get the information.
  3. Associate a payer with the payment with an RDF statement (or triple) using the Payments Ontology:
    <http://spending.lichfielddc.gov.uk/spend/8605670>
    payment:payer
    <http://statistics.data.gov.uk/id/local-authority/41UD> .

    Note I am using an identifier for the payer that is published by statistics.data.gov.uk. That is so that everyone else will unambiguously understand which authority is responsible for the payment.

  4. Follow the same approach for associating the payee:
    <http://spending.lichfielddc.gov.uk/spend/8605670>
    payment:payee
    <http://spending.lichfielddc.gov.uk/supplier/bristow-sutor> .
  5. And then repeat the process for categorisation, payment value, etc.
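The steps above can be sketched in a few lines of code. This is a minimal Python sketch that writes plain N-Triples strings rather than using any particular RDF library. The helper names (`mint_uri`, `triple`, `payment_triples`) are hypothetical, and the namespace I expand `payment:` to (http://reference.data.gov.uk/def/payment#) is my assumption, so check it against the Payments Ontology documentation before relying on it.

```python
from typing import List

# Assumed namespace for the Payments Ontology's payment: prefix.
PAYMENT = "http://reference.data.gov.uk/def/payment#"


def mint_uri(base: str, local_id: str) -> str:
    """Step 2: make an accounts-system ID globally unique by prefixing the authority's HTTP domain."""
    return f"{base.rstrip('/')}/{local_id}"


def triple(s: str, p: str, o: str) -> str:
    """Serialise one RDF statement as an N-Triples line (all three terms are URIs here)."""
    return f"<{s}> <{p}> <{o}> ."


def payment_triples(payment_uri: str, payer_uri: str, payee_uri: str) -> List[str]:
    """Steps 3 and 4: relate a payment to its payer and payee."""
    return [
        triple(payment_uri, PAYMENT + "payer", payer_uri),
        triple(payment_uri, PAYMENT + "payee", payee_uri),
    ]


# Step 1: the local identifier comes from the accounts system.
spend = mint_uri("http://spending.lichfielddc.gov.uk/spend", "8605670")
lines = payment_triples(
    spend,
    "http://statistics.data.gov.uk/id/local-authority/41UD",   # payer: published by statistics.data.gov.uk
    "http://spending.lichfielddc.gov.uk/supplier/bristow-sutor",  # payee: locally defined
)
print("\n".join(lines))
```

Step 5 would add more lines in the same way; note that a payment value would be a literal (a number) rather than a URI, so it would need its own serialisation rather than the `triple` helper above.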

This immediately throws up a couple of questions, such as why use a locally defined identifier for the payee? Surely there is an identifier I can use that others will recognise, such as a company or VAT number! There is, but at the moment there are no established sets of URI identifiers for these. OpenCorporates.com is doing some excellent work in this area, but Companies House, the logical choice for publishing such identifiers, has yet to do so. Pragmatically, it is probably a good idea to have a local identifier anyway and then associate it with another publicly recognised identifier:
<http://spending.lichfielddc.gov.uk/supplier/bristow-sutor>
owl:sameAs
<http://opencorporates.com/companies/uk/01431688> .
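Because owl:sameAs is symmetric and transitive, a consumer that collects such statements from several publishers can work out which identifiers all refer to the same supplier. Here is a minimal Python sketch using a union-find over the (subject, object) pairs of owl:sameAs statements. The function name is my own, and the aggregator URI in the example is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_groups(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by owl:sameAs statements into sets of co-referring identifiers.

    owl:sameAs is symmetric and transitive, so a union-find over the
    (subject, object) pairs collects every chain of links.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


pairs = [
    # The statement above: local supplier ID -> OpenCorporates ID
    ("http://spending.lichfielddc.gov.uk/supplier/bristow-sutor",
     "http://opencorporates.com/companies/uk/01431688"),
    # Hypothetical: an aggregator's own ID for the same supplier
    ("http://aggregator.example/supplier/42",
     "http://spending.lichfielddc.gov.uk/supplier/bristow-sutor"),
]
groups = same_as_groups(pairs)
print(len(groups), sorted(groups[0]))
```

The three URIs end up in a single group even though no publisher stated a direct link between the aggregator’s ID and the OpenCorporates ID.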

Identifiers

Because this is all very new and still emerging, we now find ourselves in a bit of a chicken-or-egg situation. I presume that most authorities have not built a mini spending website, as Lichfield District Council has, to serve up details when someone follows a link like this: http://spending.lichfielddc.gov.uk/spend/8605670

You could still use such an identifier based on your authority’s domain, and plan to back it up later with a web service that provides more information. Or you could let someone else, who takes a copy of your raw data, do it for you, as OpenlyLocal might: http://openlylocal.com/financial_transactions/135/2010/33854, or as the project we are working on with LGID might: http://id.spending.esd.org.uk/Payment/36UF/ds00024616. In the open, flexible world of Linked Data it doesn’t matter too much which domain an identifier is published from, or for that matter how many [related] identifiers are used for the same thing.

It does matter, however, for those looking to the identifying URI for some indication of authority. As I say above, technically it doesn’t matter whose domain the identifier comes from, but I believe it would be better overall if it came from the authority whose payment it is identifying. Which puts us back in the chicken-or-egg situation of resolving the URI to serve up more information. The joy of Linked Data is that, provided aggregators record enough to identify each source authority’s data accurately when they encode it, it should be possible to automatically retrofit links between URIs at a later date.
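To make that retrofitting concrete, here is a minimal Python sketch. It assumes, hypothetically, that an aggregator kept two things for each payment it copied: the authority code and the authority’s own transaction ID. Once an authority starts publishing URIs to a known pattern, owl:sameAs links can then be generated mechanically. The record layout, the aggregator URIs and the pattern table are all illustrative, not taken from OpenlyLocal, LGID or any other real service.

```python
from typing import Dict, List

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical: URI patterns authorities have since started publishing,
# keyed by the authority code the aggregator recorded (41UD is the payer code used above).
AUTHORITY_PATTERNS: Dict[str, str] = {
    "41UD": "http://spending.lichfielddc.gov.uk/spend/{txn_id}",
}


def retrofit_links(aggregated: List[dict], patterns: Dict[str, str]) -> List[str]:
    """Emit an owl:sameAs N-Triples line from each aggregated payment to the authority's URI."""
    links = []
    for rec in aggregated:
        pattern = patterns.get(rec.get("authority"))
        txn_id = rec.get("source_txn_id")
        if pattern is None or txn_id is None:
            continue  # cannot link yet: no published pattern, or the source ID was not captured
        authority_uri = pattern.format(txn_id=txn_id)
        links.append(f"<{rec['uri']}> <{OWL_SAME_AS}> <{authority_uri}> .")
    return links


aggregated = [
    {"uri": "http://aggregator.example/payment/1", "authority": "41UD", "source_txn_id": "8605670"},
    {"uri": "http://aggregator.example/payment/2", "authority": "41UD"},  # source ID not kept
]
print("\n".join(retrofit_links(aggregated, AUTHORITY_PATTERNS)))
```

The second record shows the point of the proviso: if the aggregator did not keep the authority’s own transaction ID, there is nothing to link with later.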

In summary, over this series of posts we have seen a technology which, although it has obvious benefits, is still early on its development curve, being applied to a process which is also new and scary for many. That is an ideal breeding ground for cries of pain and assertions of ‘it doesn’t work’ or ‘not worth bothering’, yet it has the potential to provide a powerful foundation for a future data-rich environment that is open, accessible, and beneficial to authorities, government, citizens, and UK plc. Yes, it is worth bothering; just don’t expect benefits on day one, or even month one.

Linked Data: evolving the Web into a Global Data Space

As Linked Data becomes more established, a new book has been published that captures the state of the art and current best practices in the field. Authored by Dr Tom Heath, lead researcher at Talis, and Professor Christian Bizer of the Freie Universität Berlin, “Linked Data: Evolving the Web into a Global Data Space” introduces the basic principles and rationale of Linked Data and provides detailed guidance for those exploring this emerging area of Web technology.

Abstract:

The World Wide Web has enabled the creation of a global information space comprising linked documents. As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to raw data not currently available on the Web or bound up in hypertext documents. Linked Data provides a publishing paradigm in which not only documents, but also data, can be a first class citizen of the Web, thereby enabling the extension of the Web with a global data space based on open standards – the Web of Data. In this Synthesis lecture we provide readers with a detailed technical introduction to Linked Data. We begin by outlining the basic principles of Linked Data, including coverage of relevant aspects of Web architecture. The remainder of the text is based around two main themes – the publication and consumption of Linked Data. Drawing on a practical Linked Data scenario, we provide guidance and best practices on: architectural approaches to publishing Linked Data; choosing URIs and vocabularies to identify and describe resources; deciding what data to return in a description of a resource on the Web; methods and frameworks for automated linking of data sets; and testing and debugging approaches for Linked Data deployments. We give an overview of existing Linked Data applications and then examine the architectures that are used to consume Linked Data from the Web, alongside existing tools and frameworks that enable these. Readers can expect to gain a rich technical understanding of Linked Data fundamentals, as the basis for application development, research or further study.

You can access a copy of Linked Data: Evolving the Web into a Global Data Space here.

Marketing the Semantic Web – Semantic Link podcast – Episode 3

The Semantic Link podcast panel are back with their third instalment of the podcast series. This month, two special guests: Krista Thomas, VP Marketing, Ad.ly (formerly VP Marketing, OpenCalais, Thomson Reuters) and Scott Brinker, President and CTO at Ion Interactive, Inc. join the panel to discuss the complexities around marketing the semantic web.

As you would expect, marketing an intricate topic like the semantic web brings challenges. The panel discusses who we are marketing to: those who will utilise the technology? Independent developers? Consumers? Or the social community? The use of terminology in the semantic web world is also explored, amongst other key issues.

You can listen to this month’s podcast here and catch up on previous conversations too.

David Wood talks with Talis

A short while ago, my colleague Zach Beauvais podcasted with the Vice President of Engineering at Talis Inc., David Wood. In this conversation, David discusses his background, Linked Data and SPARQL. He also talks about Talis Inc.’s first US customer: the US Government Printing Office (GPO) and its Persistent URL infrastructure, which provides persistent Web addresses for critical government documents and is primarily used by the more than 1,200 Federal Depository Libraries. The PURL server uses the PURLz open source software, the development of which was led by David while at Zepheira, and complements the data hosting and search capabilities of the Talis Platform with identifier management functionality.
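The idea behind a PURL server is simple: the address that gets cited never changes, and the server redirects it to wherever the document currently lives. Here is a minimal Python sketch of that lookup. It is not the PURLz API. The paths and target URLs are hypothetical, and real PURL servers offer more than one redirect type, whereas this sketch uses only a temporary (302) redirect.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: persistent path -> current location of the document.
PURLS: Dict[str, str] = {
    "/GPO/example-report": "http://www.example.gov/docs/2011/report.pdf",
}


def resolve(path: str, table: Dict[str, str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header) for a request to the PURL server."""
    target = table.get(path)
    if target is None:
        return 404, None  # unknown persistent address
    return 302, target  # the PURL stays fixed; only the target can move


print(resolve("/GPO/example-report", PURLS))
```

When a document moves, only the table entry changes, so every library catalogue or citation that uses the persistent address keeps working.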

For more information, you can follow David on Twitter or read his blog.