Nodalities

From Semantic Web to Web of Data
JISC calls for Linked Data projects… Talis can help

JISC calls for Linked Data projects…

Back in December, I met up with the Semantic Technologies Working Group at JISC to talk a bit about the rise of Linked Data and have a high-level look at who’s been doing what. It was a great talk, from my perspective, because I was speaking to a room full of folk who knew EXACTLY what I meant. Instead of stumbling over explaining basic principles, we were all able to have a pretty healthy discussion about the big picture—looking at companies and organisations who’ve told their stories in Nodalities Magazine, for example. I left with the impression that JISC certainly has its eye on the Linked Data ball, as it were.

My impression has been strengthened recently, as I read their commissioned Linked Data “horizon scan” paper published by Paul Miller over on Cloud of Data. The Horizon Scan makes several recommendations for further investigation into Linked Data for Higher Education, the gist of which is that JISC should keep an eye out for good use cases and engage the Linked Data community where it can to learn more.

Then, as a couple hundred folk gathered at the second Linked Data Meetup in London, JISC announced that it’s putting £750,000 up: “…for projects to make content available on the Web working using linked data approaches.” JISC is calling for Higher Education projects to build Linked Data!

Talis can help

So, it looks like there is some alignment with our purpose here at Talis, then. We often talk about building the web of Linked Data, and we’ve been pushing projects and building stuff to make that happen. Now, it’s your turn…

The deadline for receipt of proposals in response to this call is 12 noon UK time on Tuesday 20 April 2010.

So, there isn’t much time to get proposals in. One way we can help is to host any Linked Data needed for a project on the Platform through the Connected Commons initiative. As we’ve reviewed here before, Talis will host any public data as Linked Data in the Platform. By public, we simply mean rights-waived (using PDDL or CC0) so it can be reused. The Platform hosts data online, and will also give you a SPARQL endpoint and RESTful API for rapid development on top of your new Linked Data.

The other thing we can do is to provide free developer licenses for working with the Platform and the API. There is also an extensive archive of documentation over on the developers’ wiki. Let us know what you’d like to build, and we’ll see if we can help. We’re keen to see more projects surfacing Linked Data, and it’s exciting to see what you will be building!

Finally, I’d love to hear about your projects. I can tell your Linked Data story in Nodalities Magazine, or perhaps as a podcast—whether you’re using the Platform or not. It’s great to share success stories with the wider community, and this should provide many good stories!

JISC funds Higher Education projects in the UK, and their full eligibility criteria are up on their call post.

Richard Stirling Talks about data.gov.uk

Sporting the title of Head of Making Public Data Public and data.gov.uk, Richard Stirling leads the central team behind those two initiatives, based out of the Cabinet Office of the UK Government.

In our conversation we discuss data.gov.uk, which emerged from a conversation between Sir Tim Berners-Lee and the Prime Minister less than a year ago, and how Richard found himself involved.

The setting up and launch of data.gov.uk, with external advisors Sir Tim Berners-Lee and Nigel Shadbolt, an exceedingly short development period, and the involvement of the wider community, has been very different to the popular conception of a government IT project. Richard gives us some insight as to what it has been like on the inside. He also explains the focus on taking data into RDF and Linked Data in addition to publishing it in the form provided by the originating departments.

With a look into what might be next, Richard gives us a great view of what is behind the web site.

Picture published on Flickr by Thayer18

 
Richard Stirling Talks with Talis [37:36m]

Linked Data Visualisation Launched at Prime Minister’s Conference

To quote Prime Minister Gordon Brown in his opening speech today at the Global Investment Conference 2010, “from today you will be [able to] identify centres of excellence at the click of a button”. Obviously, in a speech on the global stage, a Prime Minister cannot go into detail, but he was referring to a project delivered in super quick time to the UK Government which is launched today – The Research Funding Explorer.

It was less than a month ago when the Department for Business Innovation and Skills (BIS) asked us at Talis if we could use the Linked Data Principles and practice demonstrated in our work with data.gov.uk to produce an application to help them with their mission.  Specifically they wanted a way to demonstrate to those looking to invest in the UK, where the centres of excellence are located.

You can’t beat the focus of a fixed delivery date to stimulate innovation. So when we were asked to not only come up with the pilot for a real application but also have it ready in two weeks, in time for the preparation of the Prime Minister’s Conference, the team behind it were filled with challenge and trepidation in equal measure – especially as at this time we hadn’t had a close look at the data.

They wanted something that could join the list of applications on the data.gov.uk Apps List and show how Linked Data from several sources could be brought together to deliver real benefit in a way that each source alone could not. The data originated from organisations such as the Technology Strategy Board, the Medical Research Council, the Engineering and Physical Sciences Research Council, and the Intellectual Property Office, mostly in the form of large spreadsheets. The data was extracted from these, transformed into RDF and loaded into the Talis Platform, utilising URIs for concepts that will be compatible with the rest of the RDF to be found on data.gov.uk. With great input from visualisation developers at Iconomical, the Research Funding Explorer was born.
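To give a feel for the spreadsheet-to-RDF step, here is a minimal sketch in Python. It is not the code used for the Research Funding Explorer: the column names, the sample rows, the `http://example.org/research/` base URI and the property names are all assumptions made up for illustration. It turns each spreadsheet row into a handful of N-Triples lines, minting one URI per grant, organisation and topic.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI and vocabulary; a real project would reuse agreed URIs.
BASE = "http://example.org/research/"
SCHEMA = BASE + "schema/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Invented sample data standing in for one of the large spreadsheets.
SAMPLE_CSV = """organisation,topic,amount_gbp
University of Example,Plastic Electronics,250000
Example Institute,Regenerative Medicine,400000
"""


def slug(text):
    """Turn a free-text name into a URI-safe path segment."""
    return quote(text.strip().lower().replace(" ", "-"))


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rows_to_ntriples(csv_text):
    """Convert spreadsheet rows into a list of N-Triples lines."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        grant = f"<{BASE}grant/{i}>"
        org = f"<{BASE}organisation/{slug(row['organisation'])}>"
        topic = f"<{BASE}topic/{slug(row['topic'])}>"
        amount = f"{literal(row['amount_gbp'])}^^<{XSD_INTEGER}>"
        triples.append(f"{grant} <{SCHEMA}recipient> {org} .")
        triples.append(f"{grant} <{SCHEMA}topic> {topic} .")
        triples.append(f"{grant} <{SCHEMA}amount> {amount} .")
        triples.append(f"{org} <{RDFS_LABEL}> {literal(row['organisation'])} .")
    return triples


if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE_CSV):
        print(line)
```

Because the organisation and topic URIs are derived from the names, rows from different spreadsheets that mention the same organisation end up pointing at the same URI, which is what lets several sources join up.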

In the limited time available it was not possible to ingest and display data for all research topics, so four that demonstrate the UK’s investment in leading technologies were chosen: RFID, Advanced Composites, Regenerative Medicine, and Plastic Electronics. Running the animation on the home page of the site clearly shows the funding hot spots for these topics of UK research. Zooming into the map shows the location of the organisations involved. The graph on the visualisation tracks the national cumulative investment in these subjects, overlaid with an indication of the number of patents granted for each.

The obvious wow of this application is the visualisation, but the real power of storing this data in RDF, and using SPARQL to query it, becomes apparent when you start navigating it via the subjects, regions, and organisations, seamlessly following the associations between them. For a quick whiz through what the application is capable of, check out this short screencast:

 
Research Funding Explorer demo

At the moment the data is all stored within a single Talis Platform store (if you are at home with SPARQL, check it out here). Over the next couple of weeks this data will be made available through stores accessible via data.gov.uk, so that it can be used to drive other innovative applications.
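For those at home with SPARQL, a query against a store like this might look something like the sketch below. The endpoint URL, the `ex:` vocabulary and the topic URI are hypothetical stand-ins, not the real store's addresses or schema, and the query uses SPARQL 1.1 aggregates, which a given endpoint may or may not support. The Python only builds the request URL, following the SPARQL protocol convention of passing the query text in a `query` parameter; it does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "http://example.org/store/sparql"
SCHEMA = "http://example.org/research/schema/"


def funding_by_organisation_query(topic_uri):
    """Build a query totalling funding per organisation for one topic."""
    return f"""
PREFIX ex: <{SCHEMA}>
SELECT ?org (SUM(?amount) AS ?total)
WHERE {{
  ?grant ex:topic <{topic_uri}> ;
         ex:recipient ?org ;
         ex:amount ?amount .
}}
GROUP BY ?org
ORDER BY DESC(?total)
""".strip()


def request_url(endpoint, query):
    """Encode the query as a GET request URL for a SPARQL endpoint."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = funding_by_organisation_query(
        "http://example.org/research/topic/plastic-electronics"
    )
    print(query)
    print(request_url(ENDPOINT, query))
```

Swapping the topic URI for a region or an organisation is all it takes to pivot the same data a different way, which is the kind of navigation the application does behind the scenes.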

This is only a start, but already this project has demonstrated that publishing data as Linked Data in a queryable store can stimulate innovation beyond the ubiquitous demo mashup towards real full-blown applications that can deliver commercial benefit.

Open… and Mobile?

I know what you’re thinking: “He’s going to say Data!”

Well, I might do at some point, but I was going to say “Days”. Last month, Talis flung open its doors to 30 or so folk who were interested in SPARQL, the Semantic Web and Linked … er, Data. The idea was to host an informal event for folks to learn about much of what we’ve been talking about for the past few years. We planned some talks on what it means to join up your data, what this Platform is about, and a detailed introduction to SPARQL. With the launch of data.gov.uk and many of the stories covered over in the Magazine, it seemed possible that people were starting to get interested in this whole Linked Data scene.

So, we sent out some invites and tweeted a bit, and soon had to cap the registration numbers. We filled up spaces in the January day not long after New Year, and the February day not long after the January one. March is quickly filling up too (hint). I have to admit, I wasn’t expecting this many people to express an interest so soon. Not only did people sign up, but they travelled to Birmingham through adverse weather to come and take part at both ‘Days, and we’ve had a lot of fun.

One thing that seemed to be a good idea was to ask for feedback before the event. It sounds wrong, but the point of an Open Day is to cover things that YOU’re interested in learning or exploring. So, when people registered, they were asked for their expectations and what they’d like to take away with them from such an event—aside from a T-shirt and SPARQL mug, obviously. It made it much easier to work out what we should cover, and I hope it meant that we were able to talk about the things most relevant to the people who came along.

I’d like to do it again, but slightly differently. Instead of hosting an Open Day here at Talis HQ, what if we came to you? Would you be interested in attending a Talis Platform Roadshow? What would you want us to cover? More importantly, where would you like us to go?

Comments below, or email me or tweet me.

Sharing Data on the Web

This article will appear in Nodalities Magazine, Issue 9.

by Kaitlin Thaney
Program Manager of Science Commons, Creative Commons

In the emerging data web, there have been multiple efforts working towards the same broad goal of data sharing (e.g., the NeuroCommons, Linked Open Data, efforts of the World Wide Web Consortium), but they are still unevenly distributed. Our understanding of the legal, social and technical issues is increasing, but is still at a very early stage.

This past fall at the International Semantic Web Conference in Chantilly, VA, USA, I joined three other leading minds to lead a tutorial examining some of the legal and social frameworks for sharing data in the emerging data web, focusing on an overview of the need for access, the social issues of applying Free-Libre/Open Source (FLOSS) licenses to data, and the approach we advocate at Creative Commons to help navigate this complex space — converging on the public domain.

Lessons Learned

Creative Commons as an organisation works to make knowledge sharing easy, legal and scalable – with applications in the culture space (music, text, film, art), education (open educational resources, virtual textbooks), and science (biological materials transfer, data sharing, Open Access, semantic web, patents). We maintain an integrated approach, and craft policy and legal tools to lower the barriers to knowledge sharing.

When it comes to data sharing, first and foremost, the information needs to be legally and technically accessible. The Open Access movement has increased awareness of this, using the Creative Commons licensing suite to unlock content, and has seen its share of qualified success. But what to do when the information you want to share and reuse falls outside the protections of copyright?

In short, it’s complicated.

This is where the discussion of legal protections for data gets murky. Knowledge is not always copyrightable – it may be easy to discern the rights associated with journal articles, but what about data, ontologies, annotations, or research statements described in triples?

The emergence, adoption, and use of the free-libre/open licensing regimes has allowed for remix and reuse of software code, music, film, educational resources and scientific research in a way that otherwise would be difficult to achieve.

The success of these licensing approaches has changed the social ethos of licensing: instead of a traditional “all rights reserved” model, licenses are now used to make something more free, rather than less.

But from our research, this approach is not ideal for data. The trend towards applying licenses, click-wrap agreements and other sorts of restrictions on scientific data is increasing, but with the undesired consequence of limiting the downstream use of this information, and even at times blocking interoperability. The costs are high, the terms are not always clear, nor the protections always legally sound, making it very difficult to scale for scientific uses. The result is a high barrier to entry for anyone who wants to do meaningful analysis, annotation, or search on the exponentially growing mass of available data, or to integrate it with the available literature.

We advocate an approach of converging on the public domain, and requesting behaviours often found in the various flavours of free and open licensing through norms – not a legal construct. But first, let’s take a look at some of the issues to be aware of and their social implications to furthering the goal of linked open data.

Attribution v. Citation

Under US Copyright law, “Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed.” Since facts are not covered by copyright, attribution – a license obligation – doesn’t seem to apply to ideas or facts either, since those rights are conditional on compliance with terms of the license.

Socially, the scholarly concept of citation is fairly well understood – credit where credit is due. It has long been viewed as an entrenched norm of good scientific practice.

But when it comes to the legalities of both terms and how to enact this behaviour, the devil is in the details, and the two are actually rather different when it comes to enforceability and applications / ramifications in the digital world.

In a copyright license, the word “attribution” is a legal requirement, whereas citation evokes more of a club mentality and social practice. Citation on its own is not assured or enforceable in the same way, but that’s not necessarily a downside. Ask yourself this: which one is more important – legal enforcement or credit enforced through professional reputation? Attribution – a relatively narrow legal term that can affect interoperability while at the same time possibly failing to provide what you really want? Or citation – an entrenched scientific norm that asks for credit where credit is due?

Implications of FLOSS toggles and directives on data sharing

These issues emerge when, instead of focusing on maximizing interoperability of resources, one applies a property metaphor to data. And in the digital world, that tendency can have quite limiting ramifications for future use of the information, as technology continues to outpace the social components of data sharing.

Misunderstanding the legalities can lead to category errors on the social level, including unintentional infringement or on the other side of the spectrum, choosing not to use the resource for fear of infringement. The intentions are often good – believing that applying a less-restrictive copyright license is ensuring the data can be freely shared, reused, and built upon. But without existing precedent or involving a legal team, these issues make for a problematic area to navigate, creating additional confusion and burdens for the users, as well as data providers.

Let’s look at a few examples to gain a better understanding.

Non-Commercial – When used in the context of data, what is a commercial use of the data web? Is it the extraction of a subset, a query that may touch on the data set, hyperlinking?

Attribution – As detailed above, the definitions of attribution and citation are often conflated. Attribution speaks to the legal requirement triggered by the use of the work. But in the case of linked open data, if one were to run a query involving 30,000 data sources (something that is happening every day at an ever decreasing cost), would they then be required to attribute the contributors for all 30,000 databases? You can see how this unintended consequence of attribution stacking could impose a very daunting task for the researcher.

Share-Alike – This toggle specifies that any derivative product be relicensed under the same terms. In the example above of running a large query, all it would take would be one database licensed with a share-alike provision for the entire derivative work to then be under the same terms and no other license. This leads to compatibility issues.
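The stacking effect described in these examples can be made concrete with a toy model. The sketch below is an illustration only, under heavy simplifying assumptions: the `Source` class, the license labels and the rule that a single share-alike source fixes the license of the whole result are invented for the example and are not a statement of how any particular license operates in law.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A data source and the (simplified) toggles on its license."""
    name: str
    license_id: str
    attribution: bool = False
    share_alike: bool = False


def combined_obligations(sources):
    """Summarise what a result drawn from all the sources would inherit."""
    credits = [s.name for s in sources if s.attribution]
    share_alike_licenses = {s.license_id for s in sources if s.share_alike}
    return {
        # Every attribution-licensed source touched adds one credit to give.
        "credits_required": len(credits),
        # Exactly one share-alike license dictates the terms of the result.
        "result_license": (
            next(iter(share_alike_licenses))
            if len(share_alike_licenses) == 1
            else None
        ),
        # Two different share-alike licenses cannot both be satisfied.
        "conflict": len(share_alike_licenses) > 1,
    }


if __name__ == "__main__":
    # 30,000 attribution-only sources, as in the query example above.
    sources = [
        Source(f"db-{i}", "license-A", attribution=True) for i in range(30000)
    ]
    print(combined_obligations(sources))

    # Add two sources carrying different (hypothetical) share-alike licenses.
    sources.append(Source("db-x", "license-B", attribution=True, share_alike=True))
    sources.append(Source("db-y", "license-C", attribution=True, share_alike=True))
    print(combined_obligations(sources))

    # Sources placed in the public domain carry none of these obligations.
    waived = [Source(f"pd-{i}", "waiver") for i in range(30000)]
    print(combined_obligations(waived))
```

Even in this simplified form, the obligations grow with every source a query touches, while rights-waived sources contribute nothing to the tally.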

There are other external mechanisms and limitations imposed by various jurisdictions and countries that can have a profound effect on data-sharing, especially in terms of international data sharing efforts. These include the sui generis database directive in the European Union, Crown Copyright, “sweat of the brow” and “industrious collection” limitations, trade secrets and unfair competition laws, adding another dimension of complexity to an already complex arena.

After convening a series of meetings, roundtables and other discussions with members of the scientific community, the need emerged for a legally accurate and simple solution, that reduced and/or eliminated the need for one to make the distinction of what’s protected. The conflict between understanding the legal issues and complexities can best be resolved by a two-fold approach: (1) a reconstruction of the public domain and (2) the use of scientific norms to request behaviour through a non-license means.

Converging on the Public Domain (+ Norms)

We believe that the public domain is the best means to achieve maximum interoperability of data with the lowest imposed burdens on the user. This can be achieved through the use of a legal tool – either the Creative Commons CC0 Waiver or the Public Domain Dedication and License (PDDL) – waiving all intellectual property rights and asserting that the provider makes no claims on the data. These tools put the work as close to the public domain as possible.

This approach calls for data providers to waive all rights necessary for data extraction and re-use (i.e., copyright, sui generis database rights, claims of unfair competition, implied contracts). It also requires that the provider place no additional obligations such as copyleft or share-alike on the information, which could limit downstream use, as discussed above.

Science Commons also crafted the Protocol for Implementing Open Access Data – a protocol for evaluating database terms of use, in hopes of providing a unified framework for users to evaluate if any given database may be integrated with any other database.

The Protocol recommends one request behaviour, such as citation, through norms and terms of use rather than as a legal requirement based on copyright or contracts.

We are aware that different disciplines and jurisdictions call for different approaches, and this is not always a one-size-fits-all solution. By requesting behaviour through norms and terms of use rather than a legal construct, various scientific disciplines have the ability to develop their own norms for citation, allowing for legal certainty without constraining one community to the norms of another.

Final Thoughts

In the early days of the World Wide Web, there weren’t many free-libre licenses available, and after a debate over using GPL for the original web code, CERN chose to put it into the public domain. Getting the law out of the way was key to allow for network effects, and to the success of the Web.

Converge on the public domain and ensure the freedom to integrate. It’s the most scalable solution.

This work is licensed under a Creative Commons Attribution 3.0 License.

Martin Belam Talks with Talis

In this Nodalities Podcast, I talk with blogger and Guardian information architect Martin Belam. I’ve run into Martin at a few Linked Data events where the news and media industries have had a high profile (including the recent News Media Summit, and News Innovation conference last year). Martin has an interest in Linked Data, and an interesting perspective on where it fits in with News, both as a tool for journalism and research and as a resource for the industry.

Also mentioned:
Guardian Open Platform

 
Martin Belam Talks with Talis [25:36m]

We’re excited

The Talis offices, for the past few weeks, have been awash with geeky excitement: that kind of near giddy excitement that comes with eager expectation. We’ve all been waiting for something important.

For some, this was no doubt augmented with the announcement of Steve’s new iPad; but that’s not what’s gotten us all worked up.

For months, we’ve been looking forward to the launch of data.gov.uk; and last week, the wraps finally came off. As the official press release put it:

A major new website has been launched to the public which gives anyone who wants to use it unprecedented and free access to government data in one place.

This doesn’t quite capture the coolness of the launch, for me. Yes, it’s a major new website, and its point is to publish information. But the exciting thing is that this information is being published as data: data that can be used, reused, remixed and enriched. Sir Tim Berners-Lee’s perspective was more exciting:

Making public data available for re-use is about increasing accountability and transparency and letting people create new, innovative ways of using it. Government data should be a public resource. By releasing it, we can unlock new ideas for delivering public services, help communities and society work better, and let talented entrepreneurs and engineers create new businesses and services.

The point is that this public resource is finally getting a home on the web, and an infrastructure to make it not just available, but useful.

The exceptional team behind data.gov.uk have striven to adhere to web standards in its production, with Linked Data as a priority, as Professor Nigel Shadbolt explained:

We are also going to increase the use of ‘Linked Data’ standards, which allows people to provide data in a way that is as flexible and easy-to-use as possible.

Back in November, Leigh Dodds wrote a post explaining how we’ve been involved, and there’s an official Talis Platform press release too. Basically, we’ve been working with the data.gov.uk team to help with the Linked Data part of the site—hosting the SPARQL endpoints and providing consultancy and training, for example.

I can confidently say that we’re very proud of data.gov.uk, the team behind it, and our involvement with it. We’re excited by the prospect of this data being used as raw material for clever people to make interesting, useful, even world-changing things with it. We’ve seen the beginnings and proof-of-concept projects already.

Now comes the really exciting stuff. What are you going to build?

Image: “Yay for happy days!” by le vent le cri via flickr (CC: By)

In conversation with Conrad Wolfram

The subject of this Talking with Talis Nodalities Podcast is Conrad Wolfram, founder and Managing Director of Wolfram Research Europe. He is also Strategic and International Director for Wolfram Research, the organisation founded by his brother Stephen, and responsible for Mathematica software and the WolframAlpha Knowledge Engine.

In our wide-ranging conversation we look at Conrad’s career, the evolution of Wolfram Research and its role in introducing wider access to computational functionality. He takes us through the creation of Mathematica by Stephen Wolfram and the building of a company based upon maths.

We move on to discuss the WolframAlpha Knowledge Engine, which is built upon Mathematica technology, and how it fits both into the online world and the Wolfram strategy. We close having discussed many issues relevant to the evolution and future of the web.

Photo Copyright © 2009, Conrad Wolfram.

 
Conrad Wolfram talks with Talis

Philip (Flip) Kromer talks about InfoChimps and building a data marketplace

In my latest podcast I talk with Flip Kromer, co-founder of InfoChimps.

We explore the background to InfoChimps, and discuss their aspiration to build a marketplace in which people can contribute and find data – both freely available and commercial.

 
Standard Podcast [47:09m]

Felix Van de Maele talks about Collibra

In my latest podcast I talk with Felix Van de Maele, CEO of Belgian semantic technology company Collibra.

We discuss Collibra, and the problems that many enterprises face in understanding and integrating data held in diverse silos.

 
Standard Podcast [23:40m]