Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Archive for the 'Uncategorized' Category

Linked Data and the Public Domain

We love data at Talis and we want as much of it to be freely reusable as possible. In fact, because we wanted to see even more reusable data we recently launched the Talis Connected Commons offering completely free hosting of public domain data. We believe that dedicating data to the public domain is the best way to ensure that data is universally reusable and remixable. When data is public domain it means that it can be reused automatically without needing to check terms and conditions or track the source of every statement to provide attribution. These kinds of things act as friction to reuse, wasting energy that could be better spent creating inspiring things.

We also firmly believe that, in the future, there will be a significant role for other forms of data licensing, including commercial access. We will support those efforts too when the time comes, but today the Linked Data web needs more and better data that is freely accessible.

Licensing vs Waivers

You are probably familiar with the process of licensing a creative work, most likely through the great job that Creative Commons have been doing in recent years. However, the concept of waivers is less well known but highly relevant to reuse of linked data.

Whenever you create something you have automatic rights over it granted to you. The best known of these rights is copyright, which gives you the exclusive right to make copies of your creative work. There are many other rights which can be held over intellectual property, such as design rights, trade marks, registered designs, performers’ rights, trade secrets, database rights, publication rights and many more.

Licensing is the process of granting others limited use of rights you possess. For example, when you license your copyright you are granting specific people a limited right to make copies without having to ask you first. Licensing of one right does not affect your possession of the others. For example you could grant the right to copy your work but retain the right to perform it. Creative Commons licenses are mostly concerned with copyright, but they do not usually deal with the other rights such as database rights or trade secrets.

Waivers, on the other hand, are a voluntary relinquishment of a right. If you waive your exclusive copyright over a work then you are explicitly allowing other people to copy it and you will have no claim over their use of it in that way. It gives users of your work huge freedom and confidence that they will not be pursued for license fees in the future.

The Licensing Problem

In general factual data does not convey any copyrights, but it may be subject to other rights such as trade mark or, in many jurisdictions, database right. Because factual data is not usually subject to copyright, the standard Creative Commons licenses are not applicable: you can’t grant the exclusive right to copy the facts if that right isn’t yours to give. It also means you cannot add conditions such as share-alike.

There isn’t a Creative Commons license for every possible right and there probably can’t be because of the huge variation in rights granted in different jurisdictions around the world. Also, when we start to look at licensing compilations of data we find that the situation becomes complex because you have to consider both the database and its contents separately. For example, a database of articles would be subject to database right over the whole collection and individual copyrights for each article, quite possibly belonging to many different owners. The Open Data Commons has addressed this particular example with its Open Database License and Database Contents License (based on work originally donated by Talis). If a standard license doesn’t exist then you need to hire lawyers and write one for yourself, at potentially huge cost.

Our collective goal for a successful Linked Data web has to be to protect consumers of the data: the people who are remixing many different sources of data. Our intentions may be very honourable, but people need certainty if they are to build enduring value on data. Creative Commons licenses are irrevocable so even if you lose control over your work through some misfortune, the people reusing it will be protected forever. Imagine this scenario: you allow people to use data you have collated but your company goes bankrupt and the rights to the data collection are sold by the liquidators. If you hadn’t licensed your rights explicitly then every one of your users could be liable to be sued by the new rights holder!

This is where waivers of rights can help. By explicitly waiving your rights over your data you are giving your users the best guarantee of safety that you can. Even if you lost control of the data collection, subsequent owners could not pursue your users because the rights you held have already been waived.

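There are two waivers of rights that can be applied to datasets:

  • The Open Data Commons Public Domain Dedication and License (PDDL)
  • The Creative Commons CC0 waiver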

Both of these waivers can be used for data intended for submission to the Talis Connected Commons.

Community Norms

When you apply a waiver like CC0 you are relinquishing all your rights over the work to the fullest extent possible under the law. That means that you cannot force people to attribute you or stop them from making commercial use of your work.

The preferred approach is to attach a set of community norms to the work. These are like a code of conduct for use of the work and are usually self-policing. They are not legally enforceable but form part of the ethical or professional requirements for participating in a community. The best known example of community norms is the set of citation standards used in the academic community. Citing pre-existing work is not legally enforceable but those who abuse the norms can find themselves excluded from the academic community.

The Open Data Commons has published a set of attribution and share-alike norms which ask that users of the data:

  • Share work derived from the data.
  • Give credit to the original data publisher.
  • Point others at the source of the data.
  • Publish in open formats.
  • Avoid using digital rights management.

How to Declare Your Waiver

To declare your waiver in a machine-readable way, you should first create a voID description of your dataset. VoID, or Vocabulary of Interlinked Datasets, is a vocabulary designed to describe key attributes of your dataset. We created a waiver RDF vocabulary that can be used with voID to declare any waiver of rights and the community norms around a dataset.

In this example we describe a dataset using the void:Dataset class and provide it with a dc:title as a minimal human readable description. You should add other descriptive properties as necessary (some suggestions can be found in the voID guide).

We then use the wv:waiver property (defined in the waiver RDF vocabulary) to link the dataset to the Open Data Commons PDDL waiver. We use the wv:declaration property to include a human-readable declaration of the waiver. This is purely informational, but can immediately be used by a person examining the voID description. Finally we use the wv:norms property to link the dataset to the community norms we suggest for it, in this case the ODC Attribution and Share-alike norms.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:wv="http://vocab.org/waiver/terms/"
  xmlns:void="http://rdfs.org/ns/void#">
  <void:Dataset rdf:about="{{uri of your dataset}}">
    <dc:title>{{name of dataset}}</dc:title>
    <wv:waiver rdf:resource="http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/"/>
    <wv:norms rdf:resource="http://www.opendatacommons.org/norms/odc-by-sa/" />
    <wv:declaration>
      To the extent possible under law, {{your name or organisation}} has waived all
      copyright and related or neighboring rights to {{name of dataset}}
    </wv:declaration>
  </void:Dataset>
</rdf:RDF>
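Because the declaration is machine readable, a consumer can check it before reusing the data. The following is a minimal Python sketch of that idea, assuming a filled-in copy of the template above (the example.org URI, "Example Dataset" and "Example Org" are placeholders we made up). It uses only the standard library and only understands this particular RDF/XML layout; a general-purpose RDF parser would be needed to cope with other serialisations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/terms/"
WV = "http://vocab.org/waiver/terms/"
VOID = "http://rdfs.org/ns/void#"

# Hypothetical, filled-in copy of the template (all example.org values are placeholders).
VOID_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:wv="http://vocab.org/waiver/terms/"
  xmlns:void="http://rdfs.org/ns/void#">
  <void:Dataset rdf:about="http://example.org/dataset">
    <dc:title>Example Dataset</dc:title>
    <wv:waiver rdf:resource="http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/"/>
    <wv:norms rdf:resource="http://www.opendatacommons.org/norms/odc-by-sa/" />
    <wv:declaration>
      To the extent possible under law, Example Org has waived all
      copyright and related or neighboring rights to Example Dataset
    </wv:declaration>
  </void:Dataset>
</rdf:RDF>"""


def read_waivers(xml_text):
    """Return the waiver details of every void:Dataset in an RDF/XML document."""
    root = ET.fromstring(xml_text)
    results = []
    for dataset in root.iter(f"{{{VOID}}}Dataset"):

        def resource(prop):
            # Return the rdf:resource URI of a wv: property, or None if absent.
            element = dataset.find(f"{{{WV}}}{prop}")
            return element.get(f"{{{RDF}}}resource") if element is not None else None

        declaration = dataset.findtext(f"{{{WV}}}declaration")
        results.append({
            "dataset": dataset.get(f"{{{RDF}}}about"),
            "title": dataset.findtext(f"{{{DC}}}title"),
            "waiver": resource("waiver"),
            "norms": resource("norms"),
            # Collapse the line breaks and indentation inside the declaration.
            "declaration": " ".join(declaration.split()) if declaration else None,
        })
    return results


info = read_waivers(VOID_XML)[0]
print(info["waiver"])
print(info["norms"])
print(info["declaration"])
```

A script like this could, for instance, refuse to import any dataset whose description carries no wv:waiver at all.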

Alternatively if you were to choose the CC0 waiver without any particular norms then you should use the following RDF:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:wv="http://vocab.org/waiver/terms/"
  xmlns:void="http://rdfs.org/ns/void#">
  <void:Dataset rdf:about="{{uri of your dataset}}">
    <dc:title>{{name of dataset}}</dc:title>
    <wv:waiver rdf:resource="http://creativecommons.org/publicdomain/zero/1.0/"/>
    <wv:declaration>
      To the extent possible under law, {{your name or organisation}} has waived all
      copyright and related or neighboring rights to {{name of dataset}}
    </wv:declaration>
  </void:Dataset>
</rdf:RDF>
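If you have many datasets to describe, you may prefer to generate the description rather than edit the template by hand. Here is a rough Python sketch under our own assumptions: the function name build_void, its parameters and the example.org values are all hypothetical, and the output mirrors the templates above rather than any official tool.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/terms/",
    "wv": "http://vocab.org/waiver/terms/",
    "void": "http://rdfs.org/ns/void#",
}
# Ask ElementTree to use the same prefixes as the templates when serialising.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"


def q(prefix, local):
    """Build the '{namespace}local' name that ElementTree expects."""
    return f"{{{NS[prefix]}}}{local}"


def build_void(dataset_uri, title, owner, waiver_uri=CC0, norms_uri=None):
    """Return an RDF/XML voID description declaring a waiver (hypothetical helper)."""
    root = ET.Element(q("rdf", "RDF"))
    dataset = ET.SubElement(root, q("void", "Dataset"), {q("rdf", "about"): dataset_uri})
    ET.SubElement(dataset, q("dc", "title")).text = title
    ET.SubElement(dataset, q("wv", "waiver"), {q("rdf", "resource"): waiver_uri})
    if norms_uri:
        # Community norms are optional, as in the CC0 example above.
        ET.SubElement(dataset, q("wv", "norms"), {q("rdf", "resource"): norms_uri})
    ET.SubElement(dataset, q("wv", "declaration")).text = (
        f"To the extent possible under law, {owner} has waived all "
        f"copyright and related or neighboring rights to {title}"
    )
    return ET.tostring(root, encoding="unicode")


xml_text = build_void("http://example.org/dataset", "Fish & Chips Shops", "Example Org")
print(xml_text)
```

Building the document with an XML library rather than by pasting strings into the template means characters such as "&" in a dataset title are escaped for you.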

These examples show that it is very simple to declare your waiver. However, before you do so be sure to read carefully what rights you are irrevocably giving up. For example you would most likely be waiving your publicity and privacy rights, so if your image is included in the dataset you could not later complain that someone is using it in a way you do not approve of. If you are worried about how your work will be used, if you want to legally require attribution, or if you don’t want people to make money off of your work, then you should not use a waiver and instead seek legal advice on the creation of a data license specific to your needs.

Down Tools…

Update: all maintenance has been successfully completed, and the blogs should all be up and available again. Thanks to the Live Services team for some handy/fast work!

The Nodalities blog will be unavailable from around 8pm (GMT) this evening for some scheduled maintenance. This means the posts, pages and RSS/Atom feeds will all be inaccessible until around 8am tomorrow morning.

We hope this won’t be too inconvenient for anyone, and that you enjoy the break ;)

Image: “Rex – Gone Fishing” by snuzzy via flickr Creative Commons, “By 2.0”

John Wilbanks talks about Open Data and Science Commons

A podcast just published on Talis’ Xiphos blog may also be of interest to readers of Nodalities. In it, I talk with John Wilbanks of Science Commons to discuss his views on Open Data and Linked Data.

Enjoy.

A Good Week for RDFa

With the unveiling of Google’s RDFa support and discussions from the UK’s Central Office of Information around using RDFa in their job sites, there has certainly been a lot of coverage of RDFa and Linked Data over the past few days.

Google’s announcement feels a bit limp, hidden as it is in the webmasters’ tools. To read their own description of “Rich Snippets,” you’d think they were little more than an additional piece in the armory of SEOs and content editors, giving them the ability to flag reviews and products on their pages. The real excitement, as Tim O’Reilly mentioned, is that this is Google’s first active support for explicit information. A site can now state: “We give this widget 4 stars out of 5, it costs £100, and our CEO is Joe Bloggs.” That’s fantastic!

I wonder if we tend to miss the importance of explicit statements, because we default to googling for something and hoping the first page or so of results will contain the answer. I can very swiftly find “reviews” for a Logitech Mouse, for example; but I still have to go through the reviews and find what they said. I might be lucky if Google shows me the result within the site description, but I’m much more likely to need to follow my own lead after Google serves me up a bunch of links to follow. This lets sites explicitly surface an (admittedly currently woefully limited) amount of their own data. It makes much more sense for finding what you’re actually after without needing to disambiguate yourself. It feels like a step in the right direction. It leaves me personally wishing Google would open it right out and support full vocabularies, but I’m glad for this initial offering.

Alongside Google, the Central Office of Information seems to be taking a much more webby approach to Linked Data, by supporting FOAF and other public vocabularies. Mark Birbeck explains:

To facilitate this we set up an open source project called argot-hub, with a wiki, issue-tracking system and associated discussion lists.

The first vocabularies — or argots — that I defined were for job vacancies, but in order to make the terminology usable in other situations, I broke out argots for replying to the vacancy, the specification of contact details, location information, and so on.

An argot doesn’t necessarily involve the creation of new terms, and in fact most of the argots use terms from Dublin Core, FOAF and vCard. So although new terms have been created if they are needed, the main idea behind an argot is to collect together terms from various vocabularies that suit a particular purpose.

The first pages to support the RDFa information will be vacancy notices, which can be seen at the Civil Service home page. The great thing about this is that it supports information retrieval by applications. An application can query the site, pull out explicit information, and voila: your very own “what jobs are available in the Civil Service” app. Looking at all the info there, you could have a field day, sorting by salary, area of interest or whatever.
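To give a feel for what “pulling out explicit information” might look like, here is a rough Python sketch using only the standard library. The markup is hypothetical: I have not copied it from the Civil Service pages, and the terms I chose (dc:title, v:locality, dc:date) are merely plausible picks from Dublin Core and vCard, not the actual argot. A real RDFa processor does far more (prefix resolution, subjects, nesting); this just collects property/value pairs from flat markup.

```python
from html.parser import HTMLParser

# Hypothetical vacancy markup; the real pages and argot terms may well differ.
PAGE = """
<div xmlns:dc="http://purl.org/dc/terms/"
     xmlns:v="http://www.w3.org/2006/vcard/ns#" about="#vacancy-1">
  <h2 property="dc:title">Policy Adviser</h2>
  <span property="v:locality">London</span>
  <meta property="dc:date" content="2009-05-20" />
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from flat, non-nested RDFa-style markup."""

    def __init__(self):
        super().__init__()
        self.found = []    # (property, value) pairs in document order
        self._open = None  # property whose text content is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # An explicit content attribute takes the place of the element text.
            self.found.append((prop, attrs["content"]))
        else:
            self._open = prop
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the open property.
        if self._open is not None:
            self.found.append((self._open, "".join(self._text).strip()))
            self._open = None


collector = PropertyCollector()
collector.feed(PAGE)
for prop, value in collector.found:
    print(prop, "=", value)
```

Once the values are in a list like this, sorting or filtering vacancies by any of the stated properties is ordinary programming rather than screen-scraping guesswork.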

So, two very different use cases for the Semantic Web via RDFa. What’s next?

Connected Commons Coverage

Following our recent announcement of the Talis Connected Commons, there has been some significant coverage in the blogosphere. I’ll list a few of the links I’ve seen here, and if you’ve come across others, or have covered the Connected Commons yourself, please share in the comments.

Read Write Web

Marshall Kirkpatrick captured some of the Linked Data future vision with his timely coverage over on Read Write Web. Marshall’s coverage sets the stage for potentials and takes a very much forward-looking view on the Connected Commons.

Science Commons

With Talis’ use of the Creative Commons CC0 License as one route for the Connected Commons data licensing, the Science Commons blog has made reference to the scheme. I quote: “We commend Talis for using CC0 as a means to clearly mark and identify public domain data, and look forward to see what fruit this tree will bring for the open data / linked data communities.”

Unilever Cambridge Centre for Molecular Informatics

“We’ll certainly be taking this up.”

bbgm by Deepak Singh

“We are living in a world where data sharing, data access, and open data in general are getting more and more important, and available.”

The Content Guy

“That last bit is what has Mike and me interested – finding new ways of making use of the relationships between data and content that all the various semantic tools unearth.”

Open Knowledge Foundation

Again, if you have something to say on your own blog, or have come across more coverage of Connected Commons, drop it in the comments.

State of the Semantic Web (Part 1)

Well, finally I got around to starting this write-up, and the first instalment has appeared in the excellent IEEE Internet Computing. I foolishly thought I’d be able to cover the main ground in one column; now it seems like I’ll need at least three. In Delivered Deliverables I look mostly at the output of the W3C. The provisional plan is to cover infrastructure & backend tools in part two (with comments on the notion of linked data), and move on to real-world applications in part three. Suggestions are very much welcome.

What would you collate?

We’ve been talking a lot about the prevalence of data, and how interacting with it empowers people. I’ve also been looking at lots of beta and public web apps which are, by definition, connected to the rest of the world.

There are some obvious cases where my data is helpful (I know, as a linguist, I should say “data are”, but I’m seeing how this feels). I recently set up a home computer with users for both me and my wife. My iCal and Mail automatically sync using various systems like Plaxo, MobileMe and some well-earned IMAP settings; so within about 15 minutes of hearing the first ever “dun!” as it turned on, all my contacts, calendar events, reminders and emails had appeared more or less as they are on my laptop.

“Does it know my addresses, too?” my wife asked hopefully. Sadly, no. It’s taken a lot longer to get her data acquainted with the new computer. Not only that, but the computer it replaced is on a different platform. So, via an external hard-drive and lots of tweaking, I think I have all the files from the old one.

When you think about all the times we use connected software, it makes you wonder why on earth we have to keep doing this again and again. Alongside the obvious data, like contacts, calendar events, and personal settings, there is a world of nearly-immediately useful stuff. When and where I heard that song might not be life-changing, but it certainly helps with earworms, right? So, was I listening to Last.FM, Blip.FM, Spotify, iPlayer, or (heaven forbid) the radio? Where was this photo taken? What happened in March to make my heating bill so high? When’s my next car service needed, and why did these particular tyres seem to wear out so badly? These are questions I’ve asked myself within 10 hours of writing this.

A level further is a host of semi-useful data just waiting to be connected and used. This guy collated everything into charts. Another has his house twitter whenever anything significant happens. We have all become aware of how important collected information can be to large organisations when it comes to margins, analytics, and strategy. Just look at your wallet and work out how many “loyalty” cards you have. But when is this going to be useful to us, directly, without waiting for it to all get munged by a large body in order to get a few points-pence off your next purchase?

So, I guess what I’m asking is: “What would you collate?” What data would you most like to see as a next step toward usefulness? Perhaps the solution is more important, actually… what would you like to change? Your spending habits, your health, your credit limit over time?

I’d be very interested to see where people want their data to work with/for them…

Smart stuff

I was listening to the Today Programme on Radio 4 the other morning, and heard that the vast majority of computers in the world are not part of PCs. Most processors are thinking away in non-computer items like washing machines, cars, and mobile phones. OK, so this isn’t really that new, but the piece made some good points. I reckon if most people thought about it, they’d realise it’s not that surprising. Your watch has a little computer chip in it that you programme using an obscure ritual of holding down impossibly small buttons. If you don’t recognise this, it may be that you spend a lot on watches, and yours isn’t so much a processor as a series of cogs and springs doing the calculating. Your fridge “knows” when it’s too hot or cold, and your latest mobile is blatantly a little computer with a funny ring-tone.

Alongside this has been discussion, much of it stemming from Nokia (news), about pulling many of these computers together to do some more impressive processing for us. Nokia in particular wants to organise these computers by providing a platform on which people can interact from their mobile phones. So, you get a hugely customised data-set from your home and access to an open platform, so you can programmatically access lots of these currently-isolated computing processes, right from your mobile handset. It’s pretty exciting, but I don’t understand the angles from much of the news coverage.

The Telegraph and the BBC coverage, for example, both tell this story as a really, super-cool system that lets you turn on your heating with your phone!

My reaction? “Whoah, no way! I can turn on my boiler without a match and that fear-for-your-life worry that I might have let too much gas out to get the pilot lit?”

It’s been dubbed “the stuff of science fiction,” but I can’t help wondering if the sci-fi stuff might be a bit more exciting than a remote-control for your boiler? If you think about home-heating for a moment, it’s not difficult to realise you already programmatically interact with the system, and have for decades. The break-through came with the use of thermo-dynamic bi-metallic coils which expand one way when it’s warm and contract the other when it’s cooled down, tripping a switch. Add this to a timer-switch, and you get a programme you can control for both time of day and ambient temperature. More and more sophisticated versions of this system have come out every year since the middle of the last century. When I was a teenager, my dad built our boiler system with about 10 different “zones” each with its own thermostat, and this was in a single-story ranch-style house in the middle of Colorado! Not even high-rise, exclusive, city technology really.

So, if the magic isn’t in the remote for the boiler, where is it?

Well, to me, the exciting stuff happens where you can connect all the different processes going on in your house, and get… DATA! If these data are all accessible through the WWW, you have instant, personalised network-effect-enabled technology for your house. It’s Home 2.0 from here on out!

Why? Because you can suddenly interact with every feature of your personalised environment. Want to write a programme to help cut back on CO2? No problems, this app can let you choose how cold your food really needs to be, how hot your boiler needs to get, and which rooms don’t need to be warm at all times. It can also tell you which appliances are using the most energy and compare them instantly with more cost- and energy-efficient models. Have a higher-than-expected utility bill? Well, now the operator on the other end of the phone can get a real-time reading from your own home meter telling them exactly what’s been used.

How about combining some of this with external datasets too? Why not have an anonymized “my street” view of CO2 emissions and maybe work out frost-traps in your community? Plot intruder-data from home security systems from an entire region? Get an accurate quotation based on your geographical location, current housing materials, and plumbing configurations as part of your plan to build an extension?

This is the stuff of science fiction (from the 1950s, that is). Homes that can be programmed, data that can be used to make our habitat more efficient and comfortable. Saving money, saving emissions, etc. All we need are the robot butlers to make us breakfast and press our twill trousers, and we’re set!

A data-centric view

Justin’s been talking about viewing the future from a data-centric perspective, rather than application or software-centric. I came across an interesting example of a data-centric approach over on flickr. Flickr hosts a huge number of images, and quite a bit of metadata alongside these. This metadata includes camera used, aperture, lighting information and dates among others.

Here, however, they’ve been taking a look at their geo-data. With increasing numbers of cameras and camera-phones capturing geographic location, flickr have been able to create some very interesting visualisations which illustrate the surfaced connections amongst this huge stockpot of data. By plotting shapes to encompass these locations, and mashing them up with names at various levels (e.g. neighbourhood, city, province, country, continent), they could begin chipping away regions which are not photographed. The resulting alpha shapes strongly resonate with named geographic locations.

In other words, flickr, without recourse to a map, have created visualisations of their data which represent named geographic locations. (http://flickr.com/photos/straup/2972131146/)

Check out their project page for more (and better-explained ;) ) details.
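For a rough feel of how an outline can fall out of nothing but tagged points, here is a small Python sketch. To be clear about the assumptions: this is not flickr’s method. It swaps in a plain convex hull (Andrew’s monotone chain), which is much simpler than an alpha shape and cannot carve away unphotographed regions the way they describe. The place names and coordinates are made up, and longitude/latitude are treated as flat x/y values.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product OA x OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Made-up (place name, longitude, latitude) records standing in for geotagged photos.
PHOTOS = [
    ("Exampleton", 0.0, 0.0),
    ("Exampleton", 2.0, 0.0),
    ("Exampleton", 2.0, 2.0),
    ("Exampleton", 0.0, 2.0),
    ("Exampleton", 1.0, 1.0),  # interior point, should not appear in the outline
    ("Sampleville", 10.0, 10.0),
    ("Sampleville", 11.0, 10.0),
    ("Sampleville", 10.5, 11.0),
]

# Group the points by the name attached to them, then outline each group.
by_place = defaultdict(list)
for name, lon, lat in PHOTOS:
    by_place[name].append((lon, lat))

shapes = {name: convex_hull(pts) for name, pts in by_place.items()}
for name, outline in shapes.items():
    print(name, outline)
```

Even this crude version makes the point of the post: the outlines come from data that was already being captured, with no map involved.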

The point here is that the flickr team did not wake up one morning and think: “You know, if we captured THIS kind of data, we could create this mashup; so let’s create an application.” Instead, they re-used data they were already capturing, and brought out something very interesting indeed.

By creating tools which match their data (and could be used with other data of the same kinds), flickr is able to expose layers of value from the rich pickings of their own data-cloud.

The good stuff is where the data are.

How will we interact with all this data?

As the web and the Semantic Web continue to bring data together with increasing applications and published datasets, how are we going to continue to interact with this sea of information?

Talis’ Tom Heath has contributed a column in IEEE’s Internet Computing about the future of people and Semantic Web interaction.

Tom’s article covers “…some ways in which our interaction with the Web of data might differ from how we interact with the established Web of documents, and what this might mean for both users and producers of Web content.” He emphasises that: “Far from removing humans from the equation, a Web of machine-readable data (the Semantic Web; we also call it the “Web of data”) creates significant challenges and opportunities for human-computer interaction.”

Talis is also helping to organise the Visual Interfaces to the Social and the Semantic Web (VISSW 2009) workshop in Florida to cover topics such as:

  • Visual interfaces supporting exploration and/or navigation of unstructured or structured data on the Web
  • Visualizing and interacting with semi-structured/linked data on the Semantic Web or the Social Semantic Desktop
  • Ontology-based visualization of & interaction with collections of data
  • Novel paradigms to interact with textual data, photos, music, videos on mobile devices
  • Lightweight components for casual users to publish/share their own contents on the Web
  • Real-world use cases
  • Lessons learned from intelligent interfaces built to interact with Web-based contents

It’s important that the people who are the ultimate point behind all technological development are kept at the centre of the planning too. Is anyone else working with visualisation of data in their design and development plan?