Licensing Open Data – Creative Commons and Talis have something to say
Well that’s a relief.
As you may have noticed, we announced the birth of the Open Data Commons Public Domain Dedication and Licence this morning, following up on Lawrence Lessig’s unveiling at Creative Commons’ fifth birthday party in San Francisco in the wee hours of our Sunday morning. Happy Birthday, Creative Commons, and we look forward to building upon this relationship for many birthday parties to come!
Regular watchers of Talis will be aware that we’ve had an interest in data for a long time, and that we’ve been active in the licensing issues behind Open Data for a couple of years now. Today’s announcement is an important milestone in that journey, but we’re not finished yet.
Back in 2006, we released our first public attempt at an open data licence, the Talis Community Licence, and began to use it for some early submissions to the Talis Platform. In building a Platform, we understood from the outset the importance of recognising – and celebrating – the rights of those contributing their data to the shared pool. The Talis Community Licence allowed us to do that.
Not long after, Tim O’Reilly wrote:
“One day soon, tomorrow’s Richard Stallman will wake up and realize that all the software distributed in the world is free and open source, but that he still has no control to improve or change the computer tools that he relies on every day. They are services backed by collective databases too large (and controlled by their service providers) to be easily modified. Even data portability initiatives such as those starting today merely scratch the surface, because taking your own data out of the pool may let you move it somewhere else, but much of its value depends on its original context, now lost.”
We certainly share those sentiments, and I’ve used the quote in several presentations since Tim wrote it.
During 2007, our interest continued to grow. In public, we convened a workshop on Open Data at the World Wide Web conference in Banff in the Spring, and reached out to Jordan Hatcher and Charlotte Waelde over the Summer, to help us extend the principles of the Talis Community Licence to the global stage. That they did, and over the past couple of months we’ve all been beavering away to align their initial offering with a parallel activity incubated within Creative Commons.
It’s been fascinating to work closely with Science Commons during this process, and I’ve also welcomed the opportunity to work with Jordan and Charlotte again in dotting legal ‘i’s and crossing initially incomprehensible ‘t’s. Together, they have produced a vitally important component in the toolkit that will encourage and facilitate real sharing of data. There is more to come, but the steps announced today mean that we can all move forward in lowering the walls of our silos, releasing data to play its part in the Data Web. All of us invest heavily in collecting and curating data, which is traditionally locked away and left to atrophy, failing to achieve anything like its true potential. Appropriately released and sensibly licensed, data held by every one of us can contribute hugely to the promise of the Semantic Web. Here, the whole really is far greater than the sum of its parts.
I’m really pleased to be able to share this latest piece of work today, and invite everyone to take a look, think about how it would work for them, and join in both freeing your own data and carrying the conversation to those still unaware that there is an issue to be addressed here.




December 18th, 2007 at 8:46 am
There is some ambiguity in the definition of “Data” and “Database” as defined in PDDL. The ambiguity arises because the statement “This licence is intended for use on databases or their contents (“data”), either together or individually” (PDDL) and the definition of “Work” in PDDL treat “Data” and “Database” as two different entities. The following definition of “Database” may be more accurate:
“A systematic or methodical arrangement of Data that is individually accessible by electronic or other means offered under the terms of this document”.
In my understanding, in the semantic web the Database shall represent the properties of a standard ontology (I shall call them “normative metadata”). The value of these properties, i.e. Data, is the object. The objective of providing open access to the Database shall be to inform external agencies about the standard ontologies supported in a Database. This open access shall facilitate integration of semantic web applications, Database and Data. One of the methods for providing open access is a Web Services Description (WSD) document. Hence the “Database” definition must state that it is this “systematic and methodical arrangement” that is given open access when Database open access is suggested.
Question: Why should there be a need for changing “All rights reserved” to “Some rights reserved”?
The definition of “Use” in PDDL suggests “doing any act …”; the actions that are allowed on Data shall be advertised by the owner of the “Work” through WSD. Therefore, why should there be a need for the ODC Public Domain Dedication and Licence?
Question: Why is Open Access Data Protocol required?
The three principles of the Open Access Data Protocol (section 3) are covered in WSD. The norms and expectations of different disciplines required for data access and integration can be covered with an appropriate message interface and service definition. Regarding section 4.4: all data that is made available for interoperation shall be advertised in WSD; data that is non-open is not advertised in WSD. If a provider has proprietary metadata, it may be advertised through WSD and the requester may apply their own discretion in using it.
Further, section 5.1 states “We do not know … copyright infringement lawsuit”. Consider a semantic web architecture that evolves by integration of applications and databases based on WSD documents only; then there may be no copyright infringement issues and false expectations. Attribution stacking is automatically resolved by virtue of automated data integration based on the WSD document: the data provider has given open access rights for the data advertised through WSD without seeking attribution. Therefore a scientist who integrates data from 40,000 sources has all open access data and is not under any obligation to attribute data.
The major advantage of WSD is that it advertises what access rights on “Work” are given to the public. Data provider and requester can communicate and integrate data and services via the WSD document. Why should the data requester read the licence document of the data provider?
December 18th, 2007 at 10:30 am
Paul can perhaps talk a bit more about the WSD and its relation to the Open Data Commons project and the Science Commons protocol.
I’ll talk more about the legal bits.
First, your comment about the definition of data and database.
I don’t see an ambiguity here, but please comment on the licence itself so we can get a discussion started there about it. We want to work out any kinks in the beginning before it goes live. For your suggested change, I don’t see a difference between what you suggest and what is used in the PDDL:
The formulation used in the PDDL tracks the definition of a database in Article 1(2) of the Database Directive, and so has the advantage of using the formulation used in the relevant legislation.
The other issue is that the data and the database are in many ways treated differently under the law, and that is why they are separable under the PDDL. You could have a database of CC-licensed images off of Flickr for example. You can’t use the PDDL for the ‘data’ in this case — the contents of this database — because it is not yours to license. But you can still have copyright and database rights over the database as a whole and over the contents, which could be covered by the PDDL.
Now on to the other points related to the WSD. My understanding is that the WSD is not a licence or in fact a legal document. It might have a legal effect — an implied licence due to giving people permission to use the data — but that doesn’t make it a licence. The PDDL is the legal tool behind any expression of rights in a machine-readable language. In practice someone might not have to read the PDDL under this implementation, but that doesn’t mean that it doesn’t need to be there.
The other element is that the PDDL can be used outside of the WSD.
Thanks for the comment.
December 18th, 2007 at 1:35 pm
Since “Data” and “Database” are two different entities, and this is highlighted clearly in PDDL, I strongly suggest that the “Database” definition be changed to reflect the difference between the two terms. “Database” is an arrangement of “Data”, and this arrangement consists of the properties of a standard ontology. If “Database” is defined as a collection of “Data”, the two entities cannot be separated as required in the given example of the CC-licensed images database.
If the licence is for the arrangement of “Data” then the “Database” definition in PDDL is ambiguous.
December 19th, 2007 at 11:44 am
Thanks for your continuing comments.
The database and data are different elements, but they are not unrelated. You can have data without a database — totally unstructured. You can’t however IMHO have a database without contents.
As a general rule, database copyright protects the selection and arrangement of the contents of the database. So this is how the database is structured. It can also protect other elements of the database, such as the field names.
EU database rights, however, protect the database when there has been a substantial investment in obtaining, verifying, or presenting the contents of the database.
In the Flickr example, one could perhaps both gain a database copyright and database rights over the database, depending on whether the relevant tests were met. One couldn’t however get database rights without “contents of a database”.
If what you are saying is that the copyrightable element — the selection and arrangement of the data — can be separated from the data and thus be independent of any data, I have two thoughts.
One is that this approaches what patents, and not copyright, would protect, as it is an idea of how you arrange data. The second is that even if it is separable from data (which I still doubt), it still depends on being a selection and arrangement of something. That thing is the data. So defining a database as done in the PDDL covers this IMHO.
The issue we may be having might be the difference between what a database “is” and what the law protects. Focusing just on the selection and arrangement (in your proposed definition) in the definition of the database doesn’t cover database rights, or indeed the other copyrightable elements of a database.
What may help is including an adapted version of the language that was in the other draft — Open Data Commons DBL — at the beginning of the PDDL:
Thanks again.