The Open Library, and keeping it open
A couple of days ago Richard blogged about data licensing on The Open Library.
Aaron came back to comment:
Our position is that the actual catalog data on Open Library consists of uncopyrightable facts and thus is public domain. We certainly aren’t going to assert a copyright on it. The real open question is what copyright to use for descriptions and bios and other longer textual material — should we use GFDL, like Wikipedia, or some more reasonable license?
Certainly I agree with him that the factual data cannot be protected by Copyright. Facts, titles, names, short phrases, single words: none of these can be Copyrighted in the sense of a Creative Work, but there is more to Copyright than that.
A few weeks back I was talking in Banff and then in Paris about the need to license data, not to keep it closed, but to keep it open. In that discussion I broke the world into three parts: Data, Metadata and Content. Aaron is making the same kind of split, between bibliographic data and review content. That’s the right distinction to draw.
For the creative aspects there is Copyright protection, and various licenses extend this in different ways: CC, GFDL and others. The Open Library should pick whichever is the closest match to what they want to achieve. I suspect a CC-BY license would be closest, but that’s a decision for them and the community.
But what about the data? The question isn’t “can it be protected?” but “how does it need protecting, and from what?”
Now, I trust the Internet Archive. They’re probably the only people on the internet with a wholly untarnished halo, and that’s a very good thing. But things can change. There are direct parallels between what The Open Library is doing and what CDDB did back in the 1990s. The Fez Guys have a great write-up of what happened to CDDB/Escient/Gracenote, but to summarize: a large, community-generated database of music metadata got locked away by a corporate body. It didn’t happen because CDDB planned all along to dupe their community, and it didn’t happen because anyone was ‘evil’. It happened because a commercial organization needed to make money and the community had no protection from that.
An alternative service did spring up, which is what you want to happen in that circumstance. FreeDB.org was set up using the CDDB software, and someone on Gracenote’s extended staff leaked them a copy of the database. With the correct licensing in place (a data equivalent of the GPL), FreeDB could simply have requested a copy, and the community would have been protected.
I don’t want what happened with CDDB to happen with The Open Library, and preventing it requires a clear license that protects the community from The Open Library as well as The Open Library from anyone else.
This is the area we developed the Talis Community License to cover (and yes, the name is a draft too; it will change). We’ve been using it to protect contributions to our platform data services for over a year. It protects contributors from us, as it prevents us, or anyone else, from locking the community’s data away at a later date. It’s an Open License: anyone can use it to protect their users’ contributions in the same way.
Technorati Tags: Open Data, Open Library, Creative Commons, Talis Community License

From Aaron Swartz
July 17th, 2007 at 1:42 am
[Your TypeKey sign-in link doesn’t work because you haven’t signed up for TypeKey.]
Thanks for the kind words. The bottom of every page says: “Some information provided for promotional purposes by the publisher. Additional information and edits added by users. All contributions are in the public domain. For more information about our data, see how you can help.” We’re hoping to decide on licensing terms in more detail with the help of the community on the lists.