Free Our Data!
There’s been an interesting conversation on the code4lib mailing list over the past few weeks.
LibraryThing founder Tim Spalding got the ball rolling back on 4 September, when he wrote:
“Does anyone have a complete collection of LC MARC bibliographic, authority or classification CDs from the Catalog Distribution Service? Want to lend them LibraryThing?
LibraryThing will put them to good use in an open, free, easy-to-use XML-based bibliographic data service.
I gather that, although the LC sells the CDs, there are no copyright or redistribution restrictions—indeed, that companies often resell them, with minimal value adds. I also gather that libraries generally don’t get these CDs, but rely on OCLC instead.”
(my emphasis)
Ed Summers responded:
“You rule for asking this question in public. I’ve only privately thought about it. It would do libraries and the web community in general a great service to make bibliographic and particularly authority data publicly available in a machine friendly way.”
LC, the United States’ Library of Congress, is one of the world’s great libraries, and a significant creator of the basic descriptions of books and other materials upon which libraries around the world rely for their catalogues. Other libraries also play an important role, of course, including our own British Library and some of our major university libraries in the UK and overseas.
At least as far back as the dawn of cooperative computerised cataloguing, as pioneered by the late Frederick Kilgour and others, libraries have recognised the indisputable efficiency of cataloguing basic information about an item as few times as possible, and then disseminating that record so that individual libraries can add their own details and modifications.
The model was intended as one of community, of sharing, and of cost saving, yet somewhere along the line things have gone badly wrong and we are currently dealing with a system of closed clubs, exorbitant charges, and almost insurmountable obstacles to shared innovation.
The library sector allowed its brave vision to become sullied to this degree. Does the sector have the will to do something about it, as Tim is so clearly trying to? How many of you are like Ed, only privately thinking the apparently unthinkable? More importantly, who could seriously stand up in public and defend what we have today?
Here in the UK, the Guardian newspaper has been running a campaign called Free Our Data, described on their blog as:
“On March 9 2006 the Guardian’s Technology supplement carried an article called ‘Give us back our crown jewels’. The argument is simple: government-funded and approved agencies such as the Ordnance Survey and UK Hydrographic Office and Highways Agency collect data using our funds, but then charge users and companies for access to it.
That restricts innovation and artificially restricts the number and variety of organisations that can offer services based on that most useful data - which our taxes have helped to collect.
Making that data available for free - rather as commercial companies such as Amazon and Google do with their catalog and maps data - would vastly expand the range of services available. It cannot make any sense that Google, an American organisation, is presently more popular with people aiming to create new map applications.”
That sounds familiar, doesn’t it? Libraries the world over take public funds to pay for the cataloguing of books, compact discs, DVDs, journal articles and more. Once the records are created, modern approaches such as those behind the Talis Platform mean that it is cheap to store, disseminate and provide programmatic access to the resulting data. Scaling access out to increasingly large numbers of applications is straightforward, and we have plenty of evidence of how this works.
So why aren’t the records being created in these libraries freely shared amongst all libraries and even out to interested third parties such as LibraryThing? Why do libraries have to pay ridiculous sums to join clubs in which they pass records amongst one another time after time, adding layer upon layer of questionable and largely opaque assertions of ‘ownership’ along the way? How much duplication do these exclusive clubs create, and how many of the libraries that most need the data can’t afford to join any of them?
In some cases, the technologies and processes deployed by the incumbents are simply hopelessly antiquated and inefficient. They cannot cope with significant use, they cannot scale, and they require vast budgets simply to keep their dated processes clunking along. How long, though, should some libraries have to pay too much, and others miss out entirely, whilst we wait for these groups to embrace newer, cheaper and more efficient ways of doing business?
And then, of course, there’s the possibility that some of the ‘owners’ of catalogue records are charging too much, and that they’re profiting at the expense of the community they’re supposedly serving. If true, how long should we let that continue?
So… Let’s Free Our (Library) Data! We all benefit, and you know it makes sense.
Go on, Library of Congress, Library & Archives Canada, the British Library, the Bibliothèque nationale de France, the National Library of Australia, and all the rest: simply make your records freely available for meaningful machine access. Questionable justifications on the basis of ‘cost recovery’ no longer cut it. By all means recover your costs, but make sure those costs are real, reasonable, and necessary. Technologies such as Bigfoot are designed to do this sort of heavy lifting, and we’d be delighted to help. If you’re worried about your bandwidth bill and server load, I know several individuals and organisations who would happily take the data from you and then share it freely with the community. Work on licences such as the Talis Community Licence (TCL) is designed to ensure that your real rights remain protected even once you let the data off your servers.
None of this prevents those institutions (and others) from charging for value-added or staff-intensive processes atop the basic catalogue records. Indeed, it probably creates a larger and more vibrant market to which they can sell those premium services. For those who just need access to the data in order to build services of their own, or to help their patrons, freeing the basic records would be something both wonderful and achievable.
Go on. You know you want to. Surely there is no part of the chain that would want to stop the great libraries of the world from doing the right thing?
Today’s picture is again from Flickr, and again Creative Commons licensed. Henrik Moltke provides this picture of ‘free beer’, brewed from a Creative Commons licensed recipe that you can use yourself to brew beer that you give away… or to brew beer that you sell.
October 31st, 2006 at 9:17 pm
I totally understand why institutions form consortia that are closed to outsiders in order to share: even if the data is free, managing these sorts of systems institutionally can be expensive, and you need to cope with that somehow.
That said, some kind of low budget approach to this data sharing should actually be very practical to do these days. With the plummeting costs of disk space and bandwidth, serving up every MARC record in the universe is easy to imagine. I don’t know if you could make it beautiful to use on a zero budget! But just getting the data out to work with and accumulating the relevant tools isn’t hard to imagine.
(hmm)
October 31st, 2006 at 9:33 pm
Ed
that was pretty much my point - although you put it more succinctly than I have over the past few posts… I, too, see why there WAS a reason to form these consortia, and to share the (previously high) costs amongst a finite set of members.
My thesis, as you spot, is that many/most of those costs can now quite feasibly be removed from the system, and that the remaining cost of providing something ‘beautiful’ can actually be amortised against the greatly increased value inherent in a larger pool of data and participants. It *can* effectively be free to participate, but you need to re-engineer the technology *and* any ‘facilitating’ bodies before those savings become real.