27 September 2005
Wasted Tax Euros?
Posted by Richard Wallis at September 27, 2005 11:47 PM
William Heath, in his EU member states v Google: no contest posting on Ideal government europe, is demanding his taxes back over the European-jingoistic response to Google's audacity in wanting to search the whole planet, including Europe's library collections. [I paraphrase].
His complaint is that the European governments have piled in loads of money to try, and fail, to beat Google at its own game.
Mind you, quotes like this one from Jacques Chirac, the French President, instil a certain amount of trepidation:
"We're engaged in a global competition for technological supremacy"....."In France, in Europe, it's our power that's at stake"
So taking a look at The European Library you find a traditional OPAC-looking web site which performs a broadcast search across an EU online catalogue, and the catalogues hosted in the member countries. It’s interesting [the first time you do a search] to watch the result counts (and the odd error) totting up from each member country. Thereafter you start wondering why the results are not just there, as they would be in Google.
Once you scratch the surface, it becomes very clear that it is the result of stitching together many dissimilar systems. Take the display of character sets: search for 'harry potter' and the first result from the British Library displays thus "Harri Potter i v'i?a?zen' Azkabanu / Dz?h?. K. Roling ; z anhliis'koï pereklav Viktor Morozov ; za redakt?s?ii?e?i?u? Ivana Malkovycha." I'm sure that in a European project, with more participating countries and therefore languages than you can shake a stick at, they could have got the correct display of Unicode/UTF-8 sorted from the start. The site comes with a Beta Version Warning, so perfection cannot be expected. Maybe we have just got used to the quality of other perpetual beta offerings.
So is William's plea for his money back a valid one? Unfortunately not. Google Print, which he is comparing it with, searches the contents of books that have been digitized, whereas The European Library, like traditional library search engines, uses the bibliographic metadata that has been catalogued about the books. So they are doing very different jobs. Mind you, if you could combine both capabilities, that would be powerful.
That is not to say that his points about user experience are not valid; they are. This comparison does raise some fundamental issues though.
Why do Google services always go like a rocket? - Because they harvest the content into their own servers. Why don't projects like The European Library do that? - Because it's too difficult (harvesting, hosting, supporting the load, copyright reasons, ownership reasons, competing fiefdom reasons, imprecise protocols and data formats, project funding uncertainty, etc.). Well, at least it has always been too difficult to do it any other way.
Looking at the site, it seems apparent that its underlying architecture is a traditional Web 1.0 application with a smattering of things like XSLT to add functionality into the user interface. So is there a better way in the emerging world of Web 2.0?
Taking it to the extreme, why not let the user's browser search all these catalogues directly? Routing it all through some servers in the Netherlands is bound to slow things up. Some of the AJAX work I have been doing for the Talis Research days shows that this is very possible.
There again, with intelligent harvesting where possible, plus intelligent caching where not, aggregated search performance that approaches Google speed may be possible. And anyway, does the user really care which country's library holds an item? They first need to find out whether what they are searching for exists, and then how to get access to it. [But that is a whole other story!]
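To make the "parallel search plus caching" idea concrete, here is a minimal Python sketch. It is only an illustration: the three in-memory catalogues, the function names and the cache are all hypothetical stand-ins, and a real version would call remote catalogues over the network and would need cache expiry.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for remote national catalogues: name -> list of titles.
CATALOGUES = {
    "uk": ["Harry Potter and the Philosopher's Stone", "Oliver Twist"],
    "fr": ["Harry Potter à l'école des sorciers", "Les Misérables"],
    "nl": ["Harry Potter en de Steen der Wijzen"],
}

# Naive in-memory cache keyed by the lower-cased query (assumption: no expiry).
_cache = {}


def search_catalogue(name, query):
    """Pretend to query one catalogue; a real one would make a network call."""
    q = query.lower()
    return [title for title in CATALOGUES[name] if q in title.lower()]


def aggregated_search(query):
    """Send the query to every catalogue in parallel and cache the merged result."""
    key = query.lower()
    if key in _cache:
        return _cache[key]  # repeat searches never touch the catalogues
    results = {}
    with ThreadPoolExecutor(max_workers=len(CATALOGUES)) as pool:
        futures = {
            pool.submit(search_catalogue, name, query): name
            for name in CATALOGUES
        }
        # Collect each catalogue's hits as soon as that catalogue answers.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    _cache[key] = results
    return results
```

Calling aggregated_search("harry potter") twice only hits the catalogues the first time; the second call is answered straight from the cache, which is where any Google-like speed would come from.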
Those who have been following Panlibus and other Talis blogs such as Silkworm will know that we have been deep into projects addressing the way that open participative architectures, utilising techniques loosely labelled as Web 2.0, can open up closed collections such as library catalogues [from the local to the national], liberating them to become part of a distributed information environment. As the Talis Insight Conference approaches, you should be hearing more on this.
Comments
Thanks Richard. I understand that much better now! Always good to hear from an expert :-)
Posted by: William at September 28, 2005 09:14 AM
With regard to the European Library:
The character set problems you highlight are a failing of Internet Explorer, not the EL. If you perform the same search in Firefox you will see the offending characters displayed correctly. The EL outputs in UTF-8. At some stage I think we might merge the two ligatures into a single floating character. Try "lenin single works" to see some Cyrillic - there's also Greek, Hebrew and some other rare characters in there somewhere.
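One possible reading of "merge the two ligatures into a single floating character" can be sketched in Python. This is an assumption, not something the comment confirms: it supposes the records mark a romanisation ligature with the Unicode pair U+FE20 (COMBINING LIGATURE LEFT HALF) and U+FE21 (COMBINING LIGATURE RIGHT HALF), and that the single replacement would be U+0361 (COMBINING DOUBLE INVERTED BREVE). The function name is made up for illustration.

```python
import re

LEFT_HALF = "\uFE20"     # COMBINING LIGATURE LEFT HALF
RIGHT_HALF = "\uFE21"    # COMBINING LIGATURE RIGHT HALF
DOUBLE_BREVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE

# A base letter + left half, followed by a base letter + right half.
_LIGATURE_PAIR = re.compile("(.)" + LEFT_HALF + "(.)" + RIGHT_HALF)


def merge_ligature_halves(text):
    """Replace each left/right ligature-half pair with one double-width mark."""
    return _LIGATURE_PAIR.sub(
        lambda m: m.group(1) + DOUBLE_BREVE + m.group(2), text
    )
```

Under that assumption, a string such as "Dz\uFE20h\uFE21." would become "Dz\u0361h.", leaving the browser one combining character to draw across the two letters instead of two separate halves.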
The European Library architecture does support harvesting by OAI-PMH into a central database. It's just that some partner libraries don't want to do this at this stage for their main catalogues. Their catalogues are accessed by SRU directly or through an SRU/Z39.50 gateway.
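For readers who have not met the two protocols, the Python sketch below shows roughly what the request URLs look like. The parameter names come from the SRU 1.1 and OAI-PMH 2.0 specifications; the base URLs, helper names and defaults are hypothetical and say nothing about how EL itself is configured.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Build an SRU searchRetrieve URL: a live search against one catalogue."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # the query is expressed in CQL
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


def oai_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords URL: bulk harvesting into a central store."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical endpoints, for illustration only.
print(sru_search_url("http://catalogue.example.org/sru", 'title="harry potter"'))
print(oai_list_records_url("http://catalogue.example.org/oai"))
```

The difference in shape mirrors the difference in architecture: the SRU request is issued per user search, while the OAI-PMH request is repeated (following resumption tokens) until the whole catalogue has been copied.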
OK, so the result is slower than Google. So what? The separation of the results by partner is probably more useful than just lumping the whole lot together. We will see what results the user testing provides.
The European Library portal is a Web 2.0 application, possibly designed before "AJAX" was even thought of. The portal runs in the user's browser. The national libraries' catalogues are searched directly without an intervening portal server, but at the moment, to avoid the "Cannot access data across domains" security issue, they are accessed by a proxy at the KB site. I am sure EL will find a way around that at some stage.
For further information, see the article in DLib, February 2004. "Search and Retrieval in the European Library".
Would you find it useful if I described how EL works at the Insight Conference?
Bill
Posted by: Bill Oldroyd at September 29, 2005 04:30 PM
