Panlibus

Archive for the 'Semantic Web' Category

Interesting developments at the Bibliotheque Nationale de France

Having read some documentation recently around the plans of the Bibliotheque Nationale de France (BNF) for what they call a “pivot” – a mechanism based on semantic technologies for optimising the value of the BNF’s entire web presence, including Gallica, its digital library – it was great to have the opportunity to hear Dominique Stutzmann from the BNF speak at the recent Eurolis Seminar in London.

The future of the library (Doom or Bloom?) was what the day event was all about, and according to Stutzmann, we’ve already invented it. We’ve got the nice buildings, and so ostensibly the library of the future will be the same as that of today. If the library space vanishes, he argued, it will only be the result of a self-fulfilling prophecy because librarians aren’t confident about what they’re doing. I think he’s really onto something – there is indeed an element of subjective crisis in the problem of the future of libraries. He admitted, though, that Web 2.0 re-presents the user-librarian relationship in quite a fundamental way; the user becomes both publisher and librarian. But users don’t want librarians to disappear. He seems to be saying that our library spaces continue to be successful, so leave them alone but engage with some interesting technological stuff as well, because libraries are well-positioned to do so. He added that users trust libraries with everything including long-term preservation of data, and BNF is clearly poised to exploit that trust, not for its own ends but for everyone, in the great universal tradition of libraries.

Stutzmann perceives the potential of semantic technologies very clearly in terms of the user experience – giving everyone improved and accurate access to the information available – and had an impressive array of exemplars to reel off, citing Google Book Search’s use of data mining tools to take city names from search results and pinpoint them on a map, and Bibliosurf’s map of novels. Along similar lines, he demonstrated an interactive map with mashed-up data from Last.fm to produce a map of composers, where proximity indicates artistic commonality rather than geographical proximity – for example, Beethoven is situated alongside Vaughan Williams.

As a Modern Languages graduate, I loved hearing about semantic search developments at the European Library and specifically in their TELplus project, where multilingual search (i.e. a search query with terms from more than one language) has been achieved. Stutzmann was clear that authority data is indivisible from semantic web developments, and that is where the librarian tradition really comes into its own; he demonstrated search results with LCSH headings as a facet on the side-panel. He pleaded with librarians to use metadata to give more accurate access to data.

The only downbeat element to his presentation was a survey carried out at BNF in 2008 to get a clearer picture of their users. A key finding was that the average age of users of the digital library is 48, although there is an overall age range of 14-94. Europeana suffers from the same problem. Funnily enough, when I was out on Saturday night, a friend was saying how almost all the people who queued up recently in Birmingham to see the Anglo-Saxon treasures recently discovered in the West Midlands were white people aged 50+. Stutzmann pondered whether there was anything that could be done about it – does it come down to lifestyle fundamentals?

In the same survey, there was a fascinating finding about Library 2.0. Many users questioned felt that library sites should not be spoilt by the comments of users. They are happier to share their information and collaborate with the librarian than with other users. Obviously this goes against received Library 2.0 thinking, and left me wondering, is that a specifically “French thing”, or do UK users have more in common with their European counterparts than we think?

Will Linked Data mean an early end for Marc & RDA?

For the uninitiated, NGC4LIB is a library-focused mailing list with a reputation for engaging in massive discussions and disagreements around the minutiae of future cataloguing and library-focused metadata practices. They have recently been involved in one of these great debates, stimulated by the comments of Sir Tim Berners-Lee in a recent interview. As is often the case on this list, the debate wandered well off topic into the realms of FRBR and its alternatives before being brought back on topic by Jim Weinheimer, who started the conversation in the first place.

A statement in Jim’s contribution caught my eye:

Implementing linked data, although it would be great, is years and years away from any kind of practical implementation

Implementing linked data is already well underway with many groups across the globe. For instance, there are a couple that we at Talis are closely involved with. Following on from Sir Tim’s interview comments, the British Government are currently running a closed beta, soon to be opened, of data.gov.uk. Through this site they are not only opening up data in many forms such as CSV, like their American cousins at data.gov, but they are also starting to encode it in RDF and publish it via the Talis Platform, which provides a SPARQL (the query language of the Linked Data web) endpoint. This approach not only lets anyone download the raw data, but also enables them to query it for whatever they have in mind. If you want a sneak preview of how such data is queried, take a look at some of these examples. In a similar vein, metadata from BBC programmes and music is being harvested into Talis Platform stores. Again these are open to anyone to innovate with – check out these screencasts to see some of the early possibilities.
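To give a flavour of what querying a SPARQL endpoint involves, here is a minimal sketch in Python. It only builds the request URL and does not contact any server. The endpoint address is a made-up placeholder (not the real data.gov.uk or Talis Platform address), and the query is a generic one that simply asks for ten triples.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/stores/demo/services/sparql"

# A generic query that lists ten triples from whatever data the store holds.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter,
    which is how the SPARQL protocol expects GET requests to be formed."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL from a real endpoint (with any HTTP client) would return the matching rows, typically as XML or JSON depending on what the service offers.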

Ah but that is not bibliographic data, I hear someone cry – it’ll never catch on in libraries. I get the impression from some comments on the NGC4LIB list that it will not be possible for ‘our’ data to participate in this Linked Data web until ‘we’ have predicted all possible uses for it, analysed them, and developed a metadata standard to cope with every eventuality. There are already a few examples of the library world engaging with RDF and Linked Data, one obvious one being the Library of Congress with LCSH, another the National Library of Sweden. Neither of these examples is encoding the kind of detail you would expect in a Marc record; they are using ontology to describe associated concepts such as subjects.

There has been some ontology development towards this larger goal with Bibo (Bibliographic Ontology Specification). Although not there yet, Bibo is good enough to be used in live applications wishing to encode bibliographic data. One such example is Talis Aspire. Underpinned by the same Platform as the UK Government and BBC Linked Data services, it uses the Bibo ontology to describe resources in an academic context.
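As a rough illustration of what describing a resource with Bibo can look like, the sketch below writes three statements about a book and prints them in N-Triples form. The book URI, title and author are invented for the example, and this is not a picture of how Talis Aspire actually models its data; only the namespace URIs and the Bibo class `Book` are taken from the published vocabularies.

```python
# Namespace URIs for the vocabularies used below.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical resource URI and details, invented for illustration.
book = "http://example.org/resources/book-1"

triples = [
    (book, RDF + "type", ("uri", BIBO + "Book")),
    (book, DCTERMS + "title", ("literal", "An Example Textbook")),
    (book, DCTERMS + "creator", ("literal", "A. N. Author")),
]


def to_ntriples(subject, predicate, obj):
    """Render one statement as a line of N-Triples."""
    kind, value = obj
    if kind == "uri":
        rendered = f"<{value}>"
    else:
        # Escape backslashes and double quotes inside plain literals.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {rendered} ."


lines = [to_ntriples(*t) for t in triples]
print("\n".join(lines))
```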

Alongside data.gov.uk there is a Google Group conversation taking place. The refreshing part of this conversation is that it is between the producers of the data sets, those developing the way it should be encoded into RDF, and those who want to consume it. Several times you will see a difference of opinion between those that want to describe the data to its fullest, and those that wish to extract the most value from it. “I agree that is a cleaner way of encoding, but can you imagine how complex the query will be to extract what I want!”. This approach is not unusual in the Linked Data world, where producers and consumers get together, pragmatically evolving a way forward. Dataincubator.org is an open place where such pragmatic development and evolution is taking place. Check out examples of a subset of Open Library data (note this is an example of data, not a user interface).

Another, bibliographic focused, experiment can be found at semanticlibrary.org. From some of the example links on the home page, you can see that building in this way enables very different ways of exploring metadata. People, subjects, publishers, works, editions, series, all being equally valid starting points to explore from.

Doth the bell toll for Marc and RDA?
Not for a long old time – ontologies like Bibo, and the results of work at Dataincubator.org and semanticlibrary.org, may well lead to more open, useful, and most importantly linked, access to data previously limited to library search interfaces. That data has to come from somewhere though, and the massive global network of libraries encoding their data using Marc, and maybe soon RDA, are ideally placed to continue producing rich bibliographic metadata. Metadata to be fed into the Linked Data web in the most appropriate form for that purpose. There will continue to be a place for current cataloguing practices and processes for a significant period – supporting and enabling the bibliographic part of the Linked Data web, not being replaced by it.

No doubt the NGC4LIB conversation on this topic will continue. Regardless of how it progresses, there is a current need and desire for bibliographic data in the linked data web.  The people behind that desire, and the innovation to satisfy it, may well have come up with a satisfactory solution, for them, whilst we are still talking.

Will the eBook make it across the chasm?

I’m currently hurtling through the English countryside on a Wifi enabled train having spent the day at E-books and E-content 2009 held at University College London.  An interesting and stimulating day  with a well matched but varied set of speakers, including yours truly (presentation on SlideShare).  The eighty strong audience were also a varied selection from academic libraries, academia in general, publishers and the information media.

The move towards a web of data, enabled by the emergence of semantic web technologies and practices, was one of my themes. Another was a plea for content publishers and providers to deliver their content to the user where he/she is.  Not expecting them to be driven to their site with a totally different interface.  This is a difficult one for the eContent industry, at a time when the publishers are in the middle of a “my platform is better than yours” battle.  Nevertheless, a student wants the content their course has recommended, not caring who published it or which aggregator their library licensed it from.

In laying the ground, I initially discussed the technology adoption curve and how technologies don’t become mainstream overnight. Any new technology, or new way of doing things, follows a standard pattern with a small number of innovators taking the initial, often enthusiastic, risk. The early adopters then build on the innovators’ success and join in, still very early and with some risk. When the new way has been proven, adoption has increased and both costs and risk have fallen, the early and late majorities take it to mass acceptance and adoption. This only leaves the laggards, who will only come on board if forced by circumstance.

As an adjunct to the adoption curve, I spoke about a chasm which technologies have to cross, between the early adopters and the early majority before they take off.  There are many promising technologies that failed to cross that chasm.  For example, technology watchers at the time predicted that the mini-disc would replace the cassette tape, but as we know the CD took that prize.

Today’s conference was mostly focussed on the eBook and its impact on libraries and publishers. This is on the assumption that it will be the way of delivering book sized pieces of content in the approaching digital world. In answer to a challenging question for the end of day panel, I concluded that this is by no means certain. I believe direct access to articles will eventually see the end of the traditional journal issue format. In a similar way I believe there is a good chance that chunks of content, that are today of book size, may well be assembled and delivered in a digital object as yet to be identified.

So will the eBook jump the adoption chasm?  If I was a betting man I would only back it on an each way basis.  I believe that anyone betting their whole business model on it being a certain winner, may just be taking too much of a risk.

Photo from mstorz published on Flickr

Library of Congress launch Linked Data Subject Headings

Back in December I was very critical of the Library of Congress for forcing the take down of the Linked Data service at lcsh.info. LoC employee, and Talking with Talis interviewee, Ed Summers had created a powerful and useful demonstration of how applying Linked Data principles to a LoC dataset such as the Library of Congress Subject Headings could deliver an open asset to add value to other systems. Very rapidly after its initial release, another Talking with Talis interviewee, Martin Malmsten from the Royal Library of Sweden, made use of the links to the LCSH data. Ed was asked to take the service down, ahead of the LoC releasing their own equivalent in the future.

I still wonder at the LoC approach to this, but that is all water under the bridge now, as they have now launched their service, under the snappy title of “Authorities & Vocabularies” at http://id.loc.gov/authorities/.

The Library of Congress Authorities and Vocabularies service enables both humans and machines to programmatically access authority data at the Library of Congress via URIs.

The first release under this banner is the aforementioned Library of Congress Subject Headings.

As well as delivering access to the information via a Linked Data service, they also provide a search interface, and a ‘visualization’ via which you can see the relationship between terms, both broader and narrower, that are held in the data.

To quote Jonathan Rochkind “id.loc.gov is AWESOME”:

Not only is it the first (so far as I know) online free search and browse of LCSH (with in fact a BETTER interace than the proprietary for-pay online alternative I’m aware of).

But it also gives you access to the data itself via BOTH a bulk download AND some limited machine-readable APIs. (RSS feeds for a simple keyword query; easy lookup of metadata about a known-item LCSH term, when you know the authority number; I don’t think there’s a SPARQL endpoint? Yet?).

On the surface, to those not yet bought in to the potential of Linked Data, and especially Linked Open Data, this may seem like an interesting but not necessarily massive leap forward.   I believe that what underpins the fairly simple functional user interface they provide will gradually become core to bibliographic data becoming a first-class citizen in the web of data.

Overnight, the URI ‘http://id.loc.gov/authorities/sh85042531’ has become the globally available, machine and human readable, reliable source for the description of the subject heading ‘Elephants’, containing links to its related terms (in a way that both machines and humans can navigate). This means that system developers and integrators can rely upon that link to represent a concept, not necessarily the way they want to [locally] describe it. This should make it easier for disparate systems and services to simply share concepts and therefore understanding – one of the basic principles behind the Semantic Web.
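To make the idea of machine-navigable related terms a little more concrete, here is a minimal sketch using the SKOS `broader` relationship. The ‘Elephants’ URI is the one quoted above; the other two concepts, their URIs and their labels are placeholders invented for the example and are not real LCSH data. Nothing is fetched from id.loc.gov.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

ELEPHANTS = "http://id.loc.gov/authorities/sh85042531"  # URI quoted in the post
# Invented placeholder concepts, not real LCSH headings.
PARENT = "http://example.org/concepts/parent"
CHILD = "http://example.org/concepts/child"

triples = [
    (ELEPHANTS, SKOS_PREF_LABEL, "Elephants"),
    (PARENT, SKOS_PREF_LABEL, "Hypothetical broader term"),
    (CHILD, SKOS_PREF_LABEL, "Hypothetical narrower term"),
    (ELEPHANTS, SKOS_BROADER, PARENT),
    (CHILD, SKOS_BROADER, ELEPHANTS),
]


def label(uri):
    """Return the preferred label recorded for a concept URI."""
    return next(o for s, p, o in triples if s == uri and p == SKOS_PREF_LABEL)


def broader(uri):
    """Concepts that the given concept points to as broader."""
    return [o for s, p, o in triples if s == uri and p == SKOS_BROADER]


def narrower(uri):
    """Concepts that name the given concept as their broader term."""
    return [s for s, p, o in triples if o == uri and p == SKOS_BROADER]


print(label(ELEPHANTS), "has broader:", [label(u) for u in broader(ELEPHANTS)])
print(label(ELEPHANTS), "has narrower:", [label(u) for u in narrower(ELEPHANTS)])
```

The point of the sketch is that a local system only needs to store the URI; the labels and relationships can be looked up from whoever publishes the concept.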

This move by the LoC has two aspects to it that should make it a success. The first one is technical. Adopting the approach, standards, and conventions promoted by the Linked Data community ensures a ready made developer community to use and spread the word about it. The second one is openness. No one will have to think “is it OK to use this stuff” before taking advantage of this valuable asset. Many in the bibliographic community, who seem to spend far too much time on licensing and logins, should watch and learn from this.

A bit of a bumpy ride to get here but nevertheless a great initiative from the LoC that should be welcomed. One that I hope they and many others will build upon in many ways. Bring on the innovation that this will encourage.

Image from the Library of Congress Flickr photostream.

UKSG09 Uncertain vision in sunny Torquay

Glorious sunshine greeted the opening of the first day of UKSG 2009 in Torquay yesterday. The stroll along the seafront from the conference hotel (Grand in name and all facilities, except Internet access – £1/minute for dialup indeed!) was in delightful sharp contrast to the often depressing plane and taxi rides to downtown conference centres.

The seaside theme was continued with the bright conference bags. Someone had obviously got hold of a job lot of old deckchair canvas. 700 plus academic librarians and publishers and supplier representatives settled down, in the auditorium of the Riviera Centre, to hear about the future of their world.

The three opening keynote speakers were very different in topic and delivery, but all three left you with the impression of upcoming change over the next few years, the shape of which they were not totally sure of.

First up was Knewco Inc’s Jan Velterop, whose pitch was a somewhat meandering treatise on the wonders and benefits of storing metadata in triples – something he kept saying he would explain later. The Twitter #uksg09 channel was screaming “when is he going to tell us about triples” and “what’s a triple” whilst he was talking. He eventually got there but I’m not sure how many of the audience understood the massive benefits of storing and linking data in triples, that we at Talis are fully aware of. Coincidentally, for those who did get his message, I was posting about the launch of the Talis Connected Commons for open free storage of data – in triples, in the Talis Platform.
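For anyone still asking “what’s a triple”, here is a minimal sketch: a triple is a three-part statement of subject, predicate and object, and a store is little more than a collection of them that you can match patterns against. The `ex:` identifiers and the data are invented for illustration, and a real triple store would use full URIs and indexes rather than a Python list.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data; the "ex:" identifiers are invented placeholders.
store: List[Triple] = [
    ("ex:book1", "ex:title", "An Example Book"),
    ("ex:book1", "ex:author", "ex:person1"),
    ("ex:person1", "ex:name", "A. N. Author"),
    ("ex:book2", "ex:author", "ex:person1"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the given parts; None is a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Linking in action: find the books by a person, then look up that person's name.
books = [s for s, _, _ in match(p="ex:author", o="ex:person1")]
author_name = match(s="ex:person1", p="ex:name")[0][2]
print(books, "were written by", author_name)
```

Because the object of one statement can be the subject of another, separate statements join up into a graph, which is where the linking comes from.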

Next up was Sir Timothy O’Shea from the University of Edinburgh, who talked about the many virtual things they are doing up in Scotland. You can take your virtual sheep from your virtual farm to the virtual vet, and even on to a virtual post mortem. His picture of the way information technology is playing its part in changing life at the university, apart from being a great sales pitch for it, left him predicting that this was only the early stages of a massive revolution. As to where it was going to lead us in a few years he was less clear.

Joseph Janes, of the University of Washington Information School, was one of those great speakers who dispensed with any visual aids or prompts and delivered a very entertaining 30 minutes comparing the entry into this new world of technology-enhanced information access with his experience as an American wandering around a British seaside town. His message was that we should expect the next few years to feel very similar on the surface, as we will recognise most of the components, but to be very different when you analyse them. As an American he recognises cars, buses, adverts, and food, but in Britain they travel on the wrong side of the road, are different shapes, and are products he doesn’t recognise. As we travel into an uncertain but exciting future, don’t be fooled by recognising a technology; watch how it is being used.

A great start to the day, which included a good break-out session from Huddersfield’s Dave Pattern. He ended his review of OPACs and predictions about the development of OPAC 2.0 and beyond, with a heads-up about my session today, which caused me to spend a couple of hours in the hotel bar, the only place with Wifi, tweaking my slides. It would be much easier to follow Mr Janes’ example and deliver my message off the cuff without slides – not this time perhaps ;-)

Looking forward to another good day – even if the sun seems to have deserted us.

Free hosting for Open Data

Over on our sister blog Nodalities, my colleague Leigh Dodds has announced the launch of  the Talis Connected Commons.

True to our desire to see a truly open web of data, under the terms of the Connected Commons scheme Talis is offering free access to the [Talis] Platform for the purposes of hosting public domain data. And the offer isn’t just limited to free hosting: the data access services, including access to a public SPARQL endpoint, are also freely available.

The terms of the offer are as follows: if you own, or are creating, a public domain dataset then you can store that data in the Platform as RDF, for free. We’re setting an initial cap of 50 million triples on each dataset, but that should be plenty of space in which to collect some really interesting data.

So have you got, or do you want to create, up to 50 million triples you would like to put in the public domain, along with up to 10Gb of content? Yes? Well, get yourself over to the Connected Commons page and check whether you qualify. There is also a FAQ to give you more detail.

The Connected Commons is for all sorts of data, but I’m positive that the library world provides a rich source of such open data sets – get in there guys and get your data openly linked and out there.

 

Library of Congress will run Linked Data service

After forcing the closure of the lcsh.info service, which was set up by, Talking with Talis interviewee, Ed Summers to demonstrate how the Library of Congress Subject Headings could be represented as a Semantic Web application using SKOS [as I reported last month], there has been speculation as to when and what LC itself would do.

The following quote from a presentation [¹] at ALA Midwinter shows that they have been thinking, and doing, something about it.

LCSH in SKOS. In 2008 the Library began a pilot to make a subset of LCSH freely available in SKOS format on the Internet. Making LCSH available in SKOS (Simple Knowledge Organization System) will facilitate its use for data manipulation and other applications on the Semantic Web and elsewhere. The web site on which it resided, lcsh.info, was not on an LC server, and was taken down in December 2008 to be replaced by the official site, expected to appear as <id.loc.gov/authorities> within the next couple of months. The Library of Congress remains committed to providing LCSH freely through SKOS. The former lcsh.info site will redirect users to the new URI.

A visit to id.loc.gov reveals the following [on a page last updated on January 22nd 2009]:

This site serves as a placeholder for forthcoming web services that will enable both humans and machines to programmatically access authority data at the Library of Congress. The initial services offered are influenced by — and therefore implement — the Linked Data movement’s approach of exposing and inter-connecting data on the Web via dereferenceable URIs. We aim to make resources available on this site within 6-8 weeks. Check this site regularly for more updates as we continue to develop this service!

and:

Initially, within 6 to 8 weeks, the Library of Congress will release its first offering: the Library of Congress Subject Headings. This will be an almost verbatim re-release of the system and content once found at the popular prototype lcsh.info service. The primary exception will be that the URIs for the data values will no longer take the form http://lcsh.info/{identifier}. Instead, they will start with http://id.loc.gov/authorities/{identifier}. If you have used the legacy lcsh.info metadata in an application, we advise updating to the new URIs, as we cannot guarantee a permanent redirect from old lcsh.info URIs to the new URIs at id.loc.gov.
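For anyone who has stored the old URIs, the advice above amounts to a simple prefix swap. Here is a minimal sketch, assuming your stored URIs follow exactly the `http://lcsh.info/{identifier}` form given in the quote; the function name is my own, and any URI that does not start with the old prefix is left untouched.

```python
OLD_PREFIX = "http://lcsh.info/"
NEW_PREFIX = "http://id.loc.gov/authorities/"


def migrate_uri(uri: str) -> str:
    """Rewrite a legacy lcsh.info URI to the id.loc.gov form.

    Assumes the identifier is everything after the old prefix;
    URIs with any other prefix are returned unchanged.
    """
    if uri.startswith(OLD_PREFIX):
        return NEW_PREFIX + uri[len(OLD_PREFIX):]
    return uri


print(migrate_uri("http://lcsh.info/sh85042531"))
```

It would be worth spot-checking a few rewritten URIs against the live service once it appears, in case the final URI pattern differs from the one announced.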

Great to hear, and great for Ed.  Both that his work has stimulated the LC in to action and also demonstrated how it should be done.

My only thought on this is why did they go through all the fuss and negative PR about taking down lcsh.info before the LC service that replaces it was up and running a couple of months later?

[¹]  http://www.libraries.psu.edu/tas/jca/ccda/docs/lc0901.pdf (page 5)

Traffic Squad Police (LOC) image published in The Library of Congress’ photostream on Flickr

Library of Congress forces LCSH Linked Data site to shut down

Back in May I was among others who welcomed the initiative by, Talking with Talis interviewee, Ed Summers in setting up lcsh.info.  This site was set up by Ed to demonstrate how the Library of Congress Subject Headings could be represented as a Semantic Web application using SKOS.

In the intervening months many including myself used Ed’s work as a pointer to how useful publicly available data could, with the use of open Linked Data principles, become a valuable part of sites and services across the globe.   For instance another Talking with Talis interviewee Martin Malmsten, from the Royal Library of Sweden, almost immediately made use of the links to the LCSH data.  Ed went on to get lots of feedback, and wrote a paper which he then presented at DC2008.

It is therefore with great disappointment that I read this on the lcsh.info site the other day:

On December 18th I was asked to shut off lcsh.info by the Library of Congress. As an LC employee I really did not have much choice other than to comply.

As an LC employee he was put in an untenable position when they obviously decided that they didn’t like this useful service based on publicly available data being delivered from a domain that doesn’t end in loc.gov. I wonder if there are any other Linked Data enthusiasts, not held back by who their employer is, who would pick up from where he left off?

Ed goes on to say:

It was always my intention for concept URIs at lcsh.info to be cool. I advertised the service as ‘experimental’ and indicated it was going to hopefully inform the development of a similar continually updated service at LC where I work. …  My thought was I could leave the service running until there was something similar at LC that I could redirect the concept URIs to. After a year or two when people had rewritten there data to point at loc.gov I could retire lcsh.info. I never imagined I would be asked by LC to take it down.

LOC should have listened to Ed in the first place and taken the high ground in leading the work of creating a semantic web of data with their valuable publicly available data. At the end of his post Ed hints that LC is still considering running a service like lcsh.info at loc.gov, but it’s not there yet. Why-o-why did they not learn from his work and ride the wave of introducing their own service based on his great initiative? Instead they present to the world a short-termist not-invented-here attitude, that reminds me of other well established leviathans of the world of library metadata.

Let’s hope that Ed’s hint is correct and we will soon be able to welcome the release of Open Linked LCSH and other Data from the electronic portals of the LofC.

Traffic Squad Police (LOC) image published in The Library of Congress’ photostream on Flickr

Dave Pattern challenges libraries to open their goldmine of data

The simple title of Dave’s recent blog post ‘Free book usage data from the University of Huddersfield’ hides the significance of what he is announcing.

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.

13 years’ worth of library circulation data opened up for anyone to use – he is right about it being unthinkable a few years ago. I suspect that for many it is probably still unthinkable now; to them I would ask the question: why not?

In isolation the University of Huddersfield’s data may only be of limited use but if others did the same, the potential for trend analysis, and the ability to offer recommendations and who-borrowed-this-borrowed-that  services, could be significant.
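As a rough sketch of how a who-borrowed-this-borrowed-that service can be derived from circulation records, the example below counts how often two titles were borrowed by the same person. The loan records are invented, and the borrower/title layout is an assumption for illustration; it is not the format of the Huddersfield release.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical circulation records: (borrower id, title id).
loans = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "B"),
]


def build_recommendations(loans):
    """Map each title to a Counter of titles borrowed by the same people."""
    titles_by_borrower = defaultdict(set)
    for borrower, title in loans:
        titles_by_borrower[borrower].add(title)

    co_borrowed = defaultdict(Counter)
    for titles in titles_by_borrower.values():
        # Every ordered pair of distinct titles on one borrower's record
        # counts once towards "people who borrowed a also borrowed b".
        for a, b in permutations(sorted(titles), 2):
            co_borrowed[a][b] += 1
    return co_borrowed


recs = build_recommendations(loans)
print("Borrowers of B also borrowed:", recs["B"].most_common())
```

With one library’s data the counts are thin for less popular titles, which is exactly why pooling data from several libraries would make the recommendations more useful.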

If you have 14 minutes to spend I would recommend viewing Dave’s slidecast from the recent TILE project meeting, where he announced this, so you can see how he uses this data to add value to the Huddersfield University search experience.

Patrick Murray-John picked up on Dave’s announcement and within a couple of days had produced an RDF based view of this data – I recommend you download the Tabulator Firefox plug-in to help you navigate his data.

Patrick was alerted to Dave’s announcement by Tony Hirst who amplified Dave’s challenge “DON’T YOU DARE NOT DO THIS…”

As Dave puts it, your library is sitting on a goldmine of useful data that should be mined (and refined by sharing with that of other libraries).  A hat tip to Dave for doing this, and another one for using a sensible open licence to do it with.

Picture published by ToOliver2 on Flickr

Catching the next wave

Catching the next wave was the title of my opening track keynote presentation in the “Catching the semantic wave – or drown in a sea of content?” session of the “Order out of chaos – creating structure in our information universe” track at the Online Information Conference 2008. Presentation below from Slideshare.


By rob

This is a very well attended track.  Standing room only in most of the sessions, great interest in the Semantic Web, Web 2.0, and associated concepts and technologies.  From a lightly attended single session last year, this topic has grown in to an over subscribed 2nd track this year.  Having spent some time bending the ear of conference chair Adrian Dale last year about what was upcoming, I can wear my virtual I told you so hat with pride this year.  

My job as keynote was to provide a broad introduction to, and context for, things like Linked Open Data, the Semantic Web, Cloud Computing and clouds of data, setting the scene for the day.  Hopefully I was successful in my objective, the number of attendees is definitely a measure of the interest in the topics covered.

Considering that a large proportion of the attendees of the conference are librarians it is gratifying to note that they are already looking beyond the current Web 2.0 meme towards what will be washing over us next.    Thinking about this, it is hardly surprising.  The next wave is far more associated with data, metadata, linking and recommending, than the Web 2.0 meme of social networking, blogging and wikiing.  Dare I say it out loud, but by generalisation librarians appear to be far more comfortable with the concerns of data than socially interacting. 

I get the feeling that these concepts are going to get adopted in libraries far quicker than we would expect once they start to gain momentum. This would be helped if we could get past some of the terminology confusion. The main culprit is the confusion between semantics/semantic analysis and the semantic web. The web of data, as against [or to be more correct in addition to] the current web of documents, is how I see the semantic web. A great example of the web of data in action is the Linking Open Data project.