Panlibus

License

Creative Commons License

Archive for the 'Metadata' Category

Google Book Search - not so free with their jacket images

When, on the April Library 2.0 Gang, Tim Spalding asked Google Product Manager Frances Haugen about the uses of Google data, specifically book jacket images received via their new API, we got the impression that there were no restrictions against using them for display in your OPAC.

As Tim posted last week, things seem to have changed:

A few months ago when the Google Book Search API came out, I was among the first to notice that GBS covers could be used to deck-out library catalogs (OPACs) with covers, potentially bypassing other providers, like Amazon and Syndetics. I subsequently promoted the idea loudly on a Talis podcast, where a Google representative ducked licensing questions, giving what seemed like tacit approval.

It seemed so great–free covers for all. Unfortunately, it now seems that it was too good to be true. At a minimum, the whole thing is thrown into confusion.

Tim was contacted by ‘a major cover supplier’ saying that a large percentage of the Google covers were, in fact, licensed to Google by them. They never intended this to be a "back door" to their covers, undermining their core business. - oops!

This coupled with the recent alteration to the Amazon Web Services customer agreement:

5.1.3. You are not permitted to use Amazon Associates Web Service with any Application or for any use that does not have, as its principal purpose, driving traffic to the Amazon Website and driving sales of products and services on the Amazon Website.

… means that those looking for a free source of book jackets will have to look elsewhere?


Giving Away the Public Domain

Scanning Boing Boing this morning for its usual daily meanderings through the underbelly of wonderful stuff on the web, I find an interesting snippet titled "Giving away the public domain docs the US gov charges money for".

The piece points to http://public.resource.org/ and Carl Malamud, the site’s creator, is quoted as saying:

public.resource.org has created a mirror of NTIS.Gov’s [National Technical Information Service] store that sells public domain materials. Our twist is that instead of sending you the materials, we’ll release them back into the public domain for everybody to use. We mashed some of the materials up in a little infomercial here.

In the US, the law states that works produced by the federal government go into the public domain. The government doesn’t get the privilege of copyright or any other protection over the contents of reports, photographs, papers or databases.

This doesn’t stop the government charging for them though.

The radical step taken at public.resource.org is to use a one-off purchase by a member of the public to free the data back into the public domain.

Casey Bisson talked about releasing the LC data under a similar pattern of thinking, but with so much FUD around who may or may not own what data, it looks like that hasn’t been organised just yet.

It’s a shame Library of Congress isn’t on the list at public.resource.org, or I might just have stumped up for it myself.

Found via Boing Boing

Why Nodalities?

“I read the Panlibus blog - I note Talis has another house blog called Nodalities - why is this and why/who should be reading it?”

One of the major recurring themes from myself and others in Panlibus postings is Library 2.0 and its more general cousin Web 2.0. If you followed the links I provided to their descriptions in Wikipedia you will have discovered that they are both labels for a collection of attributes rather than specifications.

I have yet to read a complete, concise definition of what Web 2.0 or Library 2.0 ‘is’ [and probably never will]. Nevertheless, it is far simpler to look at an application or service and pronounce to the world that it is very Web 2.0, and be fairly confident that people will understand what you mean.

Web 2.0 is virtually all about technology: Web Services, Service Oriented Architecture, Social Networking tools, etc., whereas its Library relative mixes all of that with a heavy dose of using those Web 2.0 tools and the customer handling & social skills of the library community to provide a better service to library users. Debates about the use of mobile phones, and the provision of coffee, in a Library environment are often found in the Library 2.0 world.

We at Talis are the ‘Technology Guys’ in the Library equation, and although interested in all that is debated, our motivations are all about how new and emerging technologies [currently labelled Web 2.0] can be beneficially applied in the Library world. To this end you will find me and my colleagues evangelising on the subject both here and at conferences around the world such as these: Access2006, Internet Librarian International, Stellenbosch Symposium, Internet Librarian 2006, and the Charleston Conference.

The Talis Platform is an excellent example of applying Web 2.0, Semantic Web [to mention another ‘label’], SOA, and other technologies to provide innovative solutions to the liberating of library data, functionality, and services for the benefit of all.

In the process of proposing and delivering those [currently library specific] solutions, we are pushing both the theoretical and practical boundaries of web technologies and the theories and standards that are behind them - especially in the World Wide Web Consortium, where you find Talis involved with several committees. In doing this we are very active members, with much to contribute and say, of the world community driving forward these technologies.

This is where Nodalities comes in. You will note [today] that there is a posting from me picking up points from the blogs of Ian Davis and Sam Tunnicliffe, from our Platform Team, who are currently at the Web 2.0 Summit in San Francisco. If you are interested, like I am, in the way that all things Web are [and are being predicted to be] moving, you will find what they are reporting most engrossing.

Reading between the lines of what is being presented it is clear that the advances already being demonstrated by the Talis Platform are only the first step in a massive change in the way large sets of data and metadata (often only linked by semantics), can be marshalled, related together, and combined to change the way information is used in the future.

Depending on the context, you will find Talis people attending and/or speaking at both Library and more general conferences across the world. Our knowledge and understanding of the issues surrounding the library and information industries is very valuable input into the wider technology world. As we have demonstrated, this is a two-way street. It is absolutely certain that our knowledge and understanding of the Web 2.0 world is already adding unique value to the world of libraries.

So to answer the question at the start of this posting…..

If you are in the library community and want to keep abreast of technology advancements - read Panlibus. If you are in the wider web community and are interested in what we are doing, and have to say about, applying these technologies as a Platform in real world situations - read Nodalities. I suspect most people, although with concentration on one, will find postings of interest in both Panlibus and Nodalities.


Addictive cataloguing by the masses

You’ve got to hand it to those Google guys for coming up with out-of-the-box thinking.

Take Google Image Labeler for instance.  The worst thing about this latest Beta from the World Domination stable of ideas is the name, as John Battelle points out.

As John also points out, what Google call labels the rest of the planet know as tags.

I just wish Google would use the terminology the rest of the web has already settled upon. It’s not a label. It’s a tag. “Tag” means something - an intentional attribute given to an object on the web. That’s what we are doing here. How about we help Google come up with a new name?

So what is it then?  It is two things:

  • An addictive bit of simple fun.  You are randomly partnered with someone else, then the two of you have 90 seconds to agree on at least one label for each of the images [from within Google Image Search] you are presented with.  If you both enter the same label, you gain 100 points and another image is presented.

    An ideal bit of fun to dip in to for a few minutes the next time you fill your coffee cup.  Be warned though: be prepared to still be playing it as you finally drain the cup!

  • An innovative way of building up a folksonomy around the images that Google reference.  By harnessing people’s natural addiction to this sort of game [as of the moment someone named eGrunt has amassed the staggering total of 1,324,400 points - does this person sleep?] they are rapidly building up a human-validated set of search tags for their images - all for free.  At the moment there does not seem to be any value, other than kudos, attached to the points gained.
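The mechanics described in the two points above can be sketched in a few lines of Python. This is only an illustration of the agree-on-a-label idea, not Google's implementation: the names `normalise`, `agreed_labels` and `Round`, and the choice to ignore case and extra spaces when comparing labels, are assumptions made for the example. Only the 100 points per agreed image comes from the description above.

```python
from dataclasses import dataclass, field

POINTS_PER_MATCH = 100  # from the post: a shared label earns 100 points


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace (an assumed matching rule)."""
    return " ".join(label.lower().split())


def agreed_labels(player_a: list[str], player_b: list[str]) -> set[str]:
    """Return the labels that both partners entered for one image."""
    a = {normalise(label) for label in player_a if label.strip()}
    b = {normalise(label) for label in player_b if label.strip()}
    return a & b


@dataclass
class Round:
    """One timed session between two partners (the timer is left out)."""
    score: int = 0
    tags: dict[str, set[str]] = field(default_factory=dict)

    def play_image(self, image_id: str, player_a: list[str], player_b: list[str]) -> bool:
        """Score an image; return True if the partners agreed on a label."""
        matches = agreed_labels(player_a, player_b)
        if not matches:
            return False
        self.score += POINTS_PER_MATCH
        # The agreed labels are the human-validated tags for this image.
        self.tags.setdefault(image_id, set()).update(matches)
        return True
```

The point of the design is visible in `tags`: the score is just the bait, while the agreed labels are the by-product that improves image search.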

Google, like many of us who have tried to find relevant images from their Image Search, have identified that just scouring the page [that contains an image] for relevant keywords is not as useful as you would expect in cataloguing the image itself.

One unique advantage Google have in launching such an initiative is their global reach.  They launch a new Beta, within hours the Google watchers blog about it, within a day or so thousands are playing with it.

Would something like this work for cataloguing or tagging your dusty collection? Probably not, as most players would grow old waiting for a partner.  But how long before a Google Book Search version appears? In which case the question will be: will Google see this as more secret sauce, or would they provide an open API to it?

Out-of-copyright book download from Google

Reported by Michael Arrington on TechCrunch, Google Book Search has added the ability to allow PDF downloads of out-of-copyright books.

Until now, Google only allowed people to read the out-of-copyright books online (and only snippets of copyrighted works). To search the database of available full titles, go to books.google.com and click the “full view books” option when searching. This new move contradicts earlier statements by Google that scans of out-of-copyright books would not be made available for printing.

Belushi Book brings tears to cataloguers’ eyes

The Onion reports: Dewey Decimal System Helpless To Categorize New Jim Belushi Book

“With all due respect to the author, we remain unsure how to categorize this particular work,” said the chair of OCLC’s Editorial Policy Committee.

I bet the social taggers, building up folksonomies, don’t have the same problems. To be fair though, they are not trying to shoe-horn the book into a rigid classification system - mind you, isn’t that the point?

Listen to the Library 2.0 Gang

Anyway, apart from being mildly amusing, this gives me a good opportunity to recommend the Library 2.0 Gang podcast from a couple of weeks back on the subject of folksonomies and tagging - well worth a listen. On the Gang for this session were Casey Bisson, Ian Corns, Christina Pikas, Karen Schneider, and Tim Spalding.


Wikicat

The Wikimedia Foundation, the international non-profit organization behind some of the largest collaboratively-edited reference projects in the world including Wikipedia, has a project that has been running for the last few months named Wikicat.

Wikicat’s basic premise is to become the bibliographic catalog used by the Wikicite and WikiTextrose projects. The Wikicite project recognizes that “A fact is only as reliable as the ability to source that fact, and the ability to weigh carefully that source” and because of this the need to cite sources is recognized in the Wikipedia community standards. WikiTextrose is a project to analyze relationships between texts and is “inspired by long-established theories in the field of citation analysis”.

In simple terms the Wikicat project is attempting to assemble a bibliographic database [yes another one] of all the bibliographic works cited in Wikimedia pages.

It is going to do this initially by harvesting records via Z39.50 from other catalogues such as the Library of Congress, the National Library of Medicine, and others as they are added to their List of Wikicat OPAC Targets. Then, when a citation that includes a recognizable identifier such as an ISBN or LOC number is included in a page, the authoritative bibliographic record can be used to create a ‘correct’ citation. Eventually the act of citing a previously unknown [to Wikicat] work should automatically help to populate the Wikicat catalogue. - Participative cataloguing without needing to use the word folksonomy!

Putting aside the tempting discussion about whether a Z39.50 target can truly be described as an OPAC, the thing that is different about this cataloguing project is not what they are attempting to achieve but how they are going about it. The Wikicat home page states:
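To make the identifier-to-citation step concrete, here is a minimal Python sketch under stated assumptions. It is not Wikicat's code: the function names, the in-memory `catalogue` dictionary standing in for harvested records, the `pending` list, and the citation layout are all hypothetical, and the Z39.50 harvest itself is left out. The ISBN-10 check digit rule is the standard one (weights 10 down to 1, sum divisible by 11).

```python
import re


def clean_isbn(raw: str) -> str:
    """Strip hyphens and spaces, and upper-case a trailing 'x'."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """Standard ISBN-10 check: weighted digit sum must be divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    digits = [10 if c == "X" else int(c) for c in isbn]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def cite(raw_isbn: str, catalogue: dict, pending: list):
    """Return a citation string, or None if the work is not yet catalogued.

    Unknown but valid ISBNs are appended to `pending`, mimicking the idea
    that citing a new work helps to populate the catalogue.
    """
    isbn = clean_isbn(raw_isbn)
    if not is_valid_isbn10(isbn):
        raise ValueError(f"not a valid ISBN-10: {raw_isbn!r}")
    record = catalogue.get(isbn)
    if record is None:
        if isbn not in pending:
            pending.append(isbn)  # queue for a later harvest
        return None
    return (f"{record['author']} ({record['year']}). "
            f"{record['title']}. {record['publisher']}.")
```

A page author would only type the ISBN; everything else in the citation comes from the authoritative record, which is what makes the result ‘correct’ and consistent across pages.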

It will be implemented as a Wikidata dataset using a datamodel design based upon IFLA’s Functional Requirements for Bibliographic Records (FRBR) [1], the various ISBD standards, the Library of Congress’s MARC 21 specification, the Anglo-American Cataloguing Rules, The Logical Structure of the Anglo-American Cataloguing Rules, and the International Committee for Documentation (CIDOC)’s Conceptual Reference Model (CRM)[2].

So it isn’t just going to be a database of MARC records then!

Reading more it is clear that once the initial objective of creating an automatic lookup of bibliographic records to create citations has been achieved, this could become a far more general open participative cataloguing project, complete with its own cataloguing rules managed by the WikiProject Librarians.

Because they are starting with FRBR at the core of the project, the authority and granularity of the relationships between bibliographic entities could potentially be of the highest quality. This could lead to many benefits for the bibliographic community, not least a wikiXisbn service [my name] that is ‘better’ than OCLC’s xISBN.

So does the world need yet another cooperative cataloguing initiative? - working for an organisation that has had cooperative cataloguing in its DNA for over thirty-five years, I should be careful how I answer this!

Throwing caution to the wind - yes, when you consider that all the other cooperative cataloguing initiatives [including as of today the one traditionally supported by Talis] are bounded by project, geographical, institutional, political, subject area, commercial, exclusive licensing, or high financial barrier to entry issues. What is refreshing about Wikicat is that, like Wikipedia, the only barrier to entry, both for retrieving and adding data, is Internet connectivity.

Unlike Wikipedia, where some concerns about data quality are overridden by the value of its totally participative nature, the Wikicat team are clearly aware that the value of a bibliographic database is directly connected to the quality, consistency and therefore authority of the data that it holds. For this reason, the establishing of cataloguing rules and training for potential editors, overseen by the WikiProject Librarians, is already well detailed in the project operational stages roadmap.

I will be watching Wikicat with interest to see how it develops.


Journal TOC + RSS = TOCRoSS

The formal press release regarding the JISC project TOCRoSS was published today.

The 10-month joint project between Emerald Group Publishing Ltd, the University of Derby, and Talis will deliver standards, open source software and a test bed example installation at Derby.

The project team will be proposing an open standards extension to the RSS 2.0 standard to encode metadata associated with e-journal publishing events, for example publication of a journal, issue or article.

To drive TOCRoSS, an RSS server located at the publisher site will generate a ‘feed’ of information that can be automatically picked up by an RSS monitor located at the customer site. The project will also develop a plug-in module for the library management system to enable the catalogue and the OPAC to be updated with the information from the RSS stream.

With TOCRoSS in place, e-journal table of content data will be fed automatically into library catalogues without the need for cataloguing, classification or data entry. This will improve the accuracy of records, save time for library staff and deliver a more integrated OPAC experience to library users. It will be of particular value to academic libraries, where students often choose search engines such as Google over the library catalogue or myriad databases for tracking down articles and information.
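As a rough illustration of the monitor-and-plug-in flow described above, the following Python sketch reads an RSS 2.0 feed and adds any new items to a catalogue. It is an assumption-laden sketch, not the TOCRoSS software: the extension had not been published at the time of writing, so the `toc:` namespace URI, the `toc:issn` and `toc:volume` elements, the sample feed and the dictionary-based catalogue are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the proposed RSS 2.0 extension.
EXT_NS = {"toc": "http://example.org/tocross"}

# Made-up feed of the kind a publisher's RSS server might generate.
SAMPLE_FEED = """<rss version="2.0" xmlns:toc="http://example.org/tocross">
  <channel>
    <title>Example Journal</title>
    <link>http://example.org/journal</link>
    <description>Table of contents feed</description>
    <item>
      <title>An example article</title>
      <link>http://example.org/journal/1/1/article-1</link>
      <guid>http://example.org/journal/1/1/article-1</guid>
      <toc:issn>0000-0000</toc:issn>
      <toc:volume>1</toc:volume>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list[dict]:
    """Turn each <item> in the feed into a simple catalogue record."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.findall("./channel/item"):
        records.append({
            "id": item.findtext("guid") or item.findtext("link"),
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "issn": item.findtext("toc:issn", default="", namespaces=EXT_NS),
            "volume": item.findtext("toc:volume", default="", namespaces=EXT_NS),
        })
    return records


def update_catalogue(catalogue: dict, records: list[dict]) -> int:
    """Insert or refresh records keyed by id; return how many were new."""
    added = 0
    for record in records:
        if not record["id"]:
            continue  # nothing stable to key on, so skip the item
        if record["id"] not in catalogue:
            added += 1
        catalogue[record["id"]] = record
    return added
```

Running `update_catalogue` twice over the same feed adds nothing the second time, which is the property a polling monitor needs if it is to run unattended.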

Full article [pdf]

Yet another use of RSS, this time for consumption by both humans & systems.

Structured Blogging

Dick Hardt, CEO of sxip Identity and subject of our very first podcast, draws my attention to the new Structured Blogging initiative.

“Structured Blogging is a way to get more information on the web in a way that’s more usable. You can enter information in this form and it’ll get published on your blog like a normal entry, but it will also be published in a machine-readable format so that other services can read and understand it.

Think of structured blogging as RSS for your information. Now any kind of data - events, reviews, classified ads - can be represented in your blog.”
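The idea of one entry producing both a normal post and a machine-readable version can be sketched as follows. This is a loose illustration only: the quoted description does not say what format Structured Blogging emits, so JSON is used here purely as a stand-in, and the `publish` function and the field names of the review are hypothetical.

```python
import json


def publish(entry: dict) -> tuple[str, str]:
    """Return (human-readable text, machine-readable text) for one entry."""
    # What a reader of the blog would see.
    human = f"{entry['title']}: {entry['rating']}/5. {entry['body']}"
    # What another service could parse without scraping the prose.
    machine = json.dumps(entry, sort_keys=True)
    return human, machine


# A made-up review entered through a structured form.
review = {
    "type": "review",
    "title": "An Example Book",
    "rating": 4,
    "body": "Worth a read.",
}
post_text, post_data = publish(review)
```

Because both outputs come from the same structured entry, the rating a person reads and the rating a service reads cannot drift apart.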

I shall definitely be taking a closer look, as it sounds potentially powerful… and it works with Movable Type, the technology behind this blog.


CrossRef adds Web Services

Amongst a set of eleven topics with reasonably current draft posts (some of which I think I’ll have to give up on and just delete…) sitting in ecto, this recent press release from CrossRef is worth bubbling to the top of the pile, as it can be dealt with briefly.

“CrossRef Web Services will create an easy-to-use tool for authorized search and web services partners to gather metadata to streamline web crawling. The CrossRef metadata database contains records for the more than 18 million items from over 1,500 publishers, the majority of whom are expected to choose to participate in this optional new service. The CrossRef Search Partner program provides standard terms of use for search engines, libraries, and other partners to use the metadata available from CrossRef Web Services – terms that promote copyright compliance and the important role published works of record play in scholarly communication.

CrossRef Web Services also provides search partners with a map to published scholarly content on the Web. In this way, it functions as a notification, or ‘ping’, mechanism for the publication of new content. Alerting crawlers to new or revised content to be indexed greatly reduces the need for ongoing re-crawling of publisher sites. ‘Search engines want better ways to gather standard, cross-publisher metadata to enhance their search indexes. Publishers want to streamline the way they provide metadata to a growing number of partners. CrossRef Web Services and the Search Partner Program fill this void,’ said Ed Pentz, Executive Director of CrossRef. ‘With CrossRef repurposing parts of its metadata database and using common protocols like standardized XML and OpenURL (and SOAP, RSS and other protocols in future), these services can significantly enhance indexes.’”
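The ‘ping’ idea in the release, where crawlers are told what has changed instead of re-crawling everything, can be illustrated with a short Python sketch. It does not use or describe the actual CrossRef Web Services interface, which the release does not specify: the notification records, their `doi` and `updated` fields, and the `to_crawl` function are assumptions for the example, and the DOIs are placeholders. The only real-world detail relied on is that a DOI can be resolved through https://doi.org/.

```python
from datetime import datetime, timezone


def doi_url(doi: str) -> str:
    """Build a resolvable link for a DOI via the doi.org resolver."""
    return "https://doi.org/" + doi


def to_crawl(notifications: list[dict], last_crawl: datetime) -> list[str]:
    """Return URLs for items changed since the last crawl, oldest first."""
    due = [n for n in notifications if n["updated"] > last_crawl]
    due.sort(key=lambda n: n["updated"])
    return [doi_url(n["doi"]) for n in due]


# Made-up notifications of new or revised content.
NOTIFICATIONS = [
    {"doi": "10.5555/example.1", "updated": datetime(2006, 1, 10, tzinfo=timezone.utc)},
    {"doi": "10.5555/example.2", "updated": datetime(2006, 3, 5, tzinfo=timezone.utc)},
    {"doi": "10.5555/example.3", "updated": datetime(2006, 2, 20, tzinfo=timezone.utc)},
]
```

With a last crawl on 1 February 2006, only the second and third items would be fetched, which is the saving over blindly revisiting every publisher page.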

Outside the internal systems of big publishers like Wiley, where Digital Object Identifiers (DOIs) play a largely invisible role in tying the whole thing together, the DOI hasn’t really gained the traction that I initially believed it would. This seems a shame, and is presumably largely due to the financial models currently deployed in supporting the central organisation and associated services such as CrossRef.

The relative complexity or effort involved in deploying DOIs within a system such as an ILS and then making them actionable must also play a significant role here, though. As such, this statement of intent must surely be welcomed as a useful step towards making the value of the DOI more easily realised in a variety of contexts and through a range of interfaces.

I’m still not sure about the business model, though…

The release was originally brought to my attention by Peter Scott, and is highly relevant, given some of the possibilities to be explored at our next Research Day, being organised with BIC.
