16 May 2007
XTech Day 2 - Francis Cave on ACAP
Posted by Paul Miller at May 16, 2007 05:40 PM
ACAP came up on this blog before...
Now Francis Cave talks about it in the context of “Communicating access and usage permissions for online content.”
“The growth in the use of search engines and other aggregation services presents a major challenge to the businesses of traditional content owners. Newspaper publishers in particular rely upon aggregators to generate traffic to their online content, but have viewed with concern the growth in search engine advertising revenue, while their own revenues from the same sources have diminished. Many publishers wish to put content online for marketing purposes, but are put off so doing because they feel unable to control what use is made of that content.
A study during 2006 concluded that a major technical obstacle to the evolution of new commercial models is the lack of adequate standards for content owners to express content access and usage permissions in machine-readable form. Existing conventions, such as the Robots Exclusion Standard, cannot deliver sufficiently nuanced expressions of what aggregators’ systems should or should not do with online content.”
It's hard for anyone to make content available for access and use on the web without there being rules in place regarding the ways in which it may be accessed or used. When a publisher says nothing about reuse, many people infer that they may do anything they like. When organisations do say something about reuse, they tend to provide complex, legally worded statements of terms and conditions, which are unreadable by both humans and machines.
ACAP (Automated Content Access Protocol) intends to make rules of use and reuse machine readable and interpretable.
Search engines are hugely valuable to their users... and to owners of content. There are a multitude of positive business relationships between search engines and content owners, but the power and influence of those search engines has grown exponentially. There are well-publicised examples (eg Google's book digitisation) of conflict between content owners (publishers) and the search engines, around differing interpretations of fair use/dealing.
Content owners formed a task force in January 2006 to consider these issues; they want and need search engine traffic, but also wish to control 'misuse' of their content by explicitly declaring usage permissions.
Current use of the Robots Exclusion Protocol is a blunt instrument; access is simply permitted or denied. There is no scope for expressing conditionality, and there is actually no requirement for a crawler to support or respect the protocol.
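To illustrate how coarse that is, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rules, the crawler name "ExampleBot" and the news.example.com URLs are all invented for the example; the point is only that the answer is a bare yes or no, with nowhere to say "index but don't archive" or "display for seven days".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary newspaper site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return the only thing the protocol can express: allowed or not."""
    return parser.can_fetch(agent, url)


front_page = may_fetch("http://news.example.com/today/story.html")
archive = may_fetch("http://news.example.com/archive/2006/story.html")

# Each result is a plain boolean; no conditions can be attached to it.
print(front_page, archive)
```

Note too that the check runs on the crawler's side: a crawler that never calls it is not stopped by anything in the file.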
Through something like ACAP, publishers will be better able to express and enforce usage restrictions, which should lead to greater availability of content; publishers will expose content that they currently keep locked away, because they'll be more confident that the data is protected.
ACAP is funded by newspaper publishers, the European Publishers Council and the International Publishers Association, and includes publisher participants such as Wiley and Elsevier, as well as involvement from major [unnamed] search engines and the British Library. Other members involved include the Motion Picture Association and OPSI.
ACAP is currently implementing a pilot project, which runs until the end of 2007. The project will produce a “standardized framework for machine readable expression of permissions for access and use”, providing a “proof of concept through pilot implementations”, resulting in a “sustainable business plan for future management” of the protocol.
The pilot project's scope includes openly published material on the web (eg newspapers and magazines), as well as content currently only available in closed access databases (scholarly journals, etc).
Technically, the project should comprise a mechanism for identifying and authenticating crawlers, as well as agreement on a standard set of 'usage verbs' (crawl, index, archive, preserve, derive, display, embed...), some measure of qualification (quantity, duration, attribution, location, payment, registration...), and a set of scopes (particular file types, specific resource classes, etc) and 'pushed' action requests (refresh the information you have, expunge content from an index, blacklist a crawler, etc) that may need to be transmitted.
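As a rough sketch of how verbs, qualifiers and scopes might fit together, the following Python models a permission as a small record. This is not ACAP's actual data model or syntax, which the talk did not show; the Permission class, the decide() helper, the path-prefix notion of scope and the sample policy are all assumptions made for illustration. Only the verb and qualifier names come from the list above.

```python
from dataclasses import dataclass, field

# Verb and qualifier names as listed in the talk. The lists there are
# open-ended, so these sets are illustrative rather than complete.
USAGE_VERBS = {"crawl", "index", "archive", "preserve", "derive", "display", "embed"}
QUALIFIERS = {"quantity", "duration", "attribution", "location", "payment", "registration"}


@dataclass
class Permission:
    verb: str
    allowed: bool
    scope: str = "*"  # hypothetical: a path prefix; "*" matches everything
    qualifiers: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.verb not in USAGE_VERBS:
            raise ValueError(f"unknown usage verb: {self.verb}")
        unknown = set(self.qualifiers) - QUALIFIERS
        if unknown:
            raise ValueError(f"unknown qualifiers: {sorted(unknown)}")


def decide(policy, verb, path):
    """Return the first permission matching verb and scope, else None."""
    for p in policy:
        if p.verb == verb and (p.scope == "*" or path.startswith(p.scope)):
            return p
    return None


# Hypothetical policy for an imaginary newspaper site.
policy = [
    Permission("crawl", True),
    Permission("display", True, scope="/news/",
               qualifiers={"duration": "7 days", "attribution": "required"}),
    Permission("archive", False),
]

hit = decide(policy, "display", "/news/2007/05/16/story.html")
miss = decide(policy, "display", "/sport/report.html")
print(hit.allowed, hit.qualifiers)
print(miss)
```

Unlike a robots.txt rule, a matched permission here carries conditions (display for a limited duration, with attribution), and a missing rule is distinguishable from an explicit refusal.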
Permissions should be capable of transmission via a number of existing protocols and formats, including an extended Robots Exclusion Protocol, NewsML, ONIX, etc.
The project is now starting to define standard usages, reconciling current established practice in REP with commonly agreed semantics used elsewhere.
It is also developing guidance on crawler authentication and discovery; how do you verify the identity of a 'known' crawler, and how do you discover the provenance of a new one?
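The talk did not say what that guidance will recommend. One technique in general use for checking a 'known' crawler is forward-confirmed reverse DNS: look up the hostname for the connecting IP address, check that it belongs to the crawler operator's domain, then resolve that hostname and confirm it maps back to the same address. A minimal sketch, assuming a hypothetical verify_crawler() helper and made-up example.com names and addresses; the lookups are injectable so the logic can be tried without touching the network:

```python
import socket


def verify_crawler(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (illustrative, not ACAP's method).

    allowed_suffixes are domain suffixes with a leading dot,
    e.g. [".crawler.example.com"].
    """
    # Default to real DNS lookups from the standard library.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not any(host.endswith(suffix) for suffix in allowed_suffixes):
            return False
        # The claimed hostname must resolve back to the same address.
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


# Made-up DNS data: two addresses both claim the crawler's hostname,
# but only one of them is confirmed by the forward lookup.
fake_ptr = {
    "192.0.2.10": "bot1.crawler.example.com",
    "192.0.2.77": "bot1.crawler.example.com",
}
fake_a = {"bot1.crawler.example.com": ["192.0.2.10"]}

genuine = verify_crawler("192.0.2.10", [".crawler.example.com"],
                         reverse=fake_ptr.__getitem__,
                         forward=lambda h: fake_a.get(h, []))
spoofed = verify_crawler("192.0.2.77", [".crawler.example.com"],
                         reverse=fake_ptr.__getitem__,
                         forward=lambda h: fake_a.get(h, []))
print(genuine, spoofed)
```

This only addresses the first half of the question. It says nothing about the provenance of a crawler nobody has seen before, which is the harder problem.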


