Talis talks with Herbert van de Sompel about SFX, OAI, and Repositories
In our latest podcast I talk with Herbert van de Sompel of the Los Alamos National Laboratory in New Mexico. We discuss Herbert’s pivotal role in the development of SFX and the Open Archives Initiative (OAI), before turning to a broader discussion of issues related to the use of repositories in preserving and providing access to scholarly literature and data.
During the conversation, we refer to the following resources:
- American Chemical Society
- arXiv
- British Library
- Cornell University
- Creative Commons
- Crossref
- DOI
- DSpace
- EBSCO
- Elsevier
- Ex Libris
- Fedora
- Flickr
- Freebase
- Ghent University Library
- Grey Literature
- Stevan Harnad
- Pete Johnston and Andy Powell podcast
- Kevin Kelly’s TED Talk
- Library of Congress
- Linked Data
- MySpace
- NISO
- Open Archives Initiative (OAI)
- OAI Object Reuse and Exchange (OAI ORE)
- OAI Protocol for Metadata Harvesting (OAI-PMH)
- OpenURL
- Pathways Project
- RDF
- SilverPlatter
- SFX
- YouTube
This conversation was conducted using Skype on Wednesday 3 September, recorded with Ecamm Network’s Call Recorder for Skype, and edited on a Mac with GarageBand.
For other Talis podcasts in this Xiphos series, see here. To subscribe to updates from all of Talis’ podcast series, see here.



September 6th, 2008 at 3:08 pm
[...] on our Xiphos blog, I’ve just published a podcast conversation I had with Herbert van de Sompel earlier this [...]
September 8th, 2008 at 3:44 am
[...] has released a podcast of an interview with Herbert van de Sompel, Digital Library Researcher at the Research Library of [...]
October 20th, 2008 at 8:42 am
[...] spoke recently with OAI-ORE Executive member Herbert van de Sompel, and we discussed some of the rationale behind [...]
November 5th, 2008 at 11:51 am
THE PROBLEM IS GETTING THE TARGET CONTENT DEPOSITED
Good interview, and brilliant, thoughtful work by Herb. Some comments:
(1) The reason the Budapest Open Access Initiative (BOAI) focusses on peer-reviewed journal articles is that this is the content behind the toll-access barriers that are reducing research progress and impact. Hence merely depositing bibliographies in Institutional Repositories (IRs), rather than the full texts, and merely pointing to the toll-based version does not solve the access/impact problem.
(2) The version of research journal articles that authors deposit in their IRs (the peer-reviewed, revised, accepted final draft) is not the version of record. The version of record is the publisher’s proprietary (toll-based) version. That is the version that needs permanent preservation, which is not an OA matter. Depositing them in an Archival Deposit Library solves the preservation problem, but not the access/impact (OA) problem.
(3) Yes, prepublication preprints and research data are important and welcome in IRs too, but again, the content that is behind the access-denying toll-barriers is not preprints and data but published journal articles. Nothing is preventing researchers from depositing their preprints and data too, if they wish to (many are reluctant to make either their unrefereed preprints or their not-yet-fully-mined data public), but all researchers publish what they publish, and it is the access/impact barriers to that published corpus that are BOAI’s primary target.
(4) Yes, the machine-based data-mining of the peer-reviewed research articles (as well as preprints and data) is important, but even more important and urgent is to make them accessible to all human researchers, and not just to those whose institutions can afford toll-access. That is the OA problem.
(5) Yes, institutional and central repositories are completely equivalent technically (thanks to Herb, OAI-interoperability and ORE), but that is of no use if they are empty! Hence the reason it matters whether content is deposited institutionally or centrally is that institutions are the research-providers, covering all of research space, with a strong stake in the visibility, accessibility, usage and impact of their own research output. Institutions are also in a position to mandate that their own research output is deposited in their own IRs. “Disciplines,” in contrast, have no interests, and they cannot mandate. Hence the optimal way to generate the missing OA content is for both institutions and funders to mandate deposit in the author’s IR. Then central repositories as well as Google and other harvesters can harvest the contents. That is what Southampton has been advocating (and doing) for years now, and that is what Harvard and NIH are now at long last getting round to doing.
November 13th, 2008 at 2:16 pm
[...] the people who developed all this are so damn smart, such as Herbert van de Sompel (I listened to a Talis podcast interview with him recently, interesting but would love to know more about the BL thing!) and many at UKOLN [...]
January 27th, 2009 at 10:13 am
[...] Herbert van de Sompel podcast [...]