
31 May 2007

Peter Murray-Rust Talks with Talis about Open Access, Open Data, Science, and the Semantic Web


In our latest Talking with Talis podcast, I talk with Professor Peter Murray-Rust of the Unilever Centre for Molecular Sciences Informatics at the University of Cambridge.

In a wide-ranging conversation, we look at the changing nature of academic publishing, the importance of primary data to the process, and the remarkable potential of the Semantic Web in both streamlining and enriching the endeavour.



Listen Now | Download MP3 [70 mins, 48 Mb]

During the conversation, we refer to the following resources;

This conversation was conducted as a SkypeOut call on Thursday 31 May, recorded with Ecamm Network's Call Recorder for Skype, and edited on a Mac with Garageband and Audacity.

The picture of Peter Murray-Rust was taken by Gavin Bell, and is shared on Flickr under a Creative Commons License.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 09:48 PM | Comments (0) | TrackBack

28 May 2007

Jamie Taylor Talks with Talis about Metaweb and Freebase

In our latest Talking with Talis podcast, I talk with Semantic Web startup Metaweb's Minister of Information, Jamie Taylor. Best known for Freebase which, although only in limited alpha release, has already attracted the admiration of such commentators as Tim O'Reilly, Jon Udell, and OpenBusiness, Metaweb has the goal of 'build[ing] a better infrastructure for the Web.' Freebase's online help describes Metaweb's vision thus;

“We aspire to be the center of a new Web by being a critical piece of infrastructure for businesses, organizations and people that want to use, present and manipulate information. We've started by building a working system and a single example application. Over the next few months, we will work to demonstrate our larger vision by integrating Metaweb into other applications, websites, blogs... anywhere there is a need for structured information. Right now only a few thousand websites are 'database driven'. If we succeed, millions will be.

We will be compared to Google, Yahoo!, Wikipedia and other information driven sites, but such a comparison misses our real purpose: we are trying to enable thousands of organizations and millions of people to build their own Web. Paradoxically, we are decentralizing the experience of information by centralizing the storage of it.”

During our conversation, we discuss Freebase, Metaweb's wider goals, and the way in which both fit within the wider notion of a Semantic - or Data - Web.



Listen Now | Download MP3 [65 mins, 45 Mb]

During the conversation, we refer to the following resources;

This conversation was conducted over Skype on Friday 25 May, recorded with Ecamm Network's Call Recorder for Skype, and edited on a Mac with Garageband.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 07:53 PM | Comments (0) | TrackBack

25 May 2007

Amit Kothari talks with Talis about QuotationsBook.com


In our latest Talking with Talis podcast, I talk with Amit Kothari of QuotationsBook.com.

During our conversation, we explore some of the background to QuotationsBook and touch upon Amit's plans for the future. We also address the recent Creative Commons-licensed release of QuotationsBook data in RDF, and investigate some of the risks and opportunities posed by this open approach to data sharing.



Listen Now | Download MP3 [61 mins, 42 Mb]

During the conversation, we refer to the following resources;

This conversation was conducted over Skype on Thursday 24 May, recorded with Ecamm Network's Call Recorder for Skype, and edited on a Mac with Garageband.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 10:49 AM | Comments (2) | TrackBack

22 May 2007

Linked Data - the real Semantic Web ?


It has been interesting to follow the rise of the 'Linked Data' meme in the Semantic Web community recently, and to track it alongside longer term (but quieter) mutterings around 'Open Data' from the likes of Tim O'Reilly and XTech programme committees past and present.

The recent push is due in no small part, I believe, to the sterling efforts of the Linking Open Data community, and to the support they've been receiving from W3C's Semantic Web Education & Outreach (SWEO) group, of which I'm a rather quiet member.

Listening to Tim Berners-Lee's keynote in Banff a week or so back, there was a strong steer toward 'Linked Data', and the opportunities presented by the relationships between resources and the aggregate of those resources. This thread came up again and again, most notably in the Linked/Open Data sessions. Thinking about it again, the whole Linked Data thrust actually comes across as a far more compelling way to describe the value of the Semantic Web to the non-geek audience. Are we seeing some formal shift in W3C's language as we and they grapple to clearly express the value of these misunderstood 'new' approaches? Let's hope so, as these Data Web/ Web of Data stories get far less bogged down in the horrors of 'triples', 'ontologies' and other concepts designed to send most audiences into an irretrievable tailspin...

If the Web of Data is the target, of course, the thorny issue of to whom the data belong, and the ways in which the data may be used, comes to the fore once more. This is an area we've been tackling with contributions such as the Talis Community License, and it came up in Rob's contribution in Banff [Rob's audio here, PDF of everyone's slides here], as well as papers from both of us at XTech last week. We've seen a lot of interest in some of the issues we've been stressing around the need to apply some licence to data, and the importance of understanding the rights that do - and don't - apply to data as opposed to creative works, and look forward to finishing the work we started with the TCL and getting the whole thing onto some more formal footing.

One conversation from last week that has carried over onto email this week was with Rufus Pollock of the Open Knowledge Foundation. They don't have a license, but they do usefully define a set of principles to underpin the notion of 'open knowledge', and they explicitly include the separate notion of data;

“The Open Knowledge Definition (OKD) sets out principles to define the 'open' in open knowledge. The term knowledge is used broadly and it includes all forms of data, content such as music, films or books as well as any other type of information.

In the simplest form the definition can be summed up in the statement that 'A piece of knowledge is open if you are free to use, reuse, and redistribute it'.”

We're seeing movement as a growing body of implementors, commentators and analysts recognise the potential of linking disparate data resources together, leveraging some of the more basic capabilities of RDF and other Semantic Web enabling technologies. We're also seeing a matching awareness of the need to protect use of those data sets (and not merely to safeguard the interests of data owners, but also - and far more tellingly - to give confidence to data aggregators and users), and a refreshing willingness to engage openly and cooperatively in reaching a pragmatic solution. It's a great time to be involved in this space, and Talis looks forward to playing our full part across the piece.
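To make the 'linking' idea a little more concrete, here is a minimal sketch in plain Python rather than real RDF tooling. The two datasets, the example.org identifiers and the `objects` helper are all made up for illustration. The point is only that when two independently published sets of statements share identifiers, their union can answer questions that neither can answer alone.

```python
# Two hypothetical datasets, each a set of (subject, predicate, object)
# statements. All URIs and values are invented for this sketch.
library = {
    ("http://example.org/book/1", "title", "An Example Book"),
    ("http://example.org/book/1", "author", "http://example.org/person/42"),
}
people = {
    ("http://example.org/person/42", "name", "A. N. Author"),
}

# Because both datasets use the same identifier for the person,
# simply taking the union links them together.
merged = library | people


def objects(triples, subject, predicate):
    """Return every object stated for this subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the link from the book (first dataset) to the name (second dataset).
for author in objects(merged, "http://example.org/book/1", "author"):
    print(objects(merged, author, "name"))
```

Neither dataset on its own can name the author of the book; the merged one can, which is the whole argument for agreeing on identifiers and on licences that permit the merge.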

Today's picture, by 'jbum', and CC-licensed on Flickr, shows one example of that which is possible when data are open enough to allow uses not envisaged by their creator...

Update: Rufus Pollock has begun a Guide to Open Data Licensing on their wiki...


Posted by Paul Miller at 11:08 PM | Comments (0) | TrackBack

Semantic Technology

Ian Davis and Sam Tunnicliffe are over in San Jose for Semantic Technology, and what looks (from this Birmingham hotel room) like a great programme to build upon our experiences speaking earlier this month in Banff and Paris.

In related developments, Gartner have just published their latest exploration of the semantic technology/web space, and InfoWorld have some fresh coverage of the event.

Given their talk of suits and shoes, though, I'm worried that Ian and Sam won't fit in.


Posted by Paul Miller at 09:56 PM | Comments (0) | TrackBack

21 May 2007

Synchronicity

Isn't it weird, the way that the tools upon which we rely sometimes glitch in ways that should annoy, but that actually prove powerfully synergistic?

That happened today, for both Dave Errington and myself, when our automated patrollers of the blogosphere pointed to an insightful, incisive and timely post to his personal blog by our very own Chief Strategy Officer, Justin Leavesley.

The date on this post, to which our robotic ensurers of current awareness were directing us? 26 November, 2006...

Given some of the things we've been thinking, doing and planning for just recently, I can only assume that Justin's blog is broken, and that he really posted it yesterday...

Either that, or 'the system' is way smarter than we give it credit for, and Big Brother really is watching, probably through one of those HAL-like red glowing eyes...

Don't look behind you...


Posted by Paul Miller at 01:59 PM | Comments (0) | TrackBack

18 May 2007

XTech, Quakr


Yesterday I sat through an excellent session from three people, each covering a different aspect of how they've gone about building Quakr.

From left, Peter Arbuthnott, Katie Portwin and David Sant have been building a 3D tour of our world using photos from Flickr, geo-tagging, and some very expensive proprietary hardware developed by the team.

At the moment they've got some really nice demos showing Quakr's potential, centered on Oxford, their home town.

Essentially, what the team have done is take photos using their specialised camera, recording seven aspects of positioning data. These are (from their site):

  1. Altitude (Are we standing on a mountain?)
  2. Latitude (How far north/south of the equator are we?)
  3. Longitude (How far east/west of the meridian are we?)
  4. Compass bearing (ie, N/S/E/W - which direction are we pointing the camera?)
  5. Tilt (ie, are we pointing it up at the sky a bit, or down at the ground a bit?)
  6. Orientation (is this photo portrait, landscape, somewhere wacky in between?)
  7. Timestamp (good for knowing if this is day or night)

This allows them to position the photos accurately in 3D space.
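As a rough illustration of why those seven values are enough, here is a minimal Python sketch. It is not Quakr's code: the `PhotoPose` class, its field names, the sample coordinates and the east/north/up axis convention are all my own assumptions. It shows how compass bearing and tilt combine into the direction the camera was pointing, which together with latitude, longitude and altitude fixes the photo in space.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PhotoPose:
    """Hypothetical record holding the seven values listed above."""
    altitude_m: float       # height above sea level, in metres
    latitude_deg: float     # degrees north (+) or south (-) of the equator
    longitude_deg: float    # degrees east (+) or west (-) of the meridian
    bearing_deg: float      # compass bearing: 0 = north, 90 = east
    tilt_deg: float         # 0 = level, positive = pointing up at the sky
    orientation_deg: float  # roll: 0 = landscape, 90 = portrait
    timestamp: datetime

    def view_direction(self):
        """Unit vector (east, north, up) along which the camera points."""
        bearing = math.radians(self.bearing_deg)
        tilt = math.radians(self.tilt_deg)
        horizontal = math.cos(tilt)  # length of the vector's ground projection
        return (
            horizontal * math.sin(bearing),  # east component
            horizontal * math.cos(bearing),  # north component
            math.sin(tilt),                  # up component
        )


# Made-up example: a level, landscape shot facing due east.
pose = PhotoPose(65.0, 51.752, -1.258, 90.0, 0.0, 0.0,
                 datetime(2007, 5, 17, 12, 0))
east, north, up = pose.view_direction()
print(round(east, 3), round(north, 3), round(up, 3))
```

Orientation (roll) does not change where the camera points, only how the image plane is rotated around that direction, so it is stored but not used in `view_direction`.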

Katie's explanations of the issues involved in disambiguating tags, reconciling different definitions of 'tilt' and then working with "Image Jungles", those spots on the map where there is an over-abundance of photos, were incredibly clear and helped me clarify some of the work I'm doing with bibliographic data.

They're encountering the same kinds of problems we have - that metadata being recycled for uses other than the original purpose is hard to handle and often needs a lot of best-guess cleaning.

In the interests of full disclosure I have to let you know that we thought these guys were pretty cool before seeing them talk. Some of us had met them at last year's XTech and we'd been out for dinner the night before. I ordered a dozen escargots and was appalled when both Dave and Katie dug in, but my fellow Talisians declined with such base comments as "I don't eat mollusks".

Seriously though, what Peter, Katie and Dave are doing stacks up against the (much slicker, and substantially better funded) Photosynth from Microsoft Live Labs.

That these guys have done this as a spare time project is ******* awesome.

Posted by Rob Styles at 12:01 PM | Comments (0) | TrackBack

17 May 2007

XTech Day 3 - Rufus Pollock and Jo Walsh talk about 'Atomisation and Open Data'

Rufus Pollock and Jo Walsh are talking about 'Atomisation and Open Data';

“Atomisation on a large scale (such as in the Debian ‘apt’ packaging system) has allowed large software projects to be amazingly productive through their use of a decentralised, collaborative, incremental development process. Atomisation works so well because it allows us to ‘divide and conquer’ the organizational and conceptual problems of highly complex systems.

But what other kinds of information can be atomised? What are the possibilities and problems of this approach for forms of information other than software? How do we best design data APIs, discover and distribute existing resources, and recombine decentralised datasets?

Drawing on examples from geodata to Shakespeare we’ll demonstrate how atomisation is key to unlocking the potential of open data as well as how we can best begin to apply the lessons of open source to the world of open data.”

To understand things, we can look at massive aggregations of data in order to extract meaning. But data is rarely made available very usefully...

Rufus is showing some great examples of the inferences and analyses that can be drawn from historical data sets... and explaining how difficult it was to get the data into a form he could use.

“The coolest thing to do with your data will be thought of by someone else”

One big database is not the way forward;

“The revolution will be decentralised”/ “Small pieces loosely joined”
“Production should be decentralised and federated”

Debian 'apt' demonstrates one way in which a community can cooperate on a larger problem, with individuals tackling small pieces in ways that allow the components to be joined together when required.

Introduce notion of a Knowledge API; tags are a crude form of the Knowledge API.

Human Genome Project one example of a big project with implicit Knowledge APIs to allow public access to the genomic data. In this case, the unique identifiers they apply to genes within the project's database.

“Debugging code is hard. Debugging data will be harder”

In Scholarly Publication, the Knowledge API is the well-established set of understood shared identifiers; DOIs, standard scientific terms, etc. However, it is extremely difficult to atomise smaller than the unit of the scholarly paper itself.

Very little evidence of real and effective linking between silos at the moment; social or technological barriers? Or do people simply not see a need to do it for real?

“We can at least wrap the data up... in a form suitable for automatable downloading”

Comprehensive Knowledge Archive Network offered as a model whereby anyone can upload, identify and describe a set of data.


Posted by Paul Miller at 04:24 PM | Comments (0) | TrackBack

16 May 2007

XTech Day 2 - Alex Brown and Francis Cave on machine-readable Licensing


Francis Cave is back in front of the podium, speaking with Alex Brown on “Electronic Licensing with XML and Web 2.0 Technology”.

“As more and more content is published electronically so the need for controlling access to it has risen. Early efforts in this field focused on copy-protection technologies (DRM), but a more enlightened approach emerges if instead content licenses can be agreed between parties and content then used according to that agreement.”

Growth in digital collections held by various libraries. Demands from members of those libraries to be able to use digital content in new ways; inclusion in course packs, building 'personal libraries', scholarly and academic reuse of the results, etc.

Digital collections in libraries (and elsewhere) are governed by various - complex - licensing regimes. It is effectively impossible to manage and interpret these licenses, either for a single use or (harder still) when recombining disparate resources in new ways. The complexity and opacity of traditional licensing regimes have a detrimental impact upon use and reuse... except in those cases where people decide not to bother looking and just do what they like without reference to any licence (often illegally).

Electronic Resource Management Systems are increasingly being used to manage complex digital collections of disparate resources. There's an opportunity here to manage the licence terms alongside the content itself.

ONIX for Publication Licenses (ONIX-PL [pdf]); a standard data dictionary and XML (XSD) expression for complete publications licenses. Part of ONIX for Licensing Terms (ONIX-LT) family, and being developed by EDItEUR with funding from JISC's PALS2 programme and the PLS. DLF and NISO also involved.

Initial development of ONIX-PL Editor (OPLE) tool using server-side XForms in the guise of Orbeon Forms.

Photo by me.


Posted by Paul Miller at 05:44 PM | Comments (0) | TrackBack

XTech Day 2 - Francis Cave on ACAP


ACAP came up on this blog before...

Now Francis Cave talks about it in the context of “Communicating access and usage permissions for online content.”

“The growth in the use of search engines and other aggregation services presents a major challenge to the businesses of traditional content owners. Newspaper publishers in particular rely upon aggregators to generate traffic to their online content, but have viewed with concern the growth in search engine advertising revenue, while their own revenues from the same sources have diminished. Many publishers wish to put content online for marketing purposes, but are put off so doing because they feel unable to control what use is made of that content.

A study during 2006 concluded that a major technical obstacle to the evolution of new commercial models is the lack of adequate standards for content owners to express content access and usage permissions in machine-readable form. Existing conventions, such as the Robots Exclusion Standard, cannot deliver sufficiently nuanced expressions of what aggregators’ systems should or should not do with online content.”

It's hard for anyone to make content available for access and use on the web without there being rules in place regarding the ways in which it may be accessed or used. By saying nothing about reuse, many people infer that they may do anything they like. In saying something about reuse, organisations tend to provide complex and legally worded statements of terms and conditions, which are unreadable by both humans and machines.

ACAP (Automated Content Access Protocol) intends to make rules of use and reuse machine readable and interpretable.

Search engines are hugely valuable to their users... and to owners of content. There are a multitude of positive business relationships between search engines and content owners, and/but the power and influence of those search engines has grown exponentially. There are well-publicised examples (eg Google's book digitisation) of conflict between content owners (publishers) and the search engines, around differential interpretations of fair use/dealing.

Content owners formed a task force in January 2006 to consider issues; they want and need search engine traffic, but also wish to control 'misuse' of their content by explicitly declaring usage permissions.

Current use of Robots Exclusion Protocol is a blunt instrument; access is simply permitted or denied. There is no scope for expressing conditionality, and there is actually no requirement for a crawler to support or respect the protocol.
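To see how blunt the instrument is, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` crawler name and the news.example.org URLs are invented for illustration. The only question the parser can answer is whether a given agent may fetch a given URL, and the answer is a plain yes or no.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary newspaper site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "May this agent fetch this URL?" is the whole vocabulary: there is
# nowhere to say "index but don't cache" or "show only a short snippet".
for path in ("/news/today.html", "/archive/2006/story.html"):
    url = "http://news.example.org" + path
    print(path, parser.can_fetch("ExampleBot", url))
```

Note also that nothing here is enforced; the check only has any effect if the crawler's author chooses to run it.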

Through something like ACAP, publishers will be better able to express and enforce usage restrictions, which should lead to greater availability of content; publishers will expose content that they currently keep locked away, because they'll be more confident that the data is protected.

ACAP funded by newspaper publishers, European Publishers Council and International Publishers Association, and includes publisher participants such as Wiley and Elsevier, as well as involvement from major [unnamed] search engines and the British Library. Other members involved include the Motion Picture Association and OPSI.

ACAP currently implementing a pilot project, which runs until the end of 2007. The project will produce a “standardized framework for machine readable expression of permissions for access and use”, providing a “proof of concept through pilot implementations”, resulting in a “sustainable business plan for future management” of the protocol.

The pilot project's scope includes openly published material on the web (eg newspapers and magazines), as well as content currently only available in closed access databases (scholarly journals, etc).

Technically, the project should comprise a mechanism for identifying and authenticating crawlers, as well as agreement on a standard set of 'usage verbs' (crawl, index, archive, preserve, derive, display, embed...), some measure of qualification (quantity, duration, attribution, location, payment, registration...), and a set of scopes (particular file types, specific resource classes, etc) and 'pushed' action requests (refresh the information you have, expunge content from an index, blacklist a crawler, etc) that may need to be transmitted.
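One way to picture how verbs, qualifications and scopes might fit together is the following Python sketch. To be clear, this is not ACAP syntax, which the project had yet to define at the time of the talk. The `Permission` class, the `lookup` function, the path-prefix scopes and the `max_words` and `attribution` qualifier names are all assumptions of mine, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Permission:
    """Hypothetical rule: may a crawler perform `verb` within `scope`?"""
    verb: str          # a usage verb, e.g. "crawl", "index", "display"
    allowed: bool
    scope: str = "*"   # path prefix the rule covers; "*" means everything
    qualifiers: dict = field(default_factory=dict)  # e.g. quantity limits


def lookup(rules, verb, path):
    """Return the most specific rule for this verb and path, or None."""
    candidates = [
        r for r in rules
        if r.verb == verb and (r.scope == "*" or path.startswith(r.scope))
    ]
    if not candidates:
        return None  # the publisher has said nothing about this usage
    # A longer matching prefix is treated as more specific than "*".
    return max(candidates, key=lambda r: 0 if r.scope == "*" else len(r.scope))


rules = [
    Permission("index", True),
    Permission("display", True, "/news/",
               {"max_words": 50, "attribution": True}),
    Permission("archive", False, "/news/"),
]

rule = lookup(rules, "display", "/news/today.html")
print(rule.allowed, rule.qualifiers)
print(lookup(rules, "embed", "/news/today.html"))
```

Compared with a bare allow/deny, the interesting part is the qualified 'yes': display is permitted, but only as a short, attributed extract. What a crawler should do when no rule matches is a policy question this sketch deliberately leaves open.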

Permissions should be capable of transmission via a number of existing protocols and formats, including an extended Robots Exclusion Protocol, NewsML, ONIX, etc.

Project now starting to define standard usages, reconciling current established practice in REP with commonly agreed semantics used elsewhere.

Developing guidance on crawler authentication and discovery; how do you verify the identity of a 'known' crawler, and how do you discover the provenance of a new one?

Picture by Rob


Posted by Paul Miller at 05:40 PM | Comments (0) | TrackBack

XTech Day 2 - Gavin Bell - 'What is your provenance?'


Gavin Bell from Nature is back on the stage, following his involvement in last night's BOF.

“Who you are matters, but maybe less than you think. Dunbar and his critics have pointed out only 150 people matter at most to anyone, at any point. Yet we focus on people all the time: Google makes use of this temporary focus in the pagerank algorithm. Social networking sites offer a domain specific set of links and metadata, which allow reliable discovery and tracking tools to be created, e.g. flickr pictures of snow.

Based on scientific communities, this talk will explore how people can act as navigational tools to allow interdisciplinary navigation, signposting the way from astro-physics to xeno-biology. We are comfortable with tags as navigational devices, but a tag means something to me, not to you. Knowing the person gives you the context to understand the meaning of the tag.

We have a word for it already 'provenance', no antique dealer will buy something without knowing the provenance, should we care as much online? What makes good provenance and how can we make it subject specific?

Looking at technologies such as XFN, FOAF, federated identity and Web 2.0 friends tagging and social networking services this talk will explore how we assess, form and make these navigational jumps and how the coming age of ubiquity will force us to face the potential fragmentation of the internet into tag soup, served cold.”

Looking at ideas of identity on the internet, building upon ideas in the day job around identity and its role in the scientific career.

Provenance is the history of an object; where it's from, who's handled it, when it was made, etc.

Provenance is increasingly important on the internet, as we spend more time online, and leave more information related to ourselves. The internet is really about people.

Most people prefer a reference from a friend to a result from a search.

Imagine ten years from now; what devices will we use? How will our identity be instantiated? How will we find people? Look at how this has changed, with the formalisation of surnames, the addition of postal addresses, the near ubiquity of telephone numbers, etc.

An individual like Gavin is 'identified' in a large number of (not always unique) ways; Flickr logins, blog URIs, IRC handles, Dopplr travel traces, etc.

Is OpenID the answer to identity consolidation? Not just about authentication as it might appear; allows for the representation of self.

Identity not just about self; the network of associations to our peers also important... although these are terribly fragmented online currently; my AIM buddies, my MSN contacts, my Twitter followers, my Skype contacts, my LinkedIn links, my Facebook friends, my Flickr friends, my del.icio.us bookmarks [if I had any] etc. There is some overlap, but I probably don't have all the 'identities' for all of my acquaintances, and it is difficult to move from one to the other effectively.

“People keep asking me to join the LinkedIn network, but I'm already part of a network; the Internet”
Jon Udell

Tags offer a means to begin linking various things together, introducing opportunities for serendipitous discovery.

“Can I sum the output of my friends to work out who knows what?”

“Can I manage people and not feeds?”

Sounds like the sort of thing that LinkedIn are doing with their Q&A function... although that's limited as it's locked inside their network, and only surfaced via one of their web pages.

Shows an example of identity consolidation, initially based upon himself and his various online presences, to scrape data from different sites in different ways to build a profile of himself, his online associations, and the linkages between each.

As well as the purposes for which Gavin intends this, I can see real value in the rather dull process of enabling me to find people with whom I am already connected via one service we both use, in order to connect with them on a second service that we both use but on which we have not discovered one another for whatever reason...
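That 'rather dull' use case boils down to a couple of set operations once identities have been reconciled, as this small Python sketch shows. Everything here is hypothetical: the names, the two dictionaries and the `missing_connections` function are mine, and the sketch assumes the hard part (knowing that the same person sits behind two different accounts) has already been done.

```python
# Hypothetical, already-consolidated data. Each name stands for one
# person, regardless of what they call themselves on each service.
my_contacts = {
    "flickr": {"alice", "bob", "carol"},
    "linkedin": {"alice"},
}
members = {
    "flickr": {"alice", "bob", "carol", "dave"},
    "linkedin": {"alice", "bob", "dave"},
}


def missing_connections(known_on, target):
    """People I know on `known_on` who also use `target`,
    but to whom I am not yet connected there."""
    return (my_contacts[known_on] & members[target]) - my_contacts[target]


print(sorted(missing_connections("flickr", "linkedin")))
```

Here bob is a Flickr contact who also uses LinkedIn but is not yet a contact there, so he is the one suggestion; carol isn't on LinkedIn and dave isn't someone I know.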

Photo by Rob Styles.


Posted by Paul Miller at 05:37 PM | Comments (0) | TrackBack

XTech, Adam Greenfield, Everyware


Adam's on a book tour, for his book, Everyware: The Dawning Age of Ubiquitous Computing.

You can forgive him that as soon as he starts to speak, and because the book's been around a while. He is engaging and clear about the things he has seen happening and how they extrapolate into a future where the floor you walk on knows who you are. Think Minority Report.

With images of Bentham's Panopticon prison, the Tokyo subway system and many other insightful observations he convinces us that ubiquitous computing is happening now, all around us. Maybe we'd like to think about the design of that? And about the social implications: what happens to a society if every last stitch of hypocrisy is removed and everyone can know where anyone else is, or was, at any given time?

He talks about the need for plausible deniability in society, not in a way to protect the seedier sides of life, but simply because as humans we need privacy.

To protect us from ourselves, or more likely each other, he suggests 5 laws for ubiquitous computing in a style reminiscent of Asimov's 3 laws of robotics. Given his next example, though - that you can't walk from one place in Manhattan to another without being surveilled by CCTV - it may be harder than we would like to keep to these laws.

A few months ago I signed up for Garlik, and their CEO Tom Ilube recently podcasted with Paul Miller. The amount of information that Garlik found about me online was somewhat troubling, but the benefits of my Flickr account, my blog and online communities such as code4lib simply outweigh the risk; for now.

With networked computers, sensors, cameras and our own personal GPS, phone and other devices being increasingly omnipresent, Adam discusses the subject objectively, but certainly not dispassionately.

Now I just have to read the book.

Posted by Rob Styles at 02:35 PM | Comments (0) | TrackBack

XTech Day 2 - Schuyler Erle on 'The Future of Geospatial Data - a two-way street?'

There is still no wifi at XTech.

Schuyler Erle from MetaCarta wrote Mapping Hacks and Google Maps Hacks. Disappointed that Steve Coast was unable to make it...

maps tell stories - two dimensional narrative about place.

“maps provide a bounding box for a story” - Aaron Straup Cope yesterday

everybody has a story to tell, every story happens somewhere... and everybody should be able to tell their stories on a map.

but maps were hard.

but software can now encapsulate some of that expertise. eg Google Maps apis gave access to mapping capability to many...

However - you can't make a map without data

Traditional geodata collection - involved surveyors, aerial imagery, satellite imagery, etc.

Traditionally a one way street - professionals collected and compiled data, and then passed on/sold a finished product for constrained - licensed - use. Geographic data is owned.

UK-wide OS costs £100,000,000 pa; that is a barrier to innovation. OFT - £1bn pa detriment to UK plc. OS says £100bn pa Gross Value Add. So the data more than 'pays for itself' in VAT returns.

“the more people that have access to a bit of information, the more valuable that information is” (Ed Parsons, 'Metcalfe's law applied to data')

Data is a 'non-rival good'

publicgeodata.org / freeourdata.org

US Public Sector Information market 5x size of that in the EU... - because it's free? Average age of US topographic map... about 25 years old. OS data more current. Gaps in TIGER/Line data... only collected to locate households for the census. No State remit to ensure coherence.

Therefore... is there an incentive for Benkler's 'commons-based peer production' to fill the gaps? [from a paper] More efficient at labour allocation; individuals can decide how to contribute. Not limited by market forces. Not limited by administrative overhead within an organisation.

OpenStreetMap - map the world, and release the data under an open licence. Free. Up to date. 'Easily' corrected. Occasionally apocryphal. Gaping holes. Inconsistent. Always susceptible to improvement.

A means of self-correction? Checks and balances to monitor accidental or deliberate degradation. Community of interest. Architecture of participation. Tags used on attributes in database, but rather chaotic implementation. Now seeing documentation of shared views of the 'right' tags.

Is CC BY-SA right? Not just data v creative work... but implications for aggregated works...

Differential quality - will you ever be able to rely upon it? Accurate enough for what ?

'they who control the map, control reality'


Posted by Paul Miller at 11:03 AM | Comments (1) | TrackBack

Climate Change isn't about saving the planet


It's about saving ourselves.

This is the message that Gavin Starks is keynoting on at XTech. Climate Change is a phrase that hides the truth of the situation.

The Himalayan glaciers feed three of the world's major river systems, sustaining 750 million people. If these melt we're not talking about warmer weather - this is a Mass Extinction Event.

So Gavin's mission is to make the message clearer and to help people understand how they can avoid mass extinction; actually no that's not quite accurate. His mission is to help people

AVOID MASS EXTINCTION !

There, was that clear enough? Gavin is an enigmatic speaker. With a mix of images and statistical data he gives the usual doom and gloom story, in stronger language, but the end result is more uplifting than usual. He's doing stuff about this and wants to help other people do something too.

It's clear to Gavin that if we're talking about avoiding mass extinction then we shouldn't be concerned about IPR, Copyright or other barriers to sharing. We need to share everything we have, information, expertise, tools, data - everything.

So, today is launch day for AMEE (the Avoiding Mass Extinction Engine, http://www.dgen.net/amee) which is a carbon calculator based on peer-reviewed open data. Importantly it also provides an API and has a peer review process to accept contributions of new data.

Accepting new data, and making the data they have accessible via a simple API makes them more transparent, accountable and open than other similar efforts have been and that is a good thing.

Great talk, great tool - take a look at it.

Posted by Rob Styles at 10:13 AM | Comments (0) | TrackBack

15 May 2007

XTech Day 1 - Online distribution of scientific research BOF

I'm in a discussion session at the end of Day 1 here at XTech, looking at issues related to the sharing of scientific research data;

“This BOF session will cover several themes important to those developing and promoting tools for scientific research, collaboration and publishing online.”

Gavin Bell and Alf Eaton from Nature are leading this session, and Alf starts off as they mean to continue; beyond Nature's boundaries, looking at good practice from PLoS ONE, including their use of standardised syntax (NLM XML), consistent feeds, DOI references, etc.

Alf points to the continuing distinction between HTML (and the forthcoming new version) and XHTML (and its new version), and suggests that the former remains better for 'normal' web content, whilst the structure behind the latter is more suitable for online scholarly publication.

Alf points to Postgenomic as one example of a site that enables participation in and tracking of conversations across the web; rather than requiring comments to be made on a single site in order for the conversation to be tracked. I have heard people in the past suggest that comments on blogs are actually counterproductive... and that those wishing to comment on content they find online should instead do it in their own space and link to the inspiration by means of trackbacks and similar techniques. It sounds as if Postgenomic may be part of that... ?

OTMI - “a way for publishers who don't want to give away the full text of a paper to make it available for indexing and searching”. Might that meet some of the (more reasonable) concerns of publishers with respect to the digitisation activities of Google et al?

“Avoid PDF” for scholarly publication - use XHTML and CSS 3 to look as good, but remain processable. Use rel tags to associate related data supporting arguments in the paper.

Open Data/ Collaboration - start by making tools to help people collaborate. eg shared cross-institutional repositories for distributed teams, capable of exposing particular data sets to a wider audience at the appropriate time.

How do you assess contributions, when those come in from blogs, on wikis, etc? Traditional model based upon peer-reviewed publication, but that's only part of the picture now. How do you track necessary elements of a contributor's identity, across all the different sites on which they might be active?

And now to the discussion session...

Peter Murray-Rust - “would like to promote the idea of complete publishing of scientific data in XML”. Not addressing business processes, but concentrating on technical elements; mathematical data, graphics (in SVG), geospatial data, chemical data all ripe for this approach... [Some of the stuff Internet Archaeology has been doing for ten years also relevant here...]. Can we agree consistent scientific units, and standardised ways to represent them in data? Royal Society of Chemistry - Project Prospect - taken ideas about markup of data, “bringing the chemistry to life” in the paper. Links to www.iupac.org etc.

Rufus Pollock - Economic Historian. 'Knowledge API' in the sense of identifiers; theyworkforyou... matching a system of Hansard transcripts with a system to allow voters to email their MP; struggled with the basic task of uniquely identifying individual MPs. “What's the unique identifier for UK population statistics in the 18th century?” Open Shakespeare project; uniquely identifying paragraphs in texts...

“How do we create unique identifiers that people can generate and share easily?” The Knowledge API.

Gavin Bell - pointing to the work he was involved with at the BBC, generating unique identifiers for radio and television programmes.

How do we cite and annotate online works effectively? Assigning unique ids to paragraphs, and allowing third parties to link directly to them, possibly the best we can do for now? Although, as scholarly text becomes more fluid, that method becomes problematic...
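One way to picture the paragraph-identifier idea, and why fluid text makes it awkward, is to derive each identifier from the paragraph's own words. This is only a sketch under stated assumptions: nobody in the session proposed this particular scheme, and the `paragraph_ids` helper, the `p-` prefix and the truncated SHA-1 digest are all choices made here for illustration.

```python
import hashlib
import re


def paragraph_ids(text, prefix="p-", length=10):
    """Return (identifier, paragraph) pairs for a plain-text document.

    Assumes paragraphs are separated by blank lines. Each identifier is a
    truncated SHA-1 digest of the whitespace-normalised paragraph, so it
    depends only on the words, not on the paragraph's position.
    """
    pairs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        normalised = " ".join(block.split())
        if not normalised:
            continue
        digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
        pairs.append((prefix + digest[:length], normalised))
    return pairs


original = "To be, or not to be.\n\nThat is the question."
edited = "To be, or not to be.\n\nThat is still the question."

ids_before = [pid for pid, _ in paragraph_ids(original)]
ids_after = [pid for pid, _ in paragraph_ids(edited)]

# The untouched paragraph keeps its identifier; the edited one gets a new one.
print(ids_before[0] == ids_after[0])
print(ids_before[1] == ids_after[1])
```

Anyone can generate these identifiers without a central registry, which speaks to the "generate and share easily" question. The catch is the one noted above: as soon as a paragraph is reworded, links made to its old identifier no longer resolve.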

“Academics attract funding in order to be able to publish.”

Or do they publish in order to attract funding? ;-) The current model of reward and tenure is based upon where a scientist publishes, how often, and how regularly they are cited in similar publications. How can this model change to fit current realities?

Today's picture is mine!!! :-)

Posted by Paul Miller at 06:02 PM | Comments (1) | TrackBack

Ubiquitous Web: Alexandra Deschamps-Sonsino

Alexandra Deschamps-Sonsino of designswarm is presenting Ceci n'est pas seulement une pipe: semantic meaning of everyday objects in a connected world.

Objects have everyday meaning. The ubiquitous web can add a layer of complexity to those objects. Are we ready to deal with that? As a consumer of everyday life?

Stuff and Things + Technology

Otoizm: MP3 players that are also Yo-yos
Chairs that control the lighting as you sit on them

Stuff and Things + the internet

Webkinz
SecondLife
Mythings.com
Thinglink.org
Moodstats
Objects that help visualise
Stint
Nike + iPod

The trend is for everyday objects to become more aware of the online world, and as we start to develop objects that represent the state of online things in the real world, we blur the boundaries.

Right now we have the ability to tag objects, using barcodes and phones. There will come a time when the object simply radiates the information. Reference the physical indications of on and off as described in Adam Greenfield's book Everyware: The Dawning Age of Ubiquitous Computing.

Wonderful examples of what next:

Who sat on this chair before?
For how long?
Is someone else sitting on a chair on the other side of the world?
How much do I weigh?

Product Design is going through a crisis as a result of many factors such as 'fabbing', the ability to home-fabricate; obviously the challenges brought by the ubiquitous web are seen both as threat and opportunity.

In an echo of Imity, Alexandra references a design project which resulted in Bluetooth-enabled fish, to allow Christians to find fellow believers.

In short she's advocating a multi-disciplinary approach to the development of the integration points between physical and virtual worlds. I whole-heartedly agree. That's why we're hiring for an Interaction Designer.

Posted by Rob Styles at 04:19 PM | Comments (0) | TrackBack

Ubiquitous Web: Aaron Straup Cope

The Papernet: small pieces of paper loosely joined. Obviously a popular topic, as the room has filled out more than in previous sessions. It is something Aaron has been writing on for over a year - but today he has a set of slides that "you could argue over while having drinks".

Recipes. Recipe cards are hard to share, everyone has boxes of them, you try to copy other people's but it's hard and you never get all of them. Aaron thought maybe put them online! But he never wants to see a computer in the kitchen. Reading a recipe for Chocolate Cake in a text editor loses the magic.

So he needed a way to print cards with recipes on. He created a markup language. He wanted to use index cards, but this is a terrible experience: there are no printers set up to do it. He says he'll come back to that.

Ubiquitous does not mean "Always On"

Due to power constraints, networking and all the rest, laptops are not always on. Paper, you can screw up, fold, tear and unwrap and it still works!

The revolution will not be convergence

What a great phrase. Use the internet for what it's good at. Use paper for what it's good at. Reading a book on a Palm Pilot is not a good experience. And in Paris, nobody's going to get their laptop out in a rainstorm.

Artifacts are the soft-porn of memory

Aaron has a notebook guide of Barcelona, with his own notes in. The next person he knows who goes to Barcelona will get the notebook, so it can come back with more notes in it.

There's something more than online information. Everyone loves to receive a letter, a real one, not an email. The power goes off and there's still something there.

There is a limit to computer magic because human language is also magic and computers are still dumb

The web is not your desktop because you don't always have a connection, your battery runs out, your laptop gets stolen and so on.

<snip>a wander around what's broken about online data, google base, stickit and more</snip>

Aaron introduces a nice little guide printer: it takes stuff from stickit and other places, grabs maps and prints a booklet so you can carry it around. It also prints barcodes (QR codes), so you can link the data back into your phone.

This post doesn't even come close to doing him justice. If you get the chance to see Aaron speak then do. Ditch any other session in favour of this guy.

Posted by Rob Styles at 03:52 PM | Comments (0) | TrackBack

Ubiquitous Web: Claus Dahl

Claus Dahl is one of the co-founders of Imity. Imity is a live service, with users (mostly in Copenhagen).

Imity is a little app you can run on your phone, turning your phone into a more context-aware device. Imity uses Bluetooth to 'scan' your environment for other devices. For some we'll only know the name; others we'll have seen before. We can look them up on the web and learn more about them.

This appears to be an experiment into a social network of devices. Users can tag objects, change the names, and add notes to the devices. It also notifies you when it finds devices you registered to be notified of.

This also allows registered users to download details of devices, say a phone, owned by someone they know online, then they can be alerted when they meet that person in real-life. Nice twist in the way the online and offline worlds can be linked.

One of the things that Claus liked was Sascha Pohflepp's Buttons. A camera with no optics, that takes a timestamp then fetches a photo from Flickr taken at that time. Obviously not of the view you were looking at, just taken at the same time.

"I don't have to take any photos of this conference, someone else will do that for me"

The good old LazyWeb. Anyway, it seems to Claus we've been talking today about 3 kinds of ubiquity:

wire replacement
objects with agency
public data-space

These three types overlap, but different technologies lend themselves differently to different aspects.

The technology behind Imity is mostly server-side, with networking over GPRS. This has its problems, but was the best of a number of difficult options.

Imity shows some really interesting characteristics. You don't have to operate it most of the time, it doesn't require clicking. Recording your environment builds up slowly, over time. It makes it hard to fake history, and means that "meaning arrives slowly". The lightweight simplicity and the difficulty in faking this make it an interesting surrogate for identity. Claus believes the service is incredibly sticky.

The data, from around 500 users mostly in Copenhagen, shows incredibly interesting patterns in the relationships, showing how the subcultures overlap and intermingle.

The plan is to take recordings of presence of phones around the Roskilde Music Festival. Based on which stages people are watching, and which band is on they can provide recommendations, last.fm style, in the real-world.

The Imity client is open source on Google Code, but as much happens server-side this might not be enormously useful. The intention is to refactor to provide an API for tagging and mapping MAC addresses to URIs.
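To make the MAC-address-to-URI idea concrete, here is a minimal sketch of what such a mapping might look like. It is not Imity's API, which had not been published at the time of the talk: the base URI, the `DeviceRegistry` class and every function name below are hypothetical, and the MAC address is made up.

```python
import re

# Hypothetical base URI, for illustration only; not a real Imity endpoint.
BASE_URI = "http://example.org/devices/"


def normalise_mac(mac):
    """Return a MAC address as 12 lowercase hex digits, or raise ValueError."""
    digits = re.sub(r"[:\-.]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError("not a MAC address: %r" % mac)
    return digits


def device_uri(mac):
    """Map a MAC address, in any common notation, to one stable URI."""
    return BASE_URI + normalise_mac(mac)


class DeviceRegistry:
    """In-memory stand-in for a server-side store of tags keyed by device URI."""

    def __init__(self):
        self._tags = {}

    def tag(self, mac, label):
        self._tags.setdefault(device_uri(mac), set()).add(label)

    def tags_for(self, mac):
        return sorted(self._tags.get(device_uri(mac), set()))


registry = DeviceRegistry()
registry.tag("00:1A:2B:3C:4D:5E", "paul's phone")
registry.tag("00-1a-2b-3c-4d-5e", "seen at xtech")

print(device_uri("00:1A:2B:3C:4D:5E"))
print(registry.tags_for("001a.2b3c.4d5e"))
```

Normalising before building the URI matters because the same device can be written with colons, hyphens or dots; without it, tags for one phone would scatter across several URIs.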

Claus finishes with his personal perspective on privacy and security:

Public space is a privacy problem.
Security is a social experience.
We can't possibly know the balance between usefulness and riskiness yet.
I suspect there is not a technological fix.

This is a sensible position. I've always had Bluetooth discovery turned off. Will Imity persuade me to turn it back on?

Posted by Rob Styles at 02:28 PM | Comments (0) | TrackBack

Ubiquitous Web: Dave Raggett

I'm in a full day session of ubiquitous web presentations/discussions over at XTech in Paris, France.

It's kind of difficult to blog, as the wireless is non-existent! People keep running up to their rooms to plug in and post stuff. Very 1996. I've scrounged a login to the wireless 'cause I'm still trying to prep my slides for Thursday am (late notice, got given a slot to talk about licensing). Anyways...

Dave Raggett is up first, essentially giving an explanation of what the day is about, for those not familiar with the term already.

Broadly, his introduction boils down to:

Moore's Law now applies to RF Circuitry. That is, it is increasingly possible to connect lots and lots of things to the network, at very low cost and this trend will continue.

Connectivity can be added to anything; home security devices, tv, heating and lighting equipment. There is a mix of networking technologies that help make this possible in different circumstances - WiFi, Bluetooth, Infrared, Copper, Optical Fibre and Powerline networks. These are used both on large scales and domestically.

RFID chips also have come down to the point where we have RFID "dust".

The ubiquitous web also means that applications and devices can combine local and remote services. This is much the same as what our CTO, Justin, talks about as "Internet Inside" applications.

Getting everyone up-to-speed, Dave gives us simple examples like using your TV and Remote to control all kinds of household appliances. Essentially the market Microsoft and others have been playing for; the Home Hub.

Dave is chair of the Ubiquitous Web Applications WG at W3C; this group succeeds the Device Independence WG. It looks like it will be well worth following.

Defining UI in the ubiquitous web space, with the diverse number of possible appliances, should be done with XML + Events + RDF + Object Model. Dave pops up a couple of diagrams and talks about "Hidden Messaging" between devices. This appears to strike a nerve with Dave Beckett, who suggests that this model is "Web Services", implying SOAP, and therefore flawed. Beckett's point is that abstracting/encapsulating the underlying networking model prevents the application from handling service failures properly. He also suggests a RESTful approach would work better.

After conversations last week at WWW2007, I think I have to agree with him.

And on to the next speaker...

Posted by Rob Styles at 12:55 PM | Comments (1) | TrackBack

XTech Day 1 - the Ubiquitous Web

I'm in Paris for the XTech conference, along with a good number of colleagues from Talis. Ian Davis speaks on Wednesday, and Rob and I are giving papers on Thursday. In unrelated news, we're sponsoring the conference too, and it's great to see people walking around with our new bags on their shoulders... I wonder how many will click through to the shiny new web site?

Continuing last week's theme of views from bedroom windows, I'm staying on the 31st floor of the conference hotel, with great views over La Défense, the Seine, and the prototype Statue of Liberty. I may have to invite Rob, his camera, and his Flickr account up to take a picture, with which to illustrate a later post!

Rather annoyingly, power and network are surprisingly scarce. I can see a (single) power socket, but would have to sit on the floor to use it. I can see two wireless networks with conference-related SSIDs, but can't join them. A whispered explanation from Nad suggests that we're not allowed to use the network whilst speakers are presenting, to preserve limited bandwidth for their use. I really hope that's not true, and it's going to have an adverse effect upon my ability - and desire - to record the proceedings... and to read around the topics being presented in order to enrich my own understanding. When will conference organisers learn that technology conferences ought to provide attendees at their (expensive) events with power and bandwidth? Very disappointing.

That said, Timo Arnall's session on Physical Hyperlinks was very interesting, addressing some of the social obstacles to widespread deployment of various mobile device-readable links into the wild; it's technically feasible (presuming we don't run out of numbers), but people still feel uncomfortable using them in real-life environments such as the busy street. Nad has more detail, and Chris has a picture.

And just before lunch, Matt Biddulph took his audience down an interesting path, in which avatars sat atop 'real' molecules in Second Life in order to discuss their properties, and controllers in the physical world had an impact upon objects in virtual environments.

Image: “The small Statue of Liberty on the river Seine in Paris, France. The Statue of Liberty in New York is much larger. It faces west, towards its American sister.

Photographed by Adrian Pingstone in June 2002 and released to the public domain.”

Posted by Paul Miller at 12:22 PM | Comments (0) | TrackBack

11 May 2007

WWW2007 - Linked Data once again

It's Friday morning here in Banff, and almost the end of the line for Rob and me as we jump on the bus for Calgary, a flight home, a weekend of washing, and then the trip to Paris for XTech.

But first, one more session on a topic that's been something of a theme for the conference since Tim Berners-Lee's keynote two days ago; a theme that looks likely to carry through to XTech in my paper and a whole track-full of others.

The topic? Linked Data, of course.

“Friday, May 11, 2007
Linked Data (Coleman, 10:30am-noon)

Session Chair: Danny Ayers (Independent)

* Tim Berners-Lee (W3C): Tabulator: A Semantic Web Browser (25 mins)
* Christian Bizer (Freie University Berlin): Querying Wikipedia Like a Database (25 mins)
* Tom Heath (KMi, The Open University): How to Combine the Best of Web2.0 and a Semantic Web: Examples from Revyu.com (25 mins)”

“Independent”? Sounds like Danny's standing for Parliament! Still, there is a vacancy coming up...

First up, Tim Berners-Lee shows Tabulator. He polls the room to start, and finds that about 50% of this packed room considers itself 'familiar' with the Semantic Web and RDF. Fewer have used or seen Tabulator.

Some of his presentation is here.

Tim demonstrates using Tabulator to view and navigate the relationships between nuggets of data stored in Linked Data-friendly repositories such as DBpedia. Interestingly - and importantly - Tabulator displays the provenance of the individual data assertions, backing up the point from his keynote that RDF triples are 'actually a quad'; with the fourth - provenance - being absolutely essential in building a trustworthy Data Web. This point came up several times in our session yesterday, too, as people grappled with issues of trust and authority in a linked network of assertions.
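The 'actually a quad' point can be illustrated with a few lines of code: keep the source alongside each subject-predicate-object statement, and you can always ask who asserted what. This is a sketch only. It is not how Tabulator stores data, and the `ex:` names and example.org source URIs are invented for the purpose.

```python
from collections import defaultdict, namedtuple

# A triple plus the document it was read from: the 'fourth' element.
Quad = namedtuple("Quad", ["subject", "predicate", "object", "source"])

# Hypothetical assertions gathered from two made-up sources.
quads = [
    Quad("ex:Banff", "ex:locatedIn", "ex:Alberta", "http://example.org/source-a"),
    Quad("ex:Banff", "ex:locatedIn", "ex:Alberta", "http://example.org/source-b"),
    Quad("ex:Banff", "ex:hostedEvent", "ex:WWW2007", "http://example.org/source-b"),
]


def sources_for(quads, subject, predicate, obj):
    """List every source that asserts the given triple."""
    wanted = (subject, predicate, obj)
    return sorted(q.source for q in quads
                  if (q.subject, q.predicate, q.object) == wanted)


def by_source(quads):
    """Group plain triples under the source that asserted them."""
    grouped = defaultdict(list)
    for q in quads:
        grouped[q.source].append((q.subject, q.predicate, q.object))
    return dict(grouped)


print(sources_for(quads, "ex:Banff", "ex:locatedIn", "ex:Alberta"))
print(len(by_source(quads)["http://example.org/source-b"]))
```

With the source retained, a browser can show that two independent documents agree on one statement while a third statement rests on a single source, which is the kind of evidence a reader needs when deciding what to trust.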

Tim “explains that the fundamental value is in RDF being a language for talking about data whereas XML is just a syntax for structuring documents” (thanks, Rob!)

Next up, Chris Bizer talks about DBpedia (PDF of a slightly older version of the presentation here). DBpedia is a community effort to extract structured information from Wikipedia, and make it available for linking across the web in RDF, under an open licence. The database currently contains 1,600,000 concepts, including almost 60,000 people, 70,000 places, etc. These concepts are described by 91,000,000 RDF triples, using more than 8,000 properties.

DBpedia is made available via a SPARQL endpoint, with a Linked Data interface that Semantic Web browsers such as Tabulator can interact with, and via various RDF data dumps that developers can take and implement for themselves. The DBpedia data is licensed with the GNU Free Documentation License. We really need to crack this licensing of data thing, because people aren't doing it right, and it's only going to get worse.
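For readers who have not used a SPARQL endpoint, the sketch below shows roughly what a request looks like: the query text travels as a `query` parameter on an ordinary HTTP GET, as the SPARQL protocol describes. It assumes the endpoint lives at http://dbpedia.org/sparql; the resource URI and the `sparql_url` helper are illustrative choices, and the code only builds the URL rather than fetching it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint address; check DBpedia's own documentation before relying on it.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: ask for up to five labels of one (assumed) resource.
QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Paris> rdfs:label ?label .
} LIMIT 5
"""


def sparql_url(query, endpoint=ENDPOINT):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_url(QUERY)
print(url)

# Round-trip check: decoding the URL gives back exactly the query we sent.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

The resulting URL could be handed to any HTTP client, with an `Accept` header to choose the result format. The network call is left out here on purpose, so the example runs without depending on the service being reachable.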

Chris points to the SWEO Linking Open Data project, suggesting that over 600,000,000 triples are already available via various activities of this project. He reckons there will be 30-40 billion triples within a few months, in DBpedia and in a wide range of other linked data projects capable of interoperating.

Freebase came up, and there was some discussion of the ways in which various versions of 'truth' can be meshed together between the growing number of large RDF data stores. This one is going to run and run, as various activities attempt to represent canonical views of a 'resource', and link it to facets drawn from a variety of third parties.

Finally, Tom Heath shows Revyu (presentation here). He's talking about merging the ease of participation offered by 'Web 2.0' notions, and aligning this with the Linking opportunities offered by Semantic Web data. Unlike many existing reviews sites, Revyu seeks to reach out across the web, and aggregate data from third parties rather than forcing reviewers to review on the site, where the reviews remain locked away. Revyu offers an easy interface to encourage creation of reviews, and exposes the resulting data as open RDF for machine linking and reuse. Currently, the intention is to Creative Commons-license the content (licensing is currently implicit rather than explicit), but Tom referred to Rob's presentation yesterday, and some of the issues we've grappled with in developing the Talis Community License.

In Q&A, Kingsley suggests examining Revyu.com data in an RDF browser, rather than using the site's human interface. For example, putting the URI for Tom's on-site profile into the OpenLink RDF browser's “Data Source URI” box.

The time really is right for the community to get on and build the Semantic Web - the Data Web - for real, by exposing open - linkable - data to the Web, by realising the realities of appropriately licensing these data (the 'Talis Community License' is our contribution to that particular debate), and by taking existing tools forward to a point at which they will do more good than harm when exposed to an audience that isn't dominated by academic researchers into the theoretical constructs behind the Semantic Web.

Now for lunch... and that bus.

The picture is another one of Rob Styles', showing the view from his bedroom window. Shunt your perspective two windows to the right to imagine the view I've had...

Posted by Paul Miller at 06:59 PM | Comments (3) | TrackBack

10 May 2007

Presentations from WWW2007 Open Data panel now online

As mentioned earlier this morning, I convened a panel to discuss a set of issues around the notion of 'Open Data'.

The position-setting pieces presented by panellists are now available for download as a PDF. Peter Murray-Rust delivered his piece over the web, and the jumping-off point to that can be found here.

Danny Ayers was recording the conversation, and