17 May 2007
XTech Day 3 - Rufus Pollock and Jo Walsh talk about 'Atomisation and Open Data'
Posted by Paul Miller at May 17, 2007 04:24 PM
Rufus Pollock and Jo Walsh are talking about 'Atomisation and Open Data':
“Atomisation on a large scale (such as in the Debian ‘apt’ packaging system) has allowed large software projects to be amazingly productive through their use of a decentralised, collaborative, incremental development process. Atomisation works so well because it allows us to ‘divide and conquer’ the organizational and conceptual problems of highly complex systems.
But what other kinds of information can be atomised? What are the possibilities and problems of this approach for forms of information other than software? How do we best design data APIs, discover and distribute existing resources, and recombine decentralised datasets?
Drawing on examples from geodata to Shakespeare we’ll demonstrate how atomisation is key to unlocking the potential of open data as well as how we can best begin to apply the lessons of open source to the world of open data.”
To understand things, we can look at massive aggregations of data and extract meaning from them. But data is rarely made available in a form that is actually usable...
Rufus is showing some great examples of the inferences and analyses that can be drawn from historical data sets... and explaining how difficult it was to get the data into a form he could use.
“The coolest thing to do with your data will be thought of by someone else”
One big database is not the way forward:
“The revolution will be decentralised” / “Small pieces loosely joined”
“Production should be decentralised and federated”
Debian 'apt' demonstrates one way in which a community can cooperate on a larger problem, with individuals tackling small pieces in ways that allow the components to be joined together when required.
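To make the apt analogy a little more concrete, here is a minimal sketch (my illustration, not something shown in the talk) of how independently maintained data packages might declare dependencies on one another and be assembled in dependency order, roughly as apt does for software. The package names and the `resolve_order` helper are hypothetical; it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data packages: each name maps to the set of packages it depends on.
PACKAGES = {
    "uk-place-names": set(),
    "os-boundaries": {"uk-place-names"},
    "census-1851": {"uk-place-names"},
    "census-by-region": {"census-1851", "os-boundaries"},
}

def resolve_order(target, packages):
    """Return every package needed for `target`, dependencies before dependants."""
    needed = set()
    stack = [target]
    # Walk the dependency graph to collect only the packages `target` requires.
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        if name not in packages:
            raise KeyError(f"unknown package: {name}")
        needed.add(name)
        stack.extend(packages[name])
    # Restrict the graph to the needed packages and sort it topologically;
    # a circular dependency raises graphlib.CycleError.
    graph = {name: packages[name] for name in needed}
    return list(TopologicalSorter(graph).static_order())

order = resolve_order("census-by-region", PACKAGES)
```

Each maintainer only has to describe their own small piece and what it depends on; the tool works out how the pieces join together when someone asks for a particular combination.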
They introduce the notion of a Knowledge API; tags are a crude form of one.
The Human Genome Project is one example of a big project with an implicit Knowledge API that allows public access to its genomic data: in this case, the unique identifiers it assigns to genes within the project's database.
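As a rough illustration of why shared identifiers work as a Knowledge API, the sketch below joins two separately published datasets on a common identifier field. The identifiers, field names and values are invented placeholders, not real gene IDs or Human Genome Project data.

```python
# Two hypothetical datasets, published independently, that share an "id" field.
annotations = [
    {"id": "GENE:0001", "name": "example-a"},
    {"id": "GENE:0002", "name": "example-b"},
]
expression = [
    {"id": "GENE:0002", "tissue": "liver", "level": 3.5},
    {"id": "GENE:0003", "tissue": "brain", "level": 1.2},
]

def join_on_id(left, right, key="id"):
    """Inner-join two lists of records on a shared identifier field."""
    # Index the right-hand records by identifier (one id may have several records).
    index = {}
    for record in right:
        index.setdefault(record[key], []).append(record)
    # Merge each left-hand record with every right-hand record sharing its id.
    joined = []
    for record in left:
        for match in index.get(record[key], []):
            joined.append({**record, **match})
    return joined

merged = join_on_id(annotations, expression)
# -> [{'id': 'GENE:0002', 'name': 'example-b', 'tissue': 'liver', 'level': 3.5}]
```

Neither publisher needs to know about the other; agreeing on the identifiers is enough for a third party to recombine the two.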
“Debugging code is hard. Debugging data will be harder”
In scholarly publication, the Knowledge API is the well-established set of shared identifiers: DOIs, standard scientific terms, etc. However, it is extremely difficult to atomise below the level of the scholarly paper itself.
There is very little evidence of real, effective linking between silos at the moment. Are the barriers social or technological? Or do people simply not see a real need to do it?
“We can at least wrap the data up... in a form suitable for automatable downloading”
The Comprehensive Knowledge Archive Network (CKAN) is offered as a model whereby anyone can upload, identify and describe a set of data.
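What might "wrapping the data up in a form suitable for automatable downloading" look like? Below is a minimal sketch of a machine-readable package record (identifier, description, licence, download location and checksum) plus the check an automated client could run after fetching. The field names, the example.org URL and the `verify` helper are assumptions for illustration only; they are not CKAN's actual schema or API.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class DataPackage:
    name: str          # unique, stable identifier for the package
    title: str
    description: str
    license: str
    download_url: str  # where an automated client would fetch the data from
    sha256: str        # checksum of the published file

def verify(package: DataPackage, payload: bytes) -> bool:
    """Check downloaded bytes against the checksum recorded in the package."""
    return hashlib.sha256(payload).hexdigest() == package.sha256

# Sample bytes standing in for the published file; a real publisher would
# compute the checksum from the actual file.
payload = b"id,value\n1,42\n"

pkg = DataPackage(
    name="example-dataset",
    title="Example dataset",
    description="A hypothetical dataset used to illustrate a package record.",
    license="cc-by",
    download_url="http://example.org/data/example-dataset.csv",
    sha256=hashlib.sha256(payload).hexdigest(),
)

# The record itself can be published as JSON for other tools to harvest.
record = json.dumps(asdict(pkg), indent=2)
```

Keeping the description separate from the data means a registry only has to hold small records like this one, while the data itself can stay wherever its publisher hosts it.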
Technorati Tags: Linked Data, open data, Talis, xtech, XTech2007

