Nodalities

From Semantic Web to Web of Data

License

Creative Commons License


Open Data Licensing, An Unnatural Thought

The first steps of the Semantic Web are now a short distance behind us and some organisations are starting to pick up the pace. With more and more data coming online, marked up for linking and sharing in a web of data, perhaps it’s time to look again at the trade-offs between different intellectual property rights.

Back in November 2004 James Boyle published A Natural Experiment in the Financial Times, a piece in which he debates the merits of intellectual property rights over data with Thomas Hazlett and Richard Epstein. His primary thrust is that we should make policy decisions in this area based on empirical data about the economic benefits one way or the other, something all three protagonists agree on.

Much has changed between 2004 and now, not least our understanding of how the web affects the way we collaborate, share and communicate; it fundamentally affects the way we live. We chat, we blog, we Twitter, we Flickr and we Joost. Content flows from person to person in unprecedented ways and at unprecedented speeds. This changes the nature of the experiment that Boyle talks about.

If the database right were working, we would expect positive answers to three crucial questions. First, has the European database industry’s rate of growth increased since 1996, while the US database industry has languished? [...] Second, are the principal beneficiaries of the database right in Europe producing databases they would not have produced otherwise? [...] Third, [...] is the right promoting innovation and competition rather than stifling it?

Boyle’s first two questions centre on the creation of databases, and his third is, by his own admission, difficult to measure. If one of our primary goals for the growth of the Internet is a web of data that can be linked and accessed across the globe, we may be better served by assessing how companies might make data open.

Boyle asks for, and discusses, the empirical evidence of databases being created in the EU and US. The differences in numbers should provide insight into the economic ups and downs as the EU adopted a robust database right in 1996 while the US ruled against such protection in 1991. I am interested in how we expect the growth of data on the Semantic Web to differ in the two jurisdictions.

Boyle explains that the US Chamber of Commerce oppose the creation of a database right in the US:

[The US Chamber of Commerce] believe that database providers can adequately protect themselves with contracts, technical means such as passwords, can rely on providing tied services and so on.

And therein lies the rub. Without appropriate protection of intellectual property we have only two extreme positions available: locked down with passwords and other technical means, or wide open and in the public domain. Polarising the possibilities for data into these two extremes makes opening up an all-or-nothing decision for the creator of a database.

With only technical and contractual mechanisms for protecting data, creators of databases can only publish them in situations where the technical barriers can be maintained and contractual obligations can be enforced.

We don’t tolerate this with creative works, our photographs, our blog posts and so on. Why would we expect it to make sense for databases? Whether or not it makes sense comes down to whether or not it is beneficial to society. We allow Copyright so that the creator of a work can collect adequate remuneration. We allow patents so that inventors can recover the development costs of an invention. Which is the database right more like?

A patent is a very broad monopoly. If I had a patent on the clock, a mechanical means of measuring the passing of time, nobody else would be able to make clocks. Copyright, on the other hand, is much narrower, only allowing me to protect the specific design of my clocks. This is where it can get confusing with databases. Database right in the EU is like Copyright: it is a monopoly, but only on that particular aggregation of the data. The underlying facts are still not protected and there is nothing to stop a second entrant from collecting them independently.

Richard Epstein points to this in his contribution:

The question is why do databases fall outside [the general principle of copyright], when the costs of compilation are in many cases substantial for the initial party and trivial for anyone who receives judicial blessing to copy the base? In answering this question, it will not do to say, as the Supreme Court said in the well known decision in Feist Publications v. Rural Telephone Service, (1991) that these compilations are not “original” in the sense that it requires no thought to check the spelling of the entries and to put them all in alphabetical order. But that obvious point should be met with an equally obvious rejoinder. If it requires no thought or intelligence to put the information together, then why not ask the second entrant into the market to go through the same drudge work as the first.

This is exactly what we see happening with Open Street Map, whose contributors are collecting their own geospatial data. Ordnance Survey in the UK have rights over the map data they have collected. The protection covers the collection of geospatial data that they have created; they are not granted a monopoly in geospatial data.

This leaves a special case of databases: those which are created at low cost as a by-product of normal business. Examples used in Boyle’s article are telephone numbers, television schedules and concert times. Boyle gives us the answer directly:

the [European] court ruled that the mere running of a business which generates data does not count as “substantial investment” enough to trigger the database right.

This reminds me strongly of The Smell of Food and the Sound of Coins, a folk tale in which a wise judge decides that a restaurateur may charge for the smell of food wafting from his restaurant, but that the appropriate price is the sound of coins chinking together.

That a database right may not and should not apply in all cases, and that anti-competitive practices need to be restricted, does not necessarily lead to the conclusion that no right is required.

It seems to me that much of the debate around intellectual property rights has focussed on how they are used to keep things closed. Having suggested earlier that we can only either keep databases locked away or open them completely, I’d like to consider what it might mean to have a database right for keeping things open.

In response to Thomas Hazlett’s contribution, Boyle asks:

How many databases are now created and maintained entirely “free” and thus escape commercial directories altogether? There are obviously many, both in the scientific and the consumer realm. One can no more omit these from consideration, than one can omit free software from the software market.

This strikes me as a great comparison to consider. Taking one of the most prevalent free software licenses, the GNU General Public License (GPL), what might that look like for data?

One of the primary functions of the GPL is that it enforces Copyleft: the requirement to license derivative, and even complementary, works under the same license. That is, any software that incorporates GPL code and is distributed must, under the terms of the license, also be released under the GPL. The viral nature of this license is possible only because of the backing of Copyright.

Without a database right, communities have no mechanism to publish data openly and still insist upon this kind of Share-Alike agreement.

Consider the impact of this in situations where you might use the idea of promiscuous copying to maintain the availability of data. Promiscuous copying relies on two things: lots of copies being made and lots of copies being available. Without the necessary licensing in place there is no mechanism with which to compel those who have copies to make them available. Public Domain means, by definition, no restriction, and that means I can lock it away again.

Copyleft is just one position along a spectrum with ‘locked away’ and ‘free as a bird’ at either end. What the web shows us is that other business models form crucial parts of the ecosystem. Epstein picks up on the controlling aspect of Boyle’s argument:

They can control their list of subscribers; give them each passwords; charge them based on the amount of the information that is used, or some other agreed-upon formula; and require them not to sell or otherwise transfer the information to third parties without the consent of the data base owner.

Imagine if this were true of Copyright material on the web. It has been, and still is, on the occasional site. But mostly copyright owners are starting to see the value of publishing content online, and they are underpinning the delivery of that content to consumers with other business models. Without Copyright, fewer types of business could participate.

Epstein goes on to say:

The contractual solution is surely preferable, because general publication will allow for use by others that may not offend the copyright law, but which will block the possibility of payment for the costly information that is supplied.

And again, this is the very heart of the matter. If we are to encourage those who have large databases to make them open, to post them on the Semantic Web, we must provide them with models and solutions that are preferable to technical barriers and restrictive contracts. Allowing them to pick their own position on the spectrum seems to me to be a necessity in that. You can see any form of protection in two lights. When Boyle says

They make inventors disclose their inventions when they might otherwise have kept them secret.

I say

They allow inventors to disclose their inventions when they might otherwise have had to keep them secret.

That’s why we’ve invested in a license to do this properly, clearly and in a way that stays open.

Rob Styles is Programme Manager for Data Services at Talis, a UK company building Semantic Web technologies. Rob Styles is not a lawyer.


XTech, Quakr


Yesterday I sat through an excellent session from three people, each covering a different aspect of how they’ve gone about building Quakr.

From left, Peter Arbuthnott, Katie Portwin and David Sant have been building a 3D tour of our world using photos from Flickr, geo-tagging, and some very expensive proprietary hardware developed by the team.

At the moment they’ve got some really nice demos showing Quakr’s potential, centred on Oxford, their home town.

Essentially, what the team have done is take photos using their specialised camera, recording seven aspects of positioning data. These are (from their site):

  1. Altitude (Are we standing on a mountain?)
  2. Latitude (How far north/south of the equator are we.)
  3. Longitude (How far east/west of the meridian are we.)
  4. Compass bearing (ie, N/S/E/W – which direction are we pointing the camera?)
  5. Tilt (ie, are we pointing it up at the sky a bit, or down at the ground a bit?)
  6. Orientation (is this photo portrait, landscape, somewhere wacky in between?)
  7. Timestamp (good for knowing if this is day or night)

This allows them to position the photos accurately in 3D space.
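To make the seven values concrete, here is a minimal Python sketch of one photo’s positioning record, with a helper that turns compass bearing and tilt into the direction the camera was pointing. The class name, field names, units and angle conventions (bearing in degrees clockwise from north, positive tilt pointing up at the sky) are our own assumptions, not Quakr’s; different sources define angles such as tilt differently, so a real pipeline would have to normalise them first.

```python
from dataclasses import dataclass
from datetime import datetime
import math


@dataclass
class PhotoPose:
    """The seven positioning values (hypothetical names and units)."""
    altitude_m: float       # metres above sea level (assumed unit)
    latitude: float         # decimal degrees, positive = north of the equator
    longitude: float        # decimal degrees, positive = east of the meridian
    bearing_deg: float      # compass bearing, clockwise from north (assumed)
    tilt_deg: float         # +90 = straight up at the sky, -90 = at the ground (assumed)
    orientation_deg: float  # camera roll: 0 = landscape, 90 = portrait (assumed)
    timestamp: datetime     # when the photo was taken

    def view_direction(self) -> tuple[float, float, float]:
        """Unit vector the camera points along, in local East-North-Up axes."""
        b = math.radians(self.bearing_deg)
        t = math.radians(self.tilt_deg)
        east = math.cos(t) * math.sin(b)   # bearing 90 (due east) -> +east
        north = math.cos(t) * math.cos(b)  # bearing 0 (due north) -> +north
        up = math.sin(t)                   # tilting upwards -> +up
        return (east, north, up)
```

Local East-North-Up axes keep the maths simple for a small area such as central Oxford; placing photos across a whole country would need a proper geodetic conversion. Roll (orientation) doesn’t change where the camera points, only how the image is rotated, which is why it is stored but not used in `view_direction`.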

Katie’s explanations of the issues involved in disambiguating tags, reconciling different definitions of ‘tilt’ and then working with “Image Jungles” (those spots on the map where there is an over-abundance of photos) were incredibly clear and helped me clarify some of the work I’m doing with bibliographic data.

They’re encountering the same kinds of problems we have: metadata recycled for uses other than its original purpose is hard to handle and often needs a lot of best-guess cleaning.

In the interests of full disclosure I have to let you know that we thought these guys were pretty cool before seeing them talk. Some of us had met them at last year’s XTech and we’d been out for dinner the night before. I ordered a dozen escargot and was appalled when both Dave and Katie dug in, but my fellow Talisians declined with such base comments as “I don’t eat Mollusks”.

Seriously, though, what Peter, Katie and Dave are doing stacks up well against the (much slicker and substantially better funded) Photosynth from Microsoft Live Labs.

That these guys have done this as a spare time project is ******* awesome.


XTech, Adam Greenfield, Everyware


Adam’s on a tour for his book, Everyware: The Dawning Age of Ubiquitous Computing.

You can forgive him that as soon as he starts to speak, and because the book’s been around a while. He is engaging and clear about the things he has seen happening and how they extrapolate into a future where the floor you walk on knows who you are. Think Minority Report.

With images of Bentham’s Panopticon prison, the Tokyo subway system and many other insightful observations, he convinces us that ubiquitous computing is happening now, all around us. Maybe we’d like to think about the design of that? And the social implications: what happens to a society if every last stitch of hypocrisy is removed and everyone can know where anyone else is, or was, at any given time?

He talks about the need for plausible deniability in society, not as a way to protect the seedier sides of life, but simply because as humans we need privacy.

To protect us from ourselves, or more likely from each other, he suggests five laws for ubiquitous computing in a style reminiscent of Asimov’s Three Laws of Robotics. Given his next example, though (that you can’t walk from one place in Manhattan to another without being surveilled by CCTV), it may be harder than we would like to keep to these laws.

A few months ago I signed up for Garlik, and their CEO Tom Ilube recently podcasted with Paul Miller. The amount of information that Garlik found about me online was somewhat troubling, but the benefits of my Flickr account, my blog and online communities such as code4lib simply outweigh the risk, for now.

With networked computers, sensors, cameras and our own personal GPS, phones and other devices becoming increasingly omnipresent, Adam discusses the subject objectively, but certainly not dispassionately.

Now I just have to read the book.


Climate Change isn’t about saving the planet


It’s about saving ourselves.

This is the message of Gavin Starks’s keynote at XTech: Climate Change is a phrase that hides the truth of the situation.

The Himalayan glaciers feed three of the world’s major river systems, sustaining 750 million people. If these melt we’re not talking about warmer weather; we’re talking about a Mass Extinction Event.

So Gavin’s mission is to make the message clearer and to help people understand how they can avoid mass extinction. Actually, no, that’s not quite accurate. His mission is to help people

AVOID MASS EXTINCTION !

There, was that clear enough? Gavin is an enigmatic speaker. Mixing images and statistical data, he tells the usual doom-and-gloom story in stronger language, but the end result is more uplifting than usual. He’s doing something about this and wants to help other people do something too.

It’s clear to Gavin that if we’re talking about avoiding mass extinction then we shouldn’t be concerned about IPR, Copyright or other barriers to sharing. We need to share everything we have: information, expertise, tools, data. Everything.

So, today is launch day for AMEE (the Avoiding Mass Extinction Engine, http://www.dgen.net/amee), a carbon calculator based on peer-reviewed open data. Importantly, it also provides an API and has a peer-review process for accepting contributions of new data.

Accepting new data, and making the data they have accessible via a simple API, makes them more transparent, accountable and open than similar efforts have been, and that is a good thing.

Great talk, great tool – take a look at it.


Ubiquitous Web: Alexandra Deschamps-Sonsino

Alexandra Deschamps-Sonsino of designswarm is presenting Ceci n’est pas seulement une pipe (“this is not just a pipe”): semantic meaning of everyday objects in a connected world.

Objects have everyday meaning. The ubiquitous web can add a layer of complexity to those objects. Are we, as consumers of everyday life, ready to deal with that?

Stuff and Things + Technology

Otoizm: MP3 players that are also yo-yos
Chairs that control the lighting as you sit on them

Stuff and Things + the internet

Webkinz
SecondLife
Mythings.com
Thinglink.org
Moodstats
Objects that help visualise
Stint
Nike + iPod

As everyday objects become more aware of the online world, and as we start to develop objects that represent the state of online things in the real world, we blur the boundaries between the two.

Right now we can tag objects using barcodes and phones. There will come a time when the object simply radiates the information. Compare the physical indications of on and off described in Adam Greenfield’s book Everyware: The Dawning Age of Ubiquitous Computing.

Wonderful examples of what might come next:

Who sat on this chair before?
For how long?
Is someone else sitting on a chair on the other side of the world?
How much do I weigh?

Product design is going through a crisis as a result of many factors, such as ‘fabbing’ (the ability to fabricate things at home); the challenges brought by the ubiquitous web are seen as both threat and opportunity.

In an echo of Imity, Alexandra references a design project which resulted in Bluetooth-enabled fish, to allow Christians to find fellow believers.

In short she’s advocating a multi-disciplinary approach to the development of the integration points between physical and virtual worlds. I whole-heartedly agree. That’s why we’re hiring an Interaction Designer.


Ubiquitous Web: Aaron Straup Cope

The Papernet: small pieces of paper, loosely joined. Aaron has been writing on this for over a year, and it’s obviously a popular topic; the room has filled out more than for previous sessions. Today he has a set of slides that “you could argue over while having drinks”.

Recipes. Recipe cards are hard to share: everyone has boxes of them, and you try to copy other people’s, but it’s hard and you never get all of them. Aaron thought: maybe put them online! But he never wants to see a computer in the kitchen. Reading a recipe for Chocolate Cake in a text editor loses the magic.

So he needed a way to print cards with recipes on. He created a markup language. He wanted to use index cards, but printing on them is a terrible experience: there are no printers set up to do it. He says he’ll come back to that.

Ubiquitous does not mean “Always On”

Due to power constraints, networking and all the rest, laptops are not always on. Paper you can screw up, fold, tear and unwrap, and it still works!

The revolution will not be convergence

What a great phrase. Use the internet for what it’s good at. Use paper for what it’s good at. Reading a book on a Palm Pilot is not a good experience. And in Paris, nobody’s going to get their laptop out in a rainstorm.

Artifacts are the soft-porn of memory

Aaron has a notebook guide to Barcelona, with his own notes in it. The next person he knows who goes to Barcelona will get the notebook, so it can come back with more notes in it.

There’s something more than online information. Everyone loves to receive a letter, a real one, not an email. The power goes off and there’s still something there.

There is a limit to computer magic because human language is also magic and computers are still dumb

The web is not your desktop because you don’t always have a connection, your battery runs out, your laptop gets stolen and so on.

<snip>a wander around what’s broken about online data, google base, stickit and more</snip>

Aaron introduces a nice little guide printer: it takes stuff from stickit and other places, grabs maps and prints a booklet you can carry around. It also prints barcodes (QR codes), so you can link the data back into your phone.

This post doesn’t even come close to doing him justice. If you get the chance to see Aaron speak then do. Ditch any other session in favour of this guy.


Ubiquitous Web: Claus Dahl

Claus Dahl is one of the co-founders of Imity. Imity is a live service, with users (mostly in Copenhagen).

Imity is a little app you can run on your phone, turning it into a more context-aware device. Imity uses Bluetooth to ‘scan’ your environment for other devices. Of some we’ll only know the name; some we’ll have seen before. We can look them up on the web and learn more about them.

This appears to be an experiment in building a social network of devices. Users can tag devices, change their names and add notes to them. It also notifies you when it finds devices you have asked to be told about.

This also allows registered users to download details of a device, say a phone, owned by someone they know online; they can then be alerted when they meet that person in real life. A nice twist in the way the online and offline worlds can be linked.

One of the things that Claus liked was Sascha Pohflepp’s Buttons: a camera with no optics that records a timestamp, then fetches a photo from Flickr taken at that moment. Obviously not of the view you were looking at, just taken at the same time.

“I don’t have to take any photos of this conference, someone else will do that for me”

The good old LazyWeb. Anyway, it seems to Claus we’ve been talking today about 3 kinds of ubiquity:

wire replacement
objects with agency
public data-space

These three types overlap, but different technologies lend themselves differently to different aspects.

The technology behind Imity is mostly server-side, with networking over GPRS. This has its problems, but was the best of a number of difficult options.

Imity shows some really interesting characteristics. You don’t have to operate it most of the time, it doesn’t require clicking. Recording your environment builds up slowly, over time. It makes it hard to fake history, and means that “meaning arrives slowly”. The lightweight simplicity and the difficulty in faking this makes it an interesting surrogate for identity. Claus believes the service is incredibly sticky.

The data, from around 500 users mostly in Copenhagen, shows incredibly interesting patterns in the relationships, showing how the subcultures overlap and intermingle.

The plan is to record the presence of phones around the Roskilde Music Festival. Based on which stages people are watching, and which band is on, they can provide recommendations, last.fm style, in the real world.
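Claus didn’t describe how those recommendations would be computed. One simple technique that fits “last.fm style” is item-to-item co-occurrence: two bands are related when the same phones were seen in front of both. The sketch below is an illustration of that technique under a hypothetical input format (a list of `(device_id, band)` sightings), not Imity’s algorithm; the function name and the band names are made up.

```python
from collections import Counter, defaultdict


def co_attendance_recommendations(sightings, band, top_n=3):
    """Suggest bands whose audiences overlapped with the audience for `band`.

    `sightings` is an iterable of (device_id, band) pairs, one for each phone
    seen near a stage while that band was playing (a hypothetical format).
    """
    # Which bands was each phone seen watching?
    bands_by_device = defaultdict(set)
    for device_id, seen_band in sightings:
        bands_by_device[device_id].add(seen_band)

    # For every phone that watched `band`, count the other bands it watched.
    counts = Counter()
    for watched in bands_by_device.values():
        if band in watched:
            counts.update(watched - {band})

    # Largest shared audience first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


# Usage with made-up data:
sightings = [
    ("phone-a", "Band A"), ("phone-a", "Band B"),
    ("phone-b", "Band A"), ("phone-b", "Band B"), ("phone-b", "Band C"),
    ("phone-c", "Band C"),
]
print(co_attendance_recommendations(sightings, "Band A"))  # ['Band B', 'Band C']
```

Raw counts favour the bands that everyone saw, such as the headliners; a more careful recommender would normalise by audience size before ranking.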

The Imity client is open source on Google Code, but as so much happens server-side this might not be enormously useful. The intention is to refactor it to provide an API for tagging and mapping MAC addresses to URIs.
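As a rough illustration of what mapping MAC addresses to URIs could mean, here is a minimal sketch that normalises a Bluetooth MAC address into one canonical form and mints a URI from it, plus an in-memory tag store keyed on that URI. The base URI `http://example.org/device/`, the URI pattern, `mac_to_uri` and the `DeviceTagStore` class are all hypothetical; the talk didn’t describe Imity’s planned API.

```python
import re
from collections import defaultdict

# Hypothetical base URI: the talk did not describe Imity's actual URI scheme.
BASE_URI = "http://example.org/device/"


def mac_to_uri(mac: str) -> str:
    """Normalise a MAC address and mint a URI for it (hypothetical scheme)."""
    # Accept common spellings: 00:1A:2B:3C:4D:5E, 00-1a-2b-3c-4d-5e, 001A.2B3C.4D5E
    hex_digits = re.sub(r"[:\-.]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", hex_digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    # Re-insert separators as hyphens, which are safe in a URI path segment.
    pairs = [hex_digits[i:i + 2] for i in range(0, 12, 2)]
    return BASE_URI + "-".join(pairs)


class DeviceTagStore:
    """In-memory stand-in for a device tagging API, keyed by device URI."""

    def __init__(self) -> None:
        self._tags = defaultdict(set)

    def tag(self, mac: str, label: str) -> None:
        self._tags[mac_to_uri(mac)].add(label)

    def tags_for(self, mac: str) -> list:
        # .get avoids creating empty entries for devices never tagged
        return sorted(self._tags.get(mac_to_uri(mac), set()))
```

For example, `mac_to_uri("00:1A:2B:3C:4D:5E")` returns `http://example.org/device/00-1a-2b-3c-4d-5e`. Normalising first means the same phone gets the same URI however its address is written. Minting a URI straight from a hardware address also makes the device easy to track by anyone who knows the pattern, which is exactly the privacy tension Claus raises.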

Claus finishes with his personal perspective on privacy and security:

Public space is a privacy problem.
Security is a social experience.
We can’t possibly know the balance between usefulness and riskiness yet.
I suspect there is not a technological fix.

This is a sensible position. I’ve always had Bluetooth discovery turned off. Will Imity persuade me to turn it back on?


Ubiquitous Web: Dave Raggett

I’m in a full day session of ubiquitous web presentations/discussions over at XTech in Paris, France.

It’s kind of difficult to blog, as the wireless is non-existent! People keep running up to their rooms to plug in and post stuff. Very 1996. I’ve scrounged a login to the wireless ’cause I’m still trying to prep my slides for Thursday am (late notice, got given a slot to talk about licensing). Anyways…

Dave Raggett is up first, essentially giving an explanation of what the day is about, for those not familiar with the term already.

Broadly, his introduction boils down to:

Moore’s Law now applies to RF circuitry. That is, it is increasingly possible to connect lots and lots of things to the network at very low cost, and this trend will continue.

Connectivity can be added to anything: home security devices, TVs, heating and lighting equipment. A mix of networking technologies makes this possible in different circumstances: WiFi, Bluetooth, infrared, copper, optical fibre and powerline networks. These are used both on large scales and domestically.

RFID chips have also come down in size and cost to the point where we have RFID “dust”.

The ubiquitous web also means that applications and devices can combine local and remote services. This is much the same as what our CTO, Justin, talks about as “Internet Inside” applications.

Getting everyone up-to-speed, Dave gives us simple examples like using your TV and Remote to control all kinds of household appliances. Essentially the market Microsoft and others have been playing for; the Home Hub.

Dave is chair of the Ubiquitous Web Applications WG at W3C, this group succeeds the Device Independence WG. It looks like it will be well worth following.

Defining UI in the ubiquitous web space, with the diverse range of possible appliances, should be done with XML + Events + RDF + Object Model. Dave pops up a couple of diagrams and talks about “Hidden Messaging” between devices. This appears to strike a nerve with Dave Beckett, who suggests that this model is “Web Services”, implying SOAP, and therefore flawed. Dave Beckett’s point is that abstracting/encapsulating the underlying networking model prevents the application from handling service failures properly. He also suggests a RESTful approach would work better.

After conversations last week at WWW2007, I think I have to agree with him.

And on to the next speaker…


Data, MetaData and Content

I’m over in Banff, for WWW2007 next week with Paul, standing in for Ian Davis on a panel about Open Data. I figured I’d better jot some thoughts down before going so I don’t forget what to say…

Open Data is great. Lots of useful, valuable and interesting data, as free as the air we breathe. Or is it?

There’s a discussion happening in the library sector about the specifics of catalog copyright and how that affects libraries’ ability to share information about their collections and some of the points are really interesting. I’ll blog some more about that soon.

What I wanted to talk about now is an aside to what I presented a few weeks back on the subject of Open Data, the web and sharing over at EUSIDIC in Copenhagen (video). I was congratulated by several audience members afterwards for giving a brave message. And I hadn’t said half of what I wanted to. One of the things I did say is that if selling data is not your core mission then you need to think about whether it helps or hinders you; and that depends on what kind of data you have.

First of all we have content; original creative work that is consumed for its own sake. A good book, the new album from your favourite artist, a beautiful painting or a blog you follow. We consume content for what it gives us directly.

Then we have data. We consume data typically by aggregating it and looking at the trends, mining it for information. We consume it for what it can tell us more-or-less directly.

Then we have metadata. Metadata is different from content and data because most of the time we don’t actually want it. We step over it, often not noticing, on our way to the content we really wanted. The track listing that gets us to the song, the catalogue that gets us to the book, the tag that helps us find the photo. All just stepping stones along the way to the thing we really wanted.

This is a worthwhile distinction to make as it helps us to understand how we might license things.

For content, we have Copyright law, and useful simplifications available to everyone through Creative Commons. Creative Commons is great because it gives everyone a simple, clear way to say “Some Rights Reserved” rather than leaving things as “All Rights Reserved”.

For data the situation is more complex. In Europe we have a Database Right which protects databases purely on the basis of the investment they took to create. There is no equivalent protection in the US. However, both jurisdictions have a notion of Compilation Copyright that protects the selection and arrangement of content; things like compilation CDs or collections of short stories are protected. Working out whether your data is a compilation (which requires creativity in the act of selection) protected by copyright, or a database protected by the database right, seems to be quite tricky. The Ordnance Survey came under scrutiny recently when Charlotte Waelde reported that geospatial data may not be protected by copyright at all.

But what of metadata? In some cases it will be possible to protect it using Database Right, in some cases it won’t. But that’s not the important decision. The trend right now seems to be that metadata, the data needed to get where you want to go, is becoming more open, more quickly than the other two.

In many cases this is because the place you want to get to is where the business model is; you don’t pay to search iTunes, you pay for the tunes once you’ve found them. Where businesses have built revenue models on charging for access to metadata, communities are bypassing them and building their own repositories: FreeDB, Open Street Map, ISBNdb. That means that if you have a pile of metadata you might want to think about how you can give it away rather than how you can keep it locked away.

Giving it away doesn’t mean leaving it unprotected though. What Creative Commons, and software licenses like the GPL, have shown us is that protection of content, data and metadata is as important to keeping it open and free as people think it is for keeping it closed. That’s why I’m hoping to write some more on licensing shortly.

 
