Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Archive for the 'Welcome to the Webs' Category

Twitter metadata—metaphor?

Snow near us.
Image by Zach_Beauvais via Flickr

I’m sure I’m introducing old friends, but Twitter is a “microblogging” platform, to give it its proper description. It gives users 140 characters to publish status updates, comments, gripes, complaints, praises, news and whatever else comes to mind. It has burst out of its original role as an answer to the simple question “What are you doing?”, and users now tweet about just about everything.

One interesting innovation is the hashtag: simply a hash symbol (#) followed by a tag describing the comment. This gives people the ability to follow particular threads of updates or participate in conversations around an interest. Hashtags are often used, for example, to report the goings-on at conferences (#FOWA, say). People give their own content this little bit of information, and a search engine can find it. People can also add further information and follow conventions, which allows for distributed trends that anyone can follow and interact with.

The recent snowfall in Britain gave rise to a flurry of tweets about road closures, amounts of snow falling, schools closing down and all the other chaos unleashed. When users followed a simple convention, however, this information got organised. People quickly adopted the #uksnow hashtag to track the topic, and eventually someone worked out a way to capture all the info needed to follow the reports geographically. By tweeting the first half of a UK post code plus a rating out of ten for the snow falling, users let anyone following the thread know exactly where it’s snowing and how much is coming down. It’s like an instant weather polling station, distributed across the country. It can go a step further, however: once users turn their simple status updates into a mini line of code, services can actually mash up these tweets.

This little bit of information allows people to write software that tracks and automates the Twitter information. This interactive map from benmarsh.co.uk, for example, actually plots a visual graph of snowfall across Britain. Bigger snowflakes indicate larger numbers out of ten in the poll. It’s simple, really. Ingenious, possibly. But the fundamental distinction between this tracking ability and the noise of thousands of Twits shouting about the snow is that little bit of #metadata.
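To make the “mini line of code” idea concrete, here is a minimal sketch of how software might read these tweets. It is not how the benmarsh.co.uk map works (I haven’t seen its code); the exact tweet shape, the regular expression and the function names are all my own assumptions, based only on the convention described above: the #uksnow tag, the first half of a post code, and a rating out of ten.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumption: a report looks like "#uksnow SY10 4/10", with the tag
# anywhere in the tweet and the post code directly before the rating.
HASHTAG = re.compile(r"#uksnow\b", re.IGNORECASE)
# Rough pattern for the first half of a UK post code (e.g. B1, SY10, EC1A),
# followed by a rating written as "<n>/10". Not a full post code validator.
REPORT = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s+(\d{1,2})/10\b", re.IGNORECASE)


def parse_uksnow(tweet: str) -> Optional[Tuple[str, int]]:
    """Return (post code district, rating) or None if the tweet is just noise."""
    if not HASHTAG.search(tweet):
        return None
    match = REPORT.search(tweet)
    if match is None:
        return None
    district, rating = match.group(1).upper(), int(match.group(2))
    if rating > 10:  # "11/10" is enthusiasm, not data
        return None
    return district, rating


def average_by_district(tweets: Iterable[str]) -> Dict[str, float]:
    """Average the ratings per district, ignoring tweets that don't parse."""
    ratings = defaultdict(list)
    for tweet in tweets:
        report = parse_uksnow(tweet)
        if report is not None:
            district, rating = report
            ratings[district].append(rating)
    return {d: sum(r) / len(r) for d, r in ratings.items()}
```

Feed `average_by_district` a batch of tweets and you get one number per district, which is roughly what a map would need to decide how big to draw each snowflake. Everything that doesn’t follow the convention simply drops out as noise.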

So, is this use of Twitter a metaphor for the Semantic Web? It’s certainly a picture of automating information flow using metadata via software. Sounds Semanticcy to me.

 

 

The Incomplete Web

In a short blog post – The Incomplete Web – over on O’Reilly’s net, Michael Hausenblas has just provided a wonderful little analogy (for programmers at least) regarding the motivation behind the “Web of Data (The-Thing-Formerly-Known-As-The-Semantic-Web)”. A must-read for anyone involved in building Web apps.

To date I’ve resisted the temptation to post here the personal-opinion and cat photo kind of stuff I’d normally post to my own blog (which I’m currently reorganizing). But I reckon Michael’s post justifies the exception.

“State of the Semantic Web” – personal opinions?

While I’m suffering from work backlog and external distractions, it occurred to me this would be an excellent juncture for a review of the current situation, so I’m planning a write-up on the topic over here sometime in the next week or two. I’ve mailed a few relevant lists (the original post has most details, though it’s since been pointed out that I expose my own HTTP+RDF bias there, so feel free to ignore everything but the title of this post; I do want to be as objective as possible). Please mail me if you have any thoughts, no matter how tentative, personal, biased and/or negative (but please be clear on what I can/can’t quote you directly on). Thanks.

Drupal calling Semantic Web..!

Arto Bendiken just posted a really useful mail to the Semantic Web Education and Outreach group giving some background on RDF developments around Drupal, as well as a list of possible ways SWEO could help. The list makes interesting reading for anyone looking to evangelize to developers. Here’s a minimal summary:

  1. RDF myths debunked – Arto mentions the legacy of early RDF/XML experience, and suggests promoting Turtle and RDFa, “the ultimate microformat” (the SW FAQ may help here)
  2. External validation – convincing the Drupal community-at-large that Drupal 7.0 adopting RDF wouldn’t be taking place in a vacuum (nice hat-tip to Talis, thanks!)
  3. Endorsement and adoption – “Tim BL blogs using Drupal”
  4. Mentorship and participation – input from Semantic Web folks into the Drupal community
  5. RDF Schema for Drupal – immediate action item that could benefit from the RDF expertise

(The recommended tutorial material quoted in the mail, on the ESW Wiki and on Engage, is on my to-do list; I hope to get back to that this week.)

Data Licenses for Social Network Services

I mailed the note below to the DataPortability list last week, and as the response was quite positive I’ll hazard it’ll be worthwhile posting over here too. It’s closely related to Paul’s recent posts about shenanigans and privacy in the cloud, and as I mention in the mail, Talis played a key role in the Open Data Commons license, so this does seem like a good place.

More crucially to me, Chris Saad, who’s been doing a grand job of pulling things together for the DataPortability group, suggested I take it another step and post an outline toward the DataPortability Policy Reference Design. For that, I really could do with some feedback on what I’ve got so far… (some postscripts below).

~~~~~

I think the good Mr. Scoble’s recent experience with Facebook offers some good pointers to areas which need work. One which seems prominent is data ownership and licensing. Here are some sketchy thoughts on the matter.

The starting point I would suggest is “I own my data”. This would correspond more or less to the default copyright on documents: even if you don’t say anything explicit on something you write, you have the copyright.

What happens when we sign up for a service is that we allow that party access to (some parts of) our own data, currently usually by filling in forms. When we connect to friends within social networking systems, we allow them access to (some parts of) our own data. In both cases this seems an implicit licensing of that data for subsequent use. However, not everyone sees things that way.

Dare Obasanjo draws a distinction between information exposed on the service’s web pages and that exposed through the API. While the quality of the data may differ significantly, I’d suggest that in terms of licensing this distinction is bogus. If I can copy & paste from one app to another, that can have the same net result as scraping. As Paul Downey put it, good web APIs are just web sites.

A more extreme view can be found in a comment on Scoble’s blog: “…you stole my personal details…”. While this seems a knee-jerk reaction, it’s clear how such a perception could arise.

Right now the service providers generally allow connection with a vague “he my friend”, and bury any details deep within their Terms of Service. But if the terms of the connection were made explicit, not only for signup with the service, but with every connection event, any ambiguity would be removed. Hence:

“Robert is my friend…I’d like to grant him access to my data”

Which leads onto the question of what form such a license might take.

Many of the options are already visible in copyright and software licenses, though I don’t believe (m)any are directly suitable for use with data. The difficulties arise with data derived from the original data – along the lines of software extension and modification, but twistier (e.g. attention profiles which might only contain derived data, but couldn’t exist without the original).

Anyhow, possible examples would be:

  1. open license – anyone can use my data (with/without attribution)
  2. reciprocal open license – anyone can use my data, but whatever they use it with must also be exposed under this license
  3. trust license – the person to whom I license this data may use it as they please
  4. silo license – the person to whom I license this data may use it as they please within the local system

1. is likely to be impractical in the context of social networking sites without fine granularity of data access – e.g. I’m happy for my name and homepage to be associated together in public, but would rather my email and geographic address were restricted. Long term I believe we will need this.

2. is ideal for Open Data; in fact this is essentially what the new Open Data Commons license looks like (disclosure – I work for Talis, the company that got together with Creative Commons/Science Commons to produce this license). But the copyleft nature of the license probably wouldn’t appeal to many social networking services, which see their data garden as business value.

3. seems naive, but I can’t think of a better way of approaching things

4. would in effect be a formalisation of the current Facebook position

So I think it might be worth considering what 3 & 4 might look like in more detail.
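As a starting point for that, here is a rough sketch of how the four options might be modelled if every piece of data carried its own license. This is purely illustrative: the class names, fields and the `is_permitted` rule are my own assumptions, not part of any existing license, service or DataPortability design, and real terms would need far more nuance (derived data, for a start).

```python
from dataclasses import dataclass
from enum import Enum


class DataLicense(Enum):
    OPEN = "open"
    RECIPROCAL_OPEN = "reciprocal open"
    TRUST = "trust"
    SILO = "silo"


@dataclass(frozen=True)
class Grant:
    """One item of my data under one license (hypothetical model)."""
    owner: str
    item: str            # e.g. "homepage", "email"
    terms: DataLicense
    licensee: str = ""   # only meaningful for TRUST and SILO


@dataclass(frozen=True)
class Use:
    """Someone wanting to do something with that item."""
    requester: str
    inside_local_system: bool
    republishes_under_same_license: bool = False


def is_permitted(grant: Grant, use: Use) -> bool:
    """Apply the four license descriptions above, as literally as possible."""
    if grant.terms is DataLicense.OPEN:
        return True  # anyone can use my data
    if grant.terms is DataLicense.RECIPROCAL_OPEN:
        # anyone can use it, provided the result is exposed the same way
        return use.republishes_under_same_license
    if use.requester != grant.licensee:
        return False  # TRUST and SILO are granted to one named person
    if grant.terms is DataLicense.TRUST:
        return True  # the licensee may use it as they please
    return use.inside_local_system  # SILO: only within the local system
```

So “Robert is my friend…I’d like to grant him access to my data” becomes a `Grant` per item: my homepage might go out under `OPEN`, while my email goes to Robert alone under `TRUST` or `SILO`. The only difference between 3 and 4 in this sketch is the final line, which is exactly where the Scoble case bites.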

Please bear in mind with the above that IANAL, but then again I doubt many lawyers have a particularly sophisticated view of data. This stuff may be best driven by the folks getting their hands dirty.

Thoughts?

~~~~~

Afterthoughts: I didn’t realise when I wrote this that the script Scoble was running used OCR over images, rather than markup scraping. Ok, so Facebook are prepared to go a long way to prevent access to this data, but I don’t think that alters the fact that Scoble does have access to the data, or my point about the bogosity of Web site exposure vs. API distinction.

Turns out some related discussion had been happening too, suggesting boilerplate Terms and Conditions for social networking sites, as mentioned on the DataPortability Policy Reference Design page. These would certainly be nice to have, potentially convenient and palatable for the service operators. But as someone who habitually clicks through acceptance of such things, I still think it important to maximise the service user’s control and awareness of what’s happening to their data, and that would mean also having finer-grained interaction associated with “making friends”.

Finally, a couple of snippets from Nick Carr’s post on the Scoble-nanigans:

Far from being just “his own information,” however, the information included the names, email addresses, and birthdays of 5,000 Facebookers who had “friended” Scoble. The act of “friending” on a social network site, it’s important to remember, is a fairly cavalier act, often undertaken with little thought….

After all, if someone has your name, email address, and birthday, they pretty much have your identity – not just your online identity, but your real-world identity.

Absolutely. But rather than concluding that Scoble did wrong by taking that information out of context, I’d suggest the social networking sites have so far swept the issue under the carpet. Carr again:

At the very least, members should have the right to decide whether or not their personal information can be scraped out of the Facebook database.

Arguably they already have that right, but as Scoble’s case demonstrates, the enforcement of such a right is unfeasible based on naive technical mechanisms. As we’ve seen with email and spam, legal action after the fact is generally ineffectual. What’s needed is more preemptive privacy support built around informed choice and trust. A little license or two could help a lot. (Contrary to how it often seems, licenses don’t have to be hard work for the user – check Sean B. Palmer’s recent proposal for a minimal software license, and the one he discovered that would be nearly as neat, the Eiffel Forum License).

Incidentally, regarding trust, there’s another aspect nearby. How do we trust the information someone provides on the Web? In the social network context, if we are to trust someone, how do we know they are who they say they are? Although social issues are clearly still central, on the technical side I’d point to Semantic Web technologies as being the best bet.