Data Licenses for Social Network Services
I mailed the note below to the DataPortability list last week, and as the response was quite positive I’ll hazard it’ll be worthwhile posting over here too. It’s closely related to Paul’s recent posts about shenanigans and privacy in the cloud, and as I mention in the mail, Talis played a key role in the Open Data Commons license, so this does seem like a good place.
More crucially to me, Chris Saad, who’s been doing a grand job of pulling things together for the DataPortability group, suggested I take it another step and post an outline toward the DataPortability Policy Reference Design. For that, I really could do with some feedback on what I’ve got so far… (some postscripts below).
~~~~~
I think the good Mr. Scoble’s recent experience with Facebook offers some good pointers to areas which need work. One which seems prominent is data ownership and licensing. Here are some sketchy thoughts on the matter.
The starting point I would suggest is “I own my data”. This would correspond more or less to the default copyright on documents: even if you don’t say anything explicit on something you write, you have the copyright.
What happens when we sign up for a service is that we allow that party access to (some parts of) our own data, currently usually by filling in forms. When we connect to friends within social networking systems, we likewise allow them access to (some parts of) our own data. In both cases this seems an implicit licensing of that data for subsequent use. However, not everyone sees things that way.
Dare Obasanjo draws a distinction between information exposed on the service’s web pages and that exposed through the API. While the quality of the data may differ significantly, I’d suggest that in terms of licensing this distinction is bogus. If I can copy & paste from one app to another, that can have the same net end result as scraping. As Paul Downey put it, good web APIs are just web sites.
A more extreme view can be found in a comment on Scoble’s blog: “…you stole my personal details…”. While this seems a knee-jerk reaction, it’s clear how such a perception could arise.
Right now the service providers generally allow connection with a vague “he my friend”, and bury any details deep within their Terms of Service. But if the terms of the connection were made explicit, not only for signup with the service, but with every connection event, any ambiguity would be removed. Hence:
“Robert is my friend…I’d like to grant him access to my data”
Which leads on to the question of what form such a license might take.
Many of the options are already visible in copyright and software licenses, though I don’t believe (m)any are directly suitable for use with data. The difficulties arise with data derived from the original data – along the lines of software extension and modification, but twistier (e.g. attention profiles which might only contain derived data, but couldn’t exist without the original).
Anyhow, possible examples would be:
1. open license – anyone can use my data (with/without attribution)
2. reciprocal open license – anyone can use my data, but whatever they use it with must also be exposed under this license
3. trust license – the person to whom I license this data may use it as they please
4. silo license – the person to whom I license this data may use it as they please within the local system

Taking these in turn:
1. is likely to be impractical in the context of social networking sites without fine granularity of data access – e.g. I’m happy for my name and homepage to be associated together in public, but would rather my email and geographic address were restricted. Long term I believe we will need this.
2. is ideal for Open Data, in fact this is essentially what the new Open Data Commons license looks like (disclosure – I work for Talis, the company who got together with Creative Commons/Science Commons to produce this license). But the copyleft nature of the license probably wouldn’t appeal to many social networking services who see their data garden as business value.
3. seems naive, but I can’t think of a better way of approaching things
4. would in effect be a formalisation of the current Facebook position
So I think it might be worth considering what 3 & 4 might look like in more detail.
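To make the difference between the trust and silo options a little more concrete, here is a minimal sketch in Python of how explicit, per-field grants recorded at "friending" time might be modelled. Everything in it is an assumption for illustration only: the names `License`, `Grant`, `Profile`, `befriend` and `may_use`, the `"*"` wildcard, and the service labels `service-a` and `service-b` are all hypothetical, and nothing here reflects how any existing service works. The reciprocal license in particular is only modelled as far as access goes; the obligation to relicense derived data is not represented.

```python
from dataclasses import dataclass, field
from enum import Enum


class License(Enum):
    OPEN = "open"              # anyone can use the data
    RECIPROCAL = "reciprocal"  # anyone can use it (share-alike duty not modelled here)
    TRUST = "trust"            # the named licensee may use it as they please
    SILO = "silo"              # the named licensee may use it only within one system


@dataclass
class Grant:
    licensee: str       # who the data is licensed to ("*" is a hypothetical wildcard)
    license: License
    fields: frozenset   # which profile fields this grant covers
    system: str = ""    # the service where the grant was made (matters for SILO)


@dataclass
class Profile:
    owner: str
    data: dict
    grants: list = field(default_factory=list)

    def befriend(self, licensee, license, fields, system=""):
        """Record an explicit grant at connection time."""
        self.grants.append(Grant(licensee, license, frozenset(fields), system))

    def may_use(self, who, field_name, in_system):
        """Return True if `who` may use `field_name` inside `in_system`."""
        for g in self.grants:
            if field_name not in g.fields:
                continue
            if g.license in (License.OPEN, License.RECIPROCAL):
                return True
            if g.licensee != who:
                continue
            if g.license is License.TRUST:
                return True
            if g.license is License.SILO and g.system == in_system:
                return True
        return False


# Hypothetical example: name and homepage are public, email is siloed.
me = Profile("alice", {"name": "Alice",
                       "homepage": "http://example.org/",
                       "email": "alice@example.org"})
me.befriend("*", License.OPEN, {"name", "homepage"})
me.befriend("robert", License.SILO, {"email"}, system="service-a")

print(me.may_use("robert", "email", "service-a"))    # True
print(me.may_use("robert", "email", "service-b"))    # False
print(me.may_use("stranger", "name", "service-b"))   # True
```

Under this sketch, the only difference between options 3 and 4 is whether the grant is tied to the system in which it was made, which is roughly the question the Scoble episode raises.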
Please bear in mind with the above that IANAL, but then again I doubt many lawyers have a particularly sophisticated view of data. This stuff may be best driven by the folks getting their hands dirty.
Thoughts?
~~~~~
Afterthoughts: I didn’t realise when I wrote this that the script Scoble was running used OCR over images, rather than markup scraping. Ok, so Facebook are prepared to go a long way to prevent access to this data, but I don’t think that alters the fact that Scoble does have access to the data, or my point about the bogosity of Web site exposure vs. API distinction.
Turns out some related discussion had been happening too, suggesting boilerplate Terms and Conditions for social networking sites, as mentioned on the DataPortability Policy Reference Design page. These would certainly be nice to have, potentially convenient and palatable for the service operators. But as someone who habitually clicks through acceptance of such things, I still think it important to maximise the service user’s control and awareness of what’s happening to their data, and that would mean also having finer-grained interaction associated with “making friends”.
Finally, a couple of snippets from Nick Carr’s post on the Scoble-nanigans:
Far from being just “his own information,” however, the information included the names, email addresses, and birthdays of 5,000 Facebookers who had “friended” Scoble. The act of “friending” on a social network site, it’s important to remember, is a fairly cavalier act, often undertaken with little thought….
After all, if someone has your name, email address, and birthday, they pretty much have your identity – not just your online identity, but your real-world identity.
Absolutely. But rather than concluding that Scoble did wrong by taking that information out of context, I’d suggest the social networking sites have so far swept the issue under the carpet. Carr again:
At the very least, members should have the right to decide whether or not their personal information can be scraped out of the Facebook database.
Arguably they already have that right, but as Scoble’s case demonstrates, the enforcement of such a right is unfeasible based on naive technical mechanisms. As we’ve seen with email and spam, legal action after the fact is generally ineffectual. What’s needed is more preemptive privacy support built around informed choice and trust. A little license or two could help a lot. (Contrary to how it often seems, licenses don’t have to be hard work for the user – check Sean B. Palmer’s recent proposal for a minimal software license, and the one he discovered that would be nearly as neat, the Eiffel Forum License).
Incidentally, regarding trust, there’s another aspect nearby. How do we trust the information someone provides on the Web? In the social network context, if we are to trust someone, how do we know they are who they say they are? Although social issues are clearly still central, on the technical side I’d point to Semantic Web technologies as being the best bet.





