Nodalities

From Semantic Web to Web of Data

Author Archive

Britain 2.5

It’s hardly new for this blog or our community to cover issues of open access and making information useful for users. But, what if we were to begin speaking in terms  such as: “A call for transparency,” or subtly replace user with citizen?  With little substantive shift of core meaning, the whole message becomes one of rights, responsibilities, and public duty.

I’ve been watching this week as the ember at the heart of this dialogue has been fanned with air-time on mainstream media, and is about to receive its fuel. First, UK Prime Minister Gordon Brown asked Sir Tim Berners-Lee “to help us drive the opening up of access to Government data in the web over the coming months”, appointing him to a special role advising Parliament. In an interview with BBC tech correspondent Rory Cellan-Jones, Sir Tim discussed his position, explaining that he’s pushing for transparency: “This is our data. This is our taxpayers’ money which has created this data, so I would like to be able to see it, please.”

Sir Tim had the audience at the tech-friendly TED conference chanting “Raw Data Now” back in February, and he’s now been invited by a sitting government leader to make this happen.

This week also saw the publication of the Digital Britain report, outlining Parliament’s plans for a more connected future. I must admit, for the record, that I haven’t read all 239 pages of the report (made available via bbc.co.uk); rather, I’ve skimmed it and read several overviews. The gist seems to be that the UK plans to invest in the future of its citizens’ internet connectivity, upgrading existing infrastructure and providing access where there currently isn’t any. This investment will cover both wired broadband provision (with a stated aim of 2Mbps minimum for every household) and wireless, encouraging investment in 3G provision by allowing mobile companies to hold their network licences on a more permanent basis. It recommends subsidising development wherever the market can’t provide, seemingly equating net access with public utilities (the PM further clarified his thoughts by saying the Internet is as vital as water or gas). More information on this report can be found on the summary page at the Guardian, on Twitter under the hashtag #digitalbritain, and in Bill Thompson’s tech-centric overview.

All this week needs is a major announcement of something moving entirely to cloud-computing to look a bit like the convergence I blogged about a few days ago ;).

So, what has this incredible week brought us? It’s a governmental lead on opening up access to data. Their appointment of TBL makes me think that it’s likely we’ll see more and more linked-data projects coming from the public sector (not just access to data, but usable, linked data). Over the next few years, the UK plans to improve its infrastructure and incentivize development on communications networks, and they’ve begun to use language suggesting that being part of the network and having access to public data are rights issues.

Sir Tim spoke, in the interview, about beginning with low-hanging fruit: pilot schemes which open up data and watch what happens.

What are you building?

Image: “Sparks”, by Steven Wong via flickr; Creative Commons By, Share Alike License

The BBC, the Graph, and Linked Data Stores

Over the past few weeks, Talis has been working with the BBC to crawl their programmes and music sites and pull in a bunch of usable data into a Platform store. This store now contains information on over 360,000 programmes and more than 34,000 musicians. There is data about albums and reviews, and about programme series and even versions of episodes. This is an interesting dataset.

What’s more, the BBC have made this data available to you to mashup and make use of. They’ve discussed their SPARQL endpoint on their Backstage developers’ blog. We’ve got more details about the store, including information on how you can get a hold of the data over on our n2 developers’ blog.

Leigh, in the n2 post, listed several applications he could see for the data:

Programme Reviews. It’d be easy to build a mashup of the BBC programmes data and something like Revyu (which also has a SPARQL endpoint) to allow someone to review a programme that they watched last night. Note that as our crawling will be lagging behind the live site until we’ve implemented real-time updates, there will be a lead time between something being aired and being in the Platform for reviewing.

PVR Integration. There are a number of open source PVR solutions out there; could some of these be updated to automatically pull in additional data from the endpoint to improve electronic programme guides?

Geographic Overlays. The interconnections between radio programmes, artists and their locations offer an opportunity to build some mapping mashups, using either Google Maps or Earth. For example it ought to be possible to lay out the geographic spread of artists played by different BBC radio programmes and stations. Interested in music from a particular country or region? (Maybe you’re planning a trip there and want to pick up on the local vibe.) Then use a map to home in on radio programmes that are most likely to play those artists.

Fan Widgets. The ability to extract data from the endpoint using SPARQL and JSON means that it’s really easy to create little widgets to include programme data on external web pages. How could something like the Doctor Who Tardis Index File be enriched by widgets that came straight from the BBC database? Throw in additional annotations from the community and you could make some really interesting embeddable gadgets. Of course there’s also the other direction: if fan communities start using BBC identifiers then the BBC may be able to feed this crowd-sourced data back into their site, just as they’re doing with Wikipedia (via dbpedia).

Under the Talis Connected Commons scheme anyone can have free hosting on the Platform for public domain data, so if a fan community wanted to organize itself around creating additional annotations for BBC programmes (how about character lists? mood assessment? scene breakdowns?) then these can be stored in the Platform for free, and then mashed up with the BBC data on the server-side using features like the Augmentation service, or on the client-side using SPARQL and JSON. Lots of potential there.
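To make the client-side "SPARQL and JSON" idea a little more concrete, here is a minimal Python sketch of a fan widget. It is only an illustration: the endpoint URL is a placeholder (the real one is described in the n2 post), the query assumes programmes carry a Dublin Core title, and the sample response is invented so the code runs without touching the network. The one thing it relies on that is standard is the shape of the SPARQL JSON results format (`head.vars` and `results.bindings`).

```python
import html
import json
from urllib.parse import urlencode

# Placeholder endpoint, for illustration only (an assumption, not the real store URL).
ENDPOINT = "http://example.org/bbc-store/sparql"

# Illustrative query; that programmes are described with dc:title is an assumption.
QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?programme ?title
WHERE { ?programme dc:title ?title }
LIMIT 5
"""


def build_request_url(endpoint, query):
    """Build a GET URL carrying the query in the SPARQL protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each bound variable looks like {"type": ..., "value": ...}.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def render_widget(rows):
    """Render the rows as a small HTML list that a fan site could embed."""
    lines = ["<ul>"]
    for row in rows:
        link = html.escape(row["programme"], quote=True)
        title = html.escape(row["title"])
        lines.append('  <li><a href="%s">%s</a></li>' % (link, title))
    lines.append("</ul>")
    return "\n".join(lines)


# Invented sample response in the standard SPARQL JSON results shape.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["programme", "title"]},
    "results": {"bindings": [
        {"programme": {"type": "uri",
                       "value": "http://example.org/programmes/p1#programme"},
         "title": {"type": "literal", "value": "Example Programme"}},
    ]},
})

if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
    print(render_widget(rows_from_results(SAMPLE_RESPONSE)))
```

In a real widget you would fetch the URL (asking for JSON results, typically via an `Accept: application/sparql-results+json` header) and pass the response body to `rows_from_results`.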

Web two dot oh plus one, in the cloud, with bells on…

The tech world is telling a story about the Web and computing, and the mainstream media seem to be catching on. They’re hearing about clouds, wikis, and the history of the World-wide Web. The whole thing reads like some sort of legend…

It was an era, long ago, when the folk of Middle Class plugged in their Mo-Dems and listened to arcane, magical sounds as their £120 beige box enabled a blazing 14.4 kb/s connection, and they only had to wait a few minutes to call forth script and from anywhere on earth. It was an age that saw the beginnings of email, where people composed messages and sent them down the phone lines at lightning speeds (unless a packet dropped…). This was the time of Web 1.0.

Then, the web collapsed. No one used the internet any more. Modems became paperweights and millions of metres of ethernet cable were grubbed up to make room for under-floor heat in offices. The world was quiet, and the people of Middle Class forgot what they knew.

Until, there dawned the advent of Web 2.0. People re-learned their former ways, and improved upon the innovations of their fathers. Instead of sites and pages, they began to use “Web Apps” which accomplished Tasks, and they became their masters. The great titan Google was made, and he knew all and directed the world toward knowledge. The elves of the web taught men the ways of blogging and messaging and eventually (when they’d mastered all these things with wiki-training to boot) Social Media and Networking.

Only, that’s not exactly how it happened, is it? Many commentators and Alpha Geeks have divided the story of the web into convenient phases, and they’ve roughly settled around a versioning metaphor common to software. Have a look at your favorite browser, and you’ll see a version number (Safari 4 for me, if you’re interested) which lets you know how many iterations have been and gone before. There are certainly noted differences, and turning points, where people phased out their dependence on one thing for the convenience and utility of something better. Tim O’Reilly, who coined the phrase Web 2.0, wrote a much-linked post in 2005 trying to explain and crystallise some of the trends he was seeing which were different from the first few years of the web. The fact that he had to clarify what he meant, and that it took the non-geek world three years to catch up, testifies to the notion that the change was gradual. It makes me think that we missed out all the .1-.10’s in the version numbers, and many alpha and beta tests along the way.

Now we are engaged in the great Web 3.0, where we are applying the logic of the past to the present and guessing at the future. Only, because no one is actually releasing versions of the web like a good, reliable software company should, the story is much more complicated—and interesting!

There are notable trends, with backers and bloggers riding various waves. But it seems to me that the point of this is a convergence. The mobile web is bringing new sorts of information to people, and they can make use of this info wherever they happen to be because of advances in devices and connectivity. As phones and web-enabled devices get better, so too do the chips we seem to have embedded all over the place, and we can now begin to have a clearer picture of what we do through the information we gather from our heaters, cars, and pedometers. Also, as more objects become connected, the grunt-work of number-crunching and storage is becoming commoditized into big, efficient, utility-like cloud services, which host and work with our collected information much more effectively than the gadget in your hand could ever hope to do. Others, like ourselves, talk about the Semantic Web, which allows for an evolution from a bunch of connected documents to the explicit connections between bits of information.

But, I see a trend there which is common to all candidates: information. The web allowed for information to be shared, then collaboratively worked. Now, I see this information becoming useful in and of itself…as data.

Walt Mossberg talks about Web 3.0 as if it is riding on the backs of mobile and connected devices. And I think it probably is. Tim Berners-Lee recently spoke to the BBC about the future of the web, including an incredible vision of pixels everywhere, where any surface could display information. He’s also repeatedly talked about the future of the web being semantic (he invented the term, let’s not forget) where Linked Data is the web done right. And who am I to argue with the inventor of the Web?

But I don’t think there’s so much a conflict or competition as a coming together here. If there will be a Web 3.0 (and it seems a likely, media-friendly label), I think it will include all of these trends centred around the focus of data. The connected devices allow us access to cloud-computing and storage (computing and storage of data…). Many chips gather data about ourselves, which we can use to personalise our view on the web of data, and the Linking of this data through semantics lets it all be calculable, programmable, and useful. It kind of reminds me of a computer, you know… The chips and our collective use of web applications are input and sources, and the various devices we use are displays and UI’s onto a massive, scalable CPU in the cloud. Linked Data could be the Operating System, allowing and enabling anything to be connected and programmed.

Web 3.0, to me, is a convergence of the trends, and it’s all about data. It’s not a simple story, and any convenient label is too convenient to be comprehensive, but I’m pretty sure the next things will all centre on our ability to make use of and personalise vast chunks of previously-obtuse data.

Image “#Black rain : Convergence” by FredArmitage via flickr—Creative Commons License.

Down Tools…

Update: all maintenance has been successfully accomplished, and the blogs should all be up and available again. Thanks to the Live Services team for some handy/fast work!

The Nodalities blog will be unavailable from around 8pm (GMT) this evening for some scheduled maintenance. This will mean the posts, pages and RSS/Atom feeds will all be inaccessible till around 8am tomorrow morning.

We hope this won’t be too inconvenient for anyone, and that you enjoy the break ;)

Image: “Rex - Gone Fishing” by snuzzy via flickr, Creative Commons “By 2.0”

Semantic Web and Enterprise: PricewaterhouseCoopers’ call to a Linked Data future

It must be a sign of the times when the most informative Semantic Web overview I’ve read in a long time has not come from a semweb company, nor from a Linked Data initiative or an academic or technologist’s personal blog. Rather, PricewaterhouseCoopers, the massive international professional services firm, has set a new standard in Semantic Web publications by covering it exclusively in their Technology Forecast, 2009. They must think there is some future in the Linked Data web.

Calling on firms and governments to open up data has been a thankless but far from fruitless task. Talis has funded work on the Public Domain Dedication and Licence, and many in science and academia make eloquent cases for open access to public data. PwC’s Tech Forecast not only predicts and calls for more linked and open data, but makes one hell of a business case for the future of the Semantic Web. The technology overview, instantly recognizable to anyone familiar with the Semantic Web, carries a deeper dimension to firms and the enterprise world itself.

Relational databases do not scale:

Relational data models never were intended for integration at the scale enterprises now need. Relational data management soaks up IT resources that should be dedicated elsewhere. Plus, traditional databases create silos because relational data are specific to the database system implementation.

Linking up your data with the rest of the world frees it to be used:

Their future business agility will depend on their ability to focus on techniques that optimize sharing rather than maintaining silos. That’s why a standards-based approach makes sense. In a digital ecosystem, the assets of others can benefit you directly, and vice versa. It’s about supply and demand.

Riding the wave linked data generates means you can’t control everything—but you knew that already:

Enterprises need control over some data, but not all data. Many enterprises have learned that data warehousing doesn’t scale to encompass all corporate data. … Limit the data warehouse to data management problems that align with its attention to detail, its connection to transaction systems, and for problems that need such heavy investments.

Following this overview (which also managed to quickly and comprehensively cover ontologies) are some telling interviews with some enterprises who have made the leap to RDF already. Tom Scott discussed the BBC’s story, and answered specific questions about linked data at bbc.co.uk/programmes (also covered by Tom in Nodalities Magazine).

I was pleased to note that Talis got a mention in a sample list of vendors, and the authors of the Forecast also made use of several Talis-produced resources, including Sir Tim Berners-Lee’s interview with Paul Miller.

It’s a big step, this. This is a professional services firm “getting” the Semantic Web. This is PricewaterhouseCoopers predicting the rise and use of Linked Data in 2009. This is a call to enterprises to get their data in order.

Or, really, just to open up their data and let the whole community worry about the order.

image: “Order vs. Chaos”, Ivan Makarov (http://www.flickr.com/photos/ivanomak/446763022/) via flickr

A Good Week for RDFa

With the unveiling of Google’s RDFa support and discussions from the UK’s Central Office of Information around using RDFa in their job sites, there has certainly been a lot of coverage of RDFa and Linked Data over the past few days.

Google’s announcement feels a bit limp, hidden as it is in the webmasters’ tools. To read their own description of “Rich Snippets,” you’d think they were little more than an additional piece in the armory of SEOs and content editors, giving them the ability to flag reviews and products on their pages. The real excitement, as Tim O’Reilly mentioned, is that this is Google’s first active support for explicit information. A site can now state: “We give this widget 4 stars out of 5, it costs £100, and our CEO is Joe Bloggs.” That’s fantastic!

I wonder if we tend to miss the importance of explicit statements, because we default to googling for something and hoping the first page or so of results will contain the answer. I can very swiftly find “reviews” for a Logitech Mouse, for example; but I still have to go through the reviews and find what they said. I might be lucky if Google shows me the result within the site description, but I’m much more likely to need to follow my own lead after Google serves me up a bunch of links to follow. Rich Snippets let sites explicitly surface an (admittedly currently woefully limited) amount of their own data. That makes much more sense for finding what you’re actually after without needing to disambiguate yourself. It feels like a step in the right direction. It leaves me personally wishing Google would open it right out and support full vocabularies, but I’m glad for this initial offering.

Alongside Google, the Central Office of Information seems to be taking a much more webby approach to Linked Data, by supporting FOAF and other public vocabularies. Mark Birbeck explains:

To facilitate this we set up an open source project called argot-hub, with a wiki, issue-tracking system and associated discussion lists.

The first vocabularies — or argots — that I defined were for job vacancies, but in order to make the terminology usable in other situations, I broke out argots for replying to the vacancy, the specification of contact details, location information, and so on.

An argot doesn’t necessarily involve the creation of new terms, and in fact most of the argots use terms from Dublin Core, FOAF and vCard. So although new terms have been created if they are needed, the main idea behind an argot is to collect together terms from various vocabularies that suit a particular purpose.

The first pages to support the RDFa information will be vacancy notices, which can be seen at the Civil Service home page. The great thing about this is that it’s supporting application information retrieval. An application can query the site, pull out explicit information, and voila: your very own “what jobs are available in the Civil Service” app. Looking at all the info there, you could have a field day, sorting by salary, area of interest or whatever.
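As a rough illustration of what "pull out explicit information" might look like, here is a toy Python sketch that reads RDFa-style `typeof`, `property` and `content` attributes from a made-up page and sorts the vacancies by salary. Everything specific here is an assumption for the sake of the example: the markup is invented, the `job:` prefix and its terms are hypothetical stand-ins for whatever the real argots define, and the parser is deliberately naive rather than a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Invented page fragment. dc:title is a real Dublin Core term; the job: prefix
# and its terms are hypothetical placeholders, not the actual argot.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:job="http://example.org/argot/job#">
  <div typeof="job:Vacancy">
    <span property="dc:title">Policy Analyst</span>
    <span property="job:salary" content="32000">£32,000</span>
  </div>
  <div typeof="job:Vacancy">
    <span property="dc:title">Web Developer</span>
    <span property="job:salary" content="38000">£38,000</span>
  </div>
</div>
"""


class VacancyParser(HTMLParser):
    """Collect property/value pairs for each element typed as job:Vacancy."""

    def __init__(self):
        super().__init__()
        self.vacancies = []
        self._pending_property = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("typeof") == "job:Vacancy":
            self.vacancies.append({})  # start a new vacancy record
        prop = attrs.get("property")
        if prop and self.vacancies:
            if "content" in attrs:
                # A content attribute supplies the machine-readable value.
                self.vacancies[-1][prop] = attrs["content"]
            else:
                # Otherwise the value is the element's text, seen in handle_data.
                self._pending_property = prop

    def handle_data(self, data):
        if self._pending_property and data.strip():
            self.vacancies[-1][self._pending_property] = data.strip()
            self._pending_property = None

    def handle_endtag(self, tag):
        self._pending_property = None


parser = VacancyParser()
parser.feed(PAGE)

# Sort the extracted vacancies by salary, highest first.
by_salary = sorted(parser.vacancies,
                   key=lambda v: int(v["job:salary"]), reverse=True)

if __name__ == "__main__":
    for vacancy in by_salary:
        print(vacancy["dc:title"], vacancy["job:salary"])
```

The point of the sketch is the contrast with screen-scraping: the application never has to guess which piece of text is the salary, because the page says so.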

So, two very different use cases for the Semantic Web via RDFa. What’s next?

Semantic Web Awareness

So, we’ve all heard of the Semantic Web. It’s been covered by BBC Radio 4 in the morning, it’s been in mainstream press and the blogosphere for years before that. You’re reading Nodalities, probably because you are particularly interested in the future of a connected data web.

But how much does the rest of the world know? How much do the thinkers and movers in the corporate world know about it? A recent survey by the Semantic Web Company seems to suggest that the Semantic Web is slowly coming into focus for the wider world. The study found that the participants had a high familiarity with the notions of the Semantic Web (and “Social Software”), and that developers who took part considered semantic web technologies to be relevant to their future development.

Interestingly, you can come to your own conclusions on the survey results, because they’ve made their data available (SPSS) for download. You can either read their report, or download the dataset and run your own stats.

I’d be very interested to know if anyone does run their own analysis. I’m afraid I’ll be taking their word for it at present, because I left the world of SPSS behind with quantitative linguistic analysis at university!

Connected Commons Coverage

Following our recent announcement of the Talis Connected Commons, there has been some significant coverage in the blogosphere. I’ll list a few of the links I’ve seen here, and if you’ve come across others, or have covered the Connected Commons yourself, please share in the comments.

Read Write Web

Marshall Kirkpatrick captured some of the Linked Data future vision with his timely coverage over on Read Write Web. Marshall’s coverage sets the stage for potentials and takes a very much forward-looking view on the Connected Commons.

Science Commons

With Talis’ use of the Creative Commons CC0 License as one route for the Connected Commons data licensing, the Science Commons blog has made reference to the scheme. I quote: “We commend Talis for using CC0 as a means to clearly mark and identify public domain data, and look forward to see what fruit this tree will bring for the open data / linked data communities.”

Unilever Cambridge Centre for Molecular Informatics

“We’ll certainly be taking this up.”

bbgm by Deepak Singh

“We are living in a world where data sharing, data access, and open data in general are getting more and more important, and available.”

The Content Guy

“That last bit is what has Mike and me interested - finding new ways of making use of the relationships between data and content that all the various semantic tools unearth.”

Open Knowledge Foundation

Again, if you have something to say on your own blog, or have come across more coverage of Connected Commons, drop it in the comments.

Twitter metadata—metaphor?

Snow near us.
Image by Zach_Beauvais via Flickr

I’m sure I’m introducing old friends; but Twitter is a “microblogging” platform, to give it its proper description. It gives users 140 characters to publish status updates, comments, gripes, complaints, praises, news and whatever comes to mind. It’s burst out of its original answer to the simple question: “What are you doing?” and users often tweet just about everything.

One interesting innovation is the integration of the hashtag: simply a hash symbol (#) and a tag descriptor for the comment. This gives people the ability to follow particular threads of updates or participate in conversations around an interest. They’re often used, for example, to update the goings on from conferences (#FOWA for example). People give their own content this little bit of information, and a search engine can find them. People can add additional information and follow conventions which allow for distributed trends that anyone can follow and interact with.

The recent snowfall in Britain gave rise to a flurry of tweets about road closures, amounts of snow falling, schools closing down and all the other chaos unleashed. When users followed a simple convention, however, this information got organised. People quickly adopted the #uksnow hashtag to track the topic; and eventually someone worked out a way to capture all the info needed to follow these geographically. By tweeting the first half of a UK post code plus a rating out of ten for the snow falling, anyone following the thread knows exactly where it’s snowing and how much is coming down. It’s like an instant weather polling station, distributed across the country. It can go a step further, however: once users turn their simple status updates into a mini line of code, services can actually mash up these tweets.

This little bit of information allows for people to write software to track and automate the twitter information. This interactive map from benmarsh.co.uk, for example, actually plots a visual graph of snowfall across Britain. Bigger snowflakes indicate larger numbers out of ten in the poll. It’s simple, really. Ingenious, possibly. But the fundamental distinction between this tracking ability and the noise of thousands of Twits shouting about the snow is that little bit of #metadata.
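Here is a small Python sketch of how software might read that "mini line of code". It is a guess at the convention rather than a description of how the benmarsh.co.uk map works: I am assuming tweets that contain the #uksnow tag, the first half of a post code (something like SY8) and a rating written as N/10. The sample tweets are invented.

```python
import re
from collections import defaultdict

# First half of a UK post code, e.g. "SY8" or "B37": one or two letters,
# a digit, then an optional digit or letter. A simplification, not a validator.
OUTCODE_RE = re.compile(r"\b([A-Z]{1,2}[0-9][0-9A-Z]?)\b")

# A rating out of ten written as "7/10".
RATING_RE = re.compile(r"\b(10|[0-9])/10\b")


def parse_snow_tweet(text):
    """Return (outcode, rating) for a #uksnow tweet, or None if it doesn't fit."""
    if "#uksnow" not in text.lower():
        return None
    outcode = OUTCODE_RE.search(text.upper())
    rating = RATING_RE.search(text)
    if not outcode or not rating:
        return None
    return outcode.group(1), int(rating.group(1))


def average_by_outcode(tweets):
    """Average the ratings reported for each post code area."""
    ratings = defaultdict(list)
    for tweet in tweets:
        parsed = parse_snow_tweet(tweet)
        if parsed:
            area, rating = parsed
            ratings[area].append(rating)
    return {area: sum(values) / len(values) for area, values in ratings.items()}


# Invented sample tweets.
TWEETS = [
    "#uksnow SY8 7/10 coming down fast",
    "Light flurries here #uksnow sy8 3/10",
    "#uksnow B37 10/10",
    "Snow everywhere!",  # no hashtag, so it is ignored
]

if __name__ == "__main__":
    print(average_by_outcode(TWEETS))
```

The pattern is naive (a road name such as "M25" earlier in the tweet would be mistaken for a post code), but it shows how little structure is needed before a pile of status updates turns into something a map can plot.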

So, is this use of twitter a metaphor for the Semantic Web? It’s certainly a picture of automating information flow using metadata via software. Sounds Semanticcy to me.

 

 


voiD: Linking Linked Data

Here at Talis, we’re pretty big on the idea of Linked Data. It’s one of the key features of the Semantic Web vision, the idea that data are interconnected so man and machine can “follow their noses” to find new or relevant information. It’s also key to the idea of the “Web of Data” in which all published, online information can become part of a massive, usable database. An important element in seeing this become a reality is the notion of discoverability.

I remember, maybe ten years ago, someone explained that the World Wide Web contains so much information that I’d be unlikely to find the “best” answer to my question. Of course, search engines have changed this, to such an extent that we “google” for information with a fair certainty that we’ll find what we’re after, or something closely related. If I want to find out about plane tickets to Boston, Massachusetts, I’d be unlikely to want to read about someone complaining about theatre tickets in Boston, Lincolnshire. Search engines made it possible to discover the right information that we’re after. Search engines, however, can only go so far in their ability to find information, because they’re looking at keywords within the documents they scan. They can’t identify many kinds of data, nor can they make use of them: they just point.

Fast-forward now to surfing the Semantic Web. Part of what makes the whole thing great is that the information (data) we’re after on the Semantic Web comes with its own information about what it is (metadata). So, I can know with an even greater degree of certainty that the info I’ve found is what I’m after. It’s like finding a book in the Library. If we’re looking to borrow a particular book about the Battle of Hastings, for example, we probably don’t settle for the first one we come across in the catalogue. We look for the book we’re after, and know when we find it because its record identifies it to us. When we know what data is (and our software can be told what kind of data we’re looking at, and what’s in it), we can start to make use of these vast published datasets.

The authors of the voiD vocabulary are looking to make this metadata more relevant by providing data publishers with tools that describe their particular collection of data (dataset). It gives each dataset the ability to describe itself, saying what topic it’s covering, what kind of data it comprises, numerical information about the data, et cetera. voiD co-author Keith Alexander told me earlier: “It’s a way of saying ‘I have this dataset, this is what it’s all about, come and use it!’”
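To give a flavour of what such a self-description could look like, here is a short Python sketch that writes out a voiD description in Turtle. Treat it as a sketch under assumptions: the dataset URI, title, endpoint and triple count are all invented, and the property names (`void:Dataset`, `void:sparqlEndpoint`, `void:triples`, plus Dublin Core terms) follow the published voiD vocabulary as I understand it, so check them against the guide before relying on them.

```python
def turtle_string(value):
    """Escape a Python string for use as a Turtle string literal."""
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def void_description(dataset_uri, title, description, sparql_endpoint, triples):
    """Return a small voiD description of one dataset, serialised as Turtle."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        "<%s> a void:Dataset ;" % dataset_uri,
        "    dcterms:title %s ;" % turtle_string(title),
        "    dcterms:description %s ;" % turtle_string(description),
        "    void:sparqlEndpoint <%s> ;" % sparql_endpoint,
        "    void:triples %d ." % triples,
    ]
    return "\n".join(lines)


# All values below are invented for illustration.
EXAMPLE = void_description(
    dataset_uri="http://example.org/datasets/snow-reports",
    title="Example snow reports",
    description='Crowd-sourced "snow out of ten" ratings by post code area.',
    sparql_endpoint="http://example.org/datasets/snow-reports/sparql",
    triples=12000,
)

if __name__ == "__main__":
    print(EXAMPLE)
```

Publishing something like this alongside the data is what lets a crawler or a curious developer find out what the dataset is about and where to query it, without having to download it first.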

You can find out more about the voiD vocabulary over on its site: http://rdfs.org/ns/void. The authors have gone a step further, however, and written a manual to make voiD that much easier to implement: http://rdfs.org/ns/void-guide. They’d appreciate feedback, too, so if you can think of a way to improve voiD or the guide, give them a ping at: void-rdfs-internals@googlegroups.com
