Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Interesting semantic web stuff

By Tom Scott
| This guest post originally appeared on Tom Scott’s blog; republished under CreativeCommons License, and with kind permission of the author.

It’s starting to feel like the world has suddenly woken up to the whole Linked Data thing, and that’s clearly a very, very good thing. Not only are Google (and Yahoo!) now using RDFa, but a whole bunch of other rather exciting things are going on. Below is a round-up of some of the best. But if you don’t know what I’m talking about, you might like to start off with TimBL’s talk at TED.

TimBL is working with the UK Cabinet Office (as an advisor) to make our information more open and accessible on the web [cabinetoffice.gov.uk]
The blog states that he’s working on:

  • overseeing the creation of a single online point of access and work with departments to make this part of their routine operations.
  • helping to select and implement common standards for the release of public data
  • developing Crown Copyright and ‘Crown Commons’ licenses and extending these to the wider public sector
  • driving the use of the internet to improve consultation processes.
  • working with the Government to engage with the leading experts internationally working on public data and standards

The Guardian has an article on the appointment.

Closer to home there have been a few interesting developments

Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections [pdf]
Our paper at this year’s European Semantic Web Conference (ESWC2009) looks at how the BBC has adopted semantic web technologies, including DBpedia, to help provide a better, more coherent user experience. We won best paper of the in-use track for it; congratulations to Silver and Georgie.

The BBC has announced a couple of SPARQL endpoints, hosted by Talis and OpenLink [welcomebackstage.com]
Both platforms allow you to search and query the BBC data in a number of different ways, including SPARQL — the standard query language for semantic web data. If you’re not familiar with SPARQL, the Talis folk have published a tutorial that uses some NASA data.
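To give a flavour of what querying an endpoint like this involves, here is a minimal Python sketch. It only builds the request URL and parses a canned response, so it runs without network access. The endpoint address is a placeholder, the sample data is made up, and the use of the Programmes Ontology `po:Brand` class with `dc:title` is an assumption about how the data is modelled rather than something the announcement confirms.

```python
import json
from urllib.parse import urlencode

# Placeholder address (an assumption); substitute the real endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Assumed vocabulary: Programmes Ontology (po:) plus Dublin Core titles.
QUERY = """
PREFIX po: <http://purl.org/ontology/po/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?brand ?title WHERE {
  ?brand a po:Brand ;
         dc:title ?title .
}
LIMIT 5
"""


def build_request_url(endpoint, query):
    """Encode a query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


# Canned response in the SPARQL JSON results layout (made-up data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["brand", "title"]},
    "results": {"bindings": [
        {
            "brand": {"type": "uri",
                      "value": "http://example.org/programmes/b0000001#programme"},
            "title": {"type": "literal", "value": "Example Programme"},
        }
    ]},
})

if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
    for row in rows(SAMPLE_RESPONSE):
        print(row["title"], row["brand"])
```

Against a live endpoint you would fetch the URL, asking for JSON results, and pass the response body to `rows`.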

A social semantic BBC?
Nice presentation from Simon and Ben on how social discovery of content could work… “show me the radio programmes my friends have listened to, show me the stuff my friends like that I’ve not seen”, all built on people’s existing social graph. People meet content via activity.

PricewaterhouseCoopers’ spring technology forecast focuses on Linked Data [pwc.com]
“Linked Data is all about supply and demand. On the demand side, you gain access to the comprehensive data you need to make decisions. On the supply side, you share more of your internal data with partners, suppliers, and—yes—even the public in ways they can take the best advantage of. The Linked Data approach is about confronting your data silos and turning your information management efforts in a different direction for the sake of scalability. It is a component of the information mediation layer enterprises must create to bridge the gap between strategy and operations… The term “Semantic Web” says more about how the technology works than what it is. The goal is a data Web, a Web where not only documents but also individual data elements are linked.”
Including an interview with me!

You should also check out…

sameas.org, a service to help link up equivalent URIs
It helps you to find co-references between different data sets. Interestingly, it’s also licensed under CC0, which means all copyright and related or neighboring rights are waived.
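The idea of co-reference can be sketched in a few lines of Python: given pairs of URIs asserted to be equivalent, group them into sets that all name the same thing. This is only an illustration of the concept using a union-find structure. It is not a description of how sameas.org is implemented, and the URIs below are invented for the example.

```python
def coreference_sets(pairs):
    """Group URIs into equivalence sets from (uri_a, uri_b) sameAs pairs."""
    parent = {}

    def find(uri):
        # Follow parent links to the representative, halving paths as we go.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Invented URIs, purely for illustration.
PAIRS = [
    ("http://a.example/artist/1", "http://b.example/resource/Artist_One"),
    ("http://b.example/resource/Artist_One", "http://c.example/id/artist-one"),
    ("http://a.example/artist/2", "http://c.example/id/artist-two"),
]

if __name__ == "__main__":
    for group in coreference_sets(PAIRS):
        print(sorted(group))
```

Because equivalence is transitive, the first two pairs collapse into one set of three URIs even though the first and third URI were never linked directly.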

Image: “Semantic Web Rubik’s Cube” by dullhunk, CC License, via flickr

Britain 2.5

It’s hardly new for this blog or our community to cover issues of open access and making information useful for users. But what if we were to begin speaking in terms such as “a call for transparency,” or subtly replace user with citizen? With little substantive shift of core meaning, the whole message becomes one of rights, responsibilities, and public duty.

I’ve been watching this week as the ember at the heart of this dialogue has been fanned with air-time on mainstream media, and is about to receive its fuel. First, UK Prime Minister Gordon Brown asked Sir Tim Berners-Lee “to help us drive the opening up of access to Government data in the web over the coming months”, appointing him to a special role advising Parliament. In an interview with BBC tech correspondent Rory Cellan-Jones, Sir Tim discussed his position, explaining that he’s pushing for transparency: “This is our data. This is our taxpayers’ money which has created this data, so I would like to be able to see it, please.”

Sir Tim had the audience at the tech-friendly TED conference chanting “Raw Data Now” back in February, and he’s now been invited by a sitting government leader to make this happen.

This week also saw the publication of the Digital Britain report, outlining Parliament’s plans for a more connected future. I must admit, for the record, that I haven’t read all 239 pages of the report (made available via bbc.co.uk); rather, I’ve skimmed it and read several overviews. The gist seems to be that the UK plans to invest in the future of its citizens’ internet connectivity, upgrading existing infrastructure and providing access where there currently isn’t any. This investment will cover both wired broadband provision (with a stated aim of a 2 Mbps minimum for every household) and wireless, encouraging investment in 3G provision by allowing mobile companies to hold their network licenses on a more permanent basis. It recommends subsidising development wherever the market can’t provide, seemingly equating net access with public utilities (the PM further clarified his thoughts by saying the Internet is as vital as water or gas). More information on this report can be found on the summary page at the Guardian, on Twitter under the hashtag #digitalbritain, and in Bill Thompson’s tech-centric overview.

All this week needs is a major announcement of something moving entirely to cloud-computing to look a bit like the convergence I blogged about a few days ago ;).

So, what has this incredible week brought us? It’s a governmental lead on opening up access to data. Their appointment of TBL makes me think that it’s likely we’ll see more and more linked-data projects coming from the public sector (not just access to data, but usable, linked data). Over the next few years, the UK plans to improve its infrastructure and incentivize development on communications networks, and they’ve begun to use language suggesting that being part of the network and having access to public data are rights issues.

Sir Tim spoke, in the interview, about beginning with low-hanging fruit: pilot schemes which open up data and watch what happens.

What are you building?

Image: “Sparks”, by Steven Wong via flickr; Creative Commons By, Share Alike License

Tom Gruber talks about Siri

In my latest podcast I talk with Tom Gruber, CTO and co-founder of Siri.

We discuss Siri, and explore the whole notion of the ‘Virtual Personal Assistant,’ of which Siri is one.

 
Standard Podcast [51:16m]

This conversation was recorded on Wednesday 10 June, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

The BBC, the Graph, and Linked Data Stores

Over the past few weeks, Talis has been working with the BBC to crawl their programmes and music sites and pull a bunch of usable data into a Platform store. This store now contains information on over 360,000 programmes and more than 34,000 musicians. There is data about albums and reviews, and about programme series and even versions of episodes. This is an interesting dataset.

What’s more, the BBC have made this data available to you to mashup and make use of. They’ve discussed their SPARQL endpoint on their Backstage developers’ blog. We’ve got more details about the store, including information on how you can get a hold of the data over on our n2 developers’ blog.

Leigh, in the n2 post, listed several applications he could see for the data:

Programme Reviews. It’d be easy to build a mashup of the BBC programmes data and something like Revyu (which also has a SPARQL endpoint) to allow someone to review a programme that they watched last night. Note that as our crawling will be lagging behind the live site until we’ve implemented real-time updates, there will be a lead time between something being aired and it being available in the Platform for reviewing.

PVR Integration. There are a number of open source PVR solutions out there; could some of these be updated to automatically pull in additional data from the endpoint to improve electronic programme guides?

Geographic Overlays. The interconnections between radio programmes, artists and their locations offer an opportunity to build some mapping mashups, using either Google Maps or Earth. For example, it ought to be possible to lay out the geographic spread of artists played by different BBC radio programmes and stations. Interested in music from a particular country or region? (Maybe you’re planning a trip there and want to pick up on the local vibe.) Then use a map to home in on radio programmes that are most likely to play those artists.

Fan Widgets. The ability to extract data from the endpoint using SPARQL and JSON means that it’s really easy to create little widgets to include programme data on external web pages. How might something like the Doctor Who Tardis Index File be enriched by widgets that came straight from the BBC database? Throw in additional annotations from the community and you could make some really interesting embeddable gadgets. Of course there’s also the other direction: if fan communities start using BBC identifiers then the BBC may be able to feed this crowd-sourced data back into their site, just as they’re doing with Wikipedia (via DBpedia).

Under the Talis Connected Commons scheme anyone can have free hosting on the Platform for public domain data, so if a fan community wanted to organize itself around creating additional annotations for BBC programmes (how about character lists? mood assessment? scene breakdowns?) then these can be stored in the Platform for free, and then mashed up with the BBC data on the server-side using features like the Augmentation service, or on the client-side using SPARQL and JSON. Lots of potential there.
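As a rough illustration of the client-side mashup Leigh describes, the Python sketch below joins programme rows with community annotations on a shared programme identifier and renders a small embeddable HTML list. Everything here is hypothetical: the field names, the URIs and the annotation text are invented, and in practice both inputs would come from SPARQL queries returning JSON rather than from hard-coded lists.

```python
from html import escape


def merge_annotations(programmes, annotations):
    """Attach community notes to programme rows that share the same URI."""
    notes_by_uri = {}
    for note in annotations:
        notes_by_uri.setdefault(note["programme"], []).append(note["text"])
    return [
        dict(row, notes=notes_by_uri.get(row["programme"], []))
        for row in programmes
    ]


def render_widget(items):
    """Render merged rows as a small HTML list, escaping all values."""
    lines = ['<ul class="programme-widget">']
    for item in items:
        label = escape(item["title"])
        if item["notes"]:
            label += " (" + escape("; ".join(item["notes"])) + ")"
        href = escape(item["programme"], quote=True)
        lines.append('  <li><a href="%s">%s</a></li>' % (href, label))
    lines.append("</ul>")
    return "\n".join(lines)


# Made-up rows standing in for the two query results.
PROGRAMMES = [
    {"programme": "http://example.org/programmes/p1", "title": "Example & Friends"},
    {"programme": "http://example.org/programmes/p2", "title": "Second Show"},
]
ANNOTATIONS = [
    {"programme": "http://example.org/programmes/p1", "text": "mood: upbeat"},
]

if __name__ == "__main__":
    print(render_widget(merge_annotations(PROGRAMMES, ANNOTATIONS)))
```

The join only works because both data sets use the same identifier for a programme, which is the point about fan communities adopting BBC identifiers.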

Web two dot oh plus one, in the cloud, with bells on…

The tech world is telling a story about the Web and computing, and the mainstream media seem to be catching on. They’re hearing about clouds, wikis, and the history of the World-wide Web. The whole thing reads like some sort of legend…

It was an era, long ago, when the folk of Middle Class plugged in their Mo-Dems and listened to arcane, magical sounds as their £120 beige box enabled a blazing 14.4 kb/s connection, and they only had to wait a few minutes to call forth script and from anywhere on earth. It was an age that saw the beginnings of email, where people composed messages and sent them down the phone lines at lightning speeds (unless a packet dropped…). This was the time of Web 1.0.

Then, the web collapsed. No one used the internet any more. Modems became paperweights and millions of metres of ethernet cable were grubbed up to make room for under-floor heat in offices. The world was quiet, and the people of Middle Class forgot what they knew.

Until there dawned the advent of Web 2.0. People re-learned their former ways, and improved upon the innovations of their fathers. Instead of sites and pages, they began to use “Web Apps” which accomplished Tasks, and they became their masters. The great titan Google was made, and he knew all and directed the world toward knowledge. The elves of the web taught men the ways of blogging and messaging and eventually (when they’d mastered all these things with wiki-training to boot) Social Media and Networking.

Only, that’s not exactly how it happened, is it? Many commentators and Alpha Geeks have divided the story of the web into convenient phases, and they’ve roughly settled around a versioning metaphor common to software. Have a look at your favorite browser, and you’ll see a version number (Safari 4 for me, if you’re interested) which lets you know how many iterations have been and gone before. There are certainly notable differences, and turning points, where people phased out their dependence on one thing for the convenience and utility of something better. Tim O’Reilly, who coined the phrase Web 2.0, wrote a much-linked post in 2005 trying to explain and crystallise some of the trends he was seeing which were different from the first few years of the web. The fact that he had to clarify what he meant, and that it took the non-geek world three years to catch up, testifies to the notion that the change was gradual. It makes me think that we missed out all the .1 to .10 releases in the version numbers, and many alpha and beta tests along the way.

Now we are engaged in the great Web 3.0, where we are applying the logic of the past to the present and guessing at the future. Only, because no one is actually releasing versions of the web like a good, reliable software company should, the story is much more complicated—and interesting!

There are notable trends, with backers and bloggers riding various waves. But it seems to me that the point of this is a convergence. The mobile web is bringing new sorts of information to people, and they can make use of this info wherever they happen to be because of advances in devices and connectivity. As phones and web-enabled devices get better, so too do the chips we seem to have embedded all over the place, and we can now begin to have a clearer picture of what we do through the information we gather from our heaters, cars, and pedometers. Also, as more objects become connected, the grunt-work of number-crunching and storage is becoming commoditized into big, efficient, utility-like cloud services, which host and work with our collected information much more effectively than the gadget in your hand could ever hope to do. Others, like ourselves, talk about the Semantic Web, which allows for an evolution from a bunch of connected documents to the explicit connections between bits of information.

But, I see a trend there which is common to all candidates: information. The web allowed for information to be shared, then collaboratively worked. Now, I see this information becoming useful in and of itself…as data.

Walt Mossberg talks about Web 3.0 as if it is riding on the backs of mobile and connected devices. And I think it probably is. Tim Berners-Lee recently spoke to the BBC about the future of the web, including an incredible vision of pixels everywhere, where any surface could display information. He’s also repeatedly talked about the future of the web being semantic (he invented the term, let’s not forget) where Linked Data is the web done right. And who am I to argue with the inventor of the Web?

But I don’t think there’s so much a conflict or competition as a coming together here. If there will be a Web 3.0 (and it seems a likely, media-friendly label), I think it will include all of these trends centred around the focus of data. The connected devices allow us access to cloud-computing and storage (computing and storage of data…). Many chips gather data about ourselves, which we can use to personalise our view on the web of data, and the Linking of this data through semantics lets it all be calculable, programmable, and useful. It kind of reminds me of a computer, you know… The chips and our collective use of web applications are input and sources, and the various devices we use are displays and UI’s onto a massive, scalable CPU in the cloud. Linked Data could be the Operating System, allowing and enabling anything to be connected and programmed.

Web 3.0, to me, is a convergence of the trends, and it’s all about data. It’s not a simple story, and any convenient label is too convenient to be comprehensive, but I’m pretty sure the next things will all centre on our ability to make use of and personalise vast chunks of previously obtuse data.

Image “#Black rain : Convergence” by FredArmitage via flickr—Creative Commons License.

Andy Denmark talks about TripIt and the rise of structured data

In my latest podcast I talk with Andy Denmark, co-founder and VP for Development at TripIt.

We discuss the company’s approach to enriching travel and itinerary information for their users, and consider the implications of a growing interest in structured data across the Web.

 
 Standard Podcast [40:00m]: Play Now | Play in Popup | Download (128)

This conversation was recorded on Thursday 4 June, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

Down Tools…

Update: all maintenance has been successfully accomplished, and the blogs should all be up and available again. Thanks to the Live Services team for some handy/fast work!

The Nodalities blog will be unavailable from around 8pm (GMT) this evening for some scheduled maintenance. This will mean the posts, pages and RSS/Atom feeds will all be inaccessible till around 8am tomorrow morning.

We hope this won’t be too inconvenient for anyone, and that you enjoy the break ;)

Image: “Rex - Gone Fishing” by snuzzy via flickr, Creative Commons “By 2.0”

Stephanie Lemieux talks about folksonomy and taxonomy in the Enterprise

In my latest podcast I talk with Stephanie Lemieux, a Senior Consultant at Earley & Associates.

We discuss the role of taxonomy and folksonomy in the Enterprise, and consider some of Stephanie’s ideas with respect to the value of a hybrid approach enabled by semantic technologies.

 
 Standard Podcast [31:20m]: Play Now | Play in Popup | Download (331)

This conversation was recorded on Friday 29 May, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

Erik Nemeth talks about the place of disciplinary research databases in a Web 2.0 world

In my latest podcast I talk with Erik Nemeth, a Senior Data Specialist at the Getty Research Institute.

We discuss Erik’s ideas on the ways in which discipline-specific databases need to evolve to remain competitive in the discovery of scholarly literature.

 
 Standard Podcast [31:39m]: Play Now | Play in Popup | Download (222)

This conversation was recorded on Tuesday 26 May, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

Semantic Web and Enterprise: PricewaterhouseCoopers’ call to a Linked Data future

It must be a sign of the times when the most informative Semantic Web overview I’ve read in a long time has not come from a semweb company, nor from a Linked Data initiative or an academic or technologist’s personal blog. Rather, PricewaterhouseCoopers, a massive international professional services firm, has set a new standard in Semantic Web publications by devoting their 2009 Technology Forecast exclusively to it. They must think there is some future in the Linked Data web.

Calling on firms and governments to open up data has been a thankless but far from fruitless task. Talis has funded work on the Public Domain Dedication and Licence, and many in science and academia make eloquent cases for open access to public data. PwC’s Tech Forecast not only predicts and calls for more linked and open data, but makes one hell of a business case for the future of the Semantic Web. The technology overview, instantly recognizable to anyone familiar with the Semantic Web, carries a deeper dimension to firms and the enterprise world itself.

Relational databases do not scale:

Relational data models never were intended for integration at the scale enterprises now need. Relational data management soaks up IT resources that should be dedicated elsewhere. Plus, traditional databases create silos because relational data are specific to the database system implementation.

Linking up your data with the rest of the world frees it to be used:

Their future business agility will depend on their ability to focus on techniques that optimize sharing rather than maintaining silos. That’s why a standards-based approach makes sense. In a digital ecosystem, the assets of others can benefit you directly, and vice versa. It’s about supply and demand.

Riding the wave linked data generates means you can’t control everything—but you knew that already:

Enterprises need control over some data, but not all data. Many enterprises have learned that data warehousing doesn’t scale to encompass all corporate data. … Limit the data warehouse to data management problems that align with its attention to detail, its connection to transaction systems, and for problems that need such heavy investments.

Following this overview (which also managed to quickly and comprehensively cover ontologies) are some telling interviews with some enterprises who have made the leap to RDF already. Tom Scott discussed the BBC’s story, and answered specific questions about linked data at bbc.co.uk/programmes (also covered by Tom in Nodalities Magazine).

I was pleased to note that Talis got a mention in a sample list of vendors, and the authors of the Forecast also made use of several Talis-produced resources, including Sir Tim Berners-Lee’s interview with Paul Miller.

It’s a big step, this. This is a professional services firm “getting” the Semantic Web. This is PricewaterhouseCoopers predicting the rise and use of Linked Data in 2009. This is a call to enterprises to get their data in order.

Or, really, just to open up their data and let the whole community worry about the order.

Image: “Order vs. Chaos”, Ivan Makarov (http://www.flickr.com/photos/ivanomak/446763022/) via flickr