Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

This Week’s Semantic Web

Special Edition : SIOC Update

I had a man cold when I should have been doing my duty, but with no apologies (fairly safely assuming John has a CC-with-attribution kind of policy) here’s a good proxy :

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271

#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.

Semantic Yellow Pages…

I was sent a link late last week to a white paper over on ZDNet by some Finnish researchers looking into the possibilities of opening up the services offered by Yellow Pages to the Semantic Web. Basically, the Finns looked at the traditional Yellow Pages service (specifically in the Helsinki area, but also more broadly), and found it lacking in several key ways.

The basic idea behind Yellow Pages is to provide a directory service for the general public, and monetise the advertisements of businesses wanting more prominence in the directory. The way current services work (online, that is) leaves much to be desired when it comes to matching the requirements of both users and businesses. Businesses miss out whenever a potential customer fails to notice their ad (because they found a different listing on a different page), and the user loses whenever their search turns up wrong or incomplete results. The reason behind this mismatch is essentially technical: search and directory tables don’t provide the flexibility required to match users and businesses whenever linguistic problems (homonyms, synonyms, hyponyms…) occur, or when a service simply fails to produce the best result for the request.
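To make the synonym and hyponym problem a little more concrete, here is a minimal sketch in Python of how a directory search might widen a query before matching. It is not taken from the paper: the vocabulary, the listings and the function names are all made up for illustration, and a real service would presumably draw on a proper ontology rather than two small dictionaries.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
SYNONYMS = {
    "camera shop": {"camera store", "photography shop"},
}
# Narrower categories (hyponyms) that a broader term should also cover.
NARROWER = {
    "camera shop": {"camera repair"},
}

# Hypothetical listings, each filed under exactly one category.
LISTINGS = [
    {"name": "Snap Repairs", "category": "camera repair"},
    {"name": "Lens & Co", "category": "camera store"},
    {"name": "Bloom Florist", "category": "florist"},
]


def expand(term):
    """Return the term together with its synonyms and narrower categories."""
    term = term.lower().strip()
    return {term} | SYNONYMS.get(term, set()) | NARROWER.get(term, set())


def search(term):
    """List the businesses whose category falls inside the expanded term set."""
    wanted = expand(term)
    return [entry["name"] for entry in LISTINGS if entry["category"] in wanted]


print(search("camera shop"))  # ['Snap Repairs', 'Lens & Co']
```

A plain string match on “camera shop” would return nothing here, because no listing uses that exact category; the expansion step is what lets the repair service and the differently worded shop both turn up.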

I suggest reading the paper for the more technological perspectives (you will need to join ZDNet.co.uk, though I believe it’s free of charge), but the possibilities unearthed could potentially benefit both users of Yellow Pages services and the Semantic Web community itself.

A person looking for a business or seeking an answer to a problem in his area could benefit from the fact that his Yellow Pages results will become much more focused. If he looks for a “camera shop” because his camera is broken, he’s much more likely to find a shop running a repair service, for example. Businesses behind the service would also benefit from their advertisements being better placed to attract people actually looking for them.

However, a significant benefit of this system is that the data made available through such a service could be reused by the Semantic Web itself. Imagine developing an application and being able to tie directly into a massive, international business directory. I’m sure your imagination (if you happen to be an imaginative developer) could be enlivened by this idea.

I’d be very interested to hear what you have to say about this kind of service, and what you’d build using this kind of data. Let me know where you’d take it, and what you’d like to see from such a directory.

Semantic Proxy

The folks over at Thomson Reuters announced their new Semantic Proxy service today. With full coverage by Paul Miller over on ZDNet, this could be very exciting news for the Semantic Web space indeed.

In short, Semantic Proxy creates metadata from any online resource. There’s a working demo over on their site, which will search your site and bring back instances of metadata, including relevance. Although it’s in beta, it still offers some incredible potential benefits. The Calais team has promised that their Semantic Proxy service will be bringing proper Linked Data URIs by the end of this quarter. As Paul said: “The cloud just got an awful lot bigger, an awful lot more current, and an awful lot more powerful.”

Zemanta gets personal

I just found out that Zemanta has been updated. I had a bit of a mini-review and a discussion with Jure from Zemanta on my blog in June, and they’ve been ironing out some of the difficult bits since. As a very brief intro, Zemanta is a plugin which suggests related items while you blog. It suggests images and related articles to include in the new post, and handles citations for the images you might include.

The updates are about personalisation. It has added suggestions from your own Flickr feed, and from “my friends” (from Facebook, Twitter and MyBlogLog) so the recommendations you get are now more relevant to you. I’ve installed the update, and will be having a play with the new service at some point, but it doesn’t work on the Nodalities Blog, so I can’t show you any of its features at the moment.

It’s certainly interesting to see these applications being brought to a usable level, and it’s also cool to see the personal focus being introduced. Andraz Tori, Zemanta’s CTO, wrote about semantic applications in Nodalities Magazine. There are things the Semantic Web community is working towards, but there are also applicable technologies being developed right now.

The Web’s Rich Tapestry

We’ve all read books that linger in our memories. And there are any number of reasons why they might do so: a stirring tale or thought-provoking argument, for example. One book that has stayed with me over the years is House of Leaves by Mark Z. Danielewski. It’s been described as “the Blair Witch” of haunted house tales, being the story of a house, the people who live there, and those who attempt to document the strange events and structure of the building. The book is quite a challenging read as it is made up of overlapping narratives, documentary evidence from the investigators, etc. As a reader you’re assembling a narrative out of the interlocking pieces of text that the author presents you with.

But, while the tale is one of those slow-burning horror stories that does linger at the back of the mind, that’s not the primary reason why the book has stayed with me. It was the actual structure of the text that was so intriguing: the author has played with the printed form, including the basic layout of the print on the page, in an attempt to further promote the mythology of the story and to help convey the labyrinthine nature of the house. For example, a typical page might contain several different blocks of text, and much of the story is told through footnotes, footnotes to footnotes, and footnotes to those footnotes. Certain words are coloured differently throughout the text. There are even blocks of text embedded in the page which you have to read downwards through several pages before returning to your starting point. As a reader you’re physically exploring the text much like the characters are exploring the house.

The book is basically a hypertext novel and while certainly not the first to play with the printed form in this way, it was the first that I’d personally encountered. As a hypertext the book appeals to the technologist in me: I’ve given a number of talks over the past few years and in many of these I’ve explored the evolution of hypertext systems. But I’ve also attempted to challenge people’s pre-conceptions about the medium of the web, just as the House of Leaves challenged my conceptions about the printed medium.

My most recent talk was last week at the ALPSP International Conference 2008 in Old Windsor. The talk, titled “The Web’s Rich Tapestry”, discussed the link as the basic medium of the web and reviewed how the blurring of boundaries between websites, services and data (aka “Web 2.0”) is enabled by increasingly rich linking between resources. This is part of a move from old broadcast models of information publishing to a more web-like network of interconnected peers, each contributing to a dense information medium. The ultimate endpoint of this is inherent in the vision of the Semantic Web, and will complete the change from a document-centric to a data-centric world. The Semantic Web, which is just a layer on top of the existing web, is still based on linking, albeit linking of a more fine-grained and meaningful nature.

The Semantic Web, just like the existing Web, will arrive through the actions of individuals, organizations and businesses, each contributing to the whole by sharing linked data sets; this process is already happening. And, like the Web, the more data is available, the more value there will be for everyone involved. I urged society publishers to begin more openly sharing their metadata and exploring the potential inherent in the Web of Data. I also attempted to do more than just evangelize the potential benefits of the Semantic Web and also tried to provide a few pointers towards where those benefits might be realized.

One obvious benefit relates to the generation of more traffic to content and services. For many publishers a sizeable share, if not the majority, of their website traffic is driven by Google referrals. This is an inherently fragile situation, but one that I believe is ultimately temporary. The scale of this traffic generation is obviously due in major part to the popularity of the Google search engine, but it is enabled by their ability to quickly and efficiently crawl websites in order to index content. This provides a large “surface area” to which Google can generate links. By publishing open data, information providers will be able to grow this surface area by at least an order of magnitude due to the more fine-grained data publishing that the Semantic Web entails. All of this data can potentially generate new, highly relevant traffic to content and services.

The other area in which the Semantic Web will pay off is in enabling much more sophisticated research and analysis tools, not just for academic researchers and students, but also for all of us in our everyday consumption of information. In my view there is too much of a focus on search and not enough on information visualisation and analysis tools. I pointed towards some very recent experiments which I think illustrate some of this potential, including Ubiquity and Freebase Parallax. Talis’s own Project Xiphos is also exploring the innovation that can follow from re-purposing publishing metadata, a topic that was particularly relevant to the ALPSP audience. In my new role as Programme Manager for the Talis Platform, I’m excited to begin exploring how we can start helping businesses to begin drawing value from the rapidly growing Web of Data.

WWWF

This week saw the launch of the World Wide Web Foundation with a speech by Sir Tim Berners-Lee introducing what it will be doing, and what it’s for. The points put forward in the speech seemed centred around three bullets from Berners-Lee’s talk:

  • to advance One Web that is free and open,
  • to expand the Web’s capability and robustness,
  • and to extend the Web’s benefits to all people on the planet.

The Foundation’s strapline is “Humanity Connected”, and it seems to have grown out of concerns about the way the Web is now.
My first reaction to this was the moderately paradoxical: “Cool! Why?” And a long look at the Foundation’s FAQs has brought out more questions.

The Cool part comes (apart from the support from TBL) from the idea that an organisation bringing together top minds from web, business, public sector and educational spaces to focus on the connectivity of mankind could produce some of the most profound, interesting, and useful research ever undertaken. One of the FAQs sums this up:

Why did this Foundation emerge from W3C and WSRI and why now?
Those organizations have contributed significantly to the vision of the Web as humanity connected, but still more is required to include more people, in particular in underserved communities. The Foundation seeks to extend the benefits of a Web, improved by further research and technology development to all people.

The “Why?” comes from the mission it will involve. Is this a venture in trying to control the web itself (is that ominous?)? While they’re looking into security and other areas of personal and corporate trouble with the Web, are they also going to be looking at ways to eliminate this themselves, or is this more of a “watchdog”?

As you can see, there are many, many questions raised by this new foundation launch, but as with anything involving the W3C I’m sure there will also be a lot of information coming our way soon!

Semantic Web and Startups

Following both DEMO and TC50 the past week, there has certainly been no shortage of startup energy about the blogosphere. TC50’s winner, Yammer, has made an appearance in just about every techblog, some with mixed reviews. Essentially a Twitter set up for the enterprise (it’s a locked-down version of micro-blogging) it feels, instinctively, like a step away from open, linked data and a vibrant web-space. I don’t know that much about it, however, so I’d be happy for someone to explain its benefits to the enterprise and the web itself.

Much more exciting to me was the launch of fotonauts at TC50 (no, I wasn’t there, but there is a brilliant summary here). The premise is to manage photographs in a much more linked way. The service can take information from Google Maps, Wikipedia and other sources and mash them up with your photos, or, apparently, photos from other usable sources like Flickr. These photos are then published with all the extracted metadata to a website without you having to do any of the scraping yourself. I’ve signed up for a beta, but it’s not open yet, so watch this space! (And, fotonauts, if you’re watching: “fill this space!”)

At Demofall (again online, rather than physically at the conference), I noticed the appearance of Data Essence. Data Essence, from what I’ve read, is looking to accurately match investors with potential opportunities through semantic web analysis of websites. Although they seem to be aiming in a Semantic Web direction, they’re not hanging about waiting for a revolution. One quote from Amnon Mishor, Data Essence CEO and co-founder, that I thought particularly interesting was:

“Clever semantic tagging cannot stand for itself,” Mishor said. “There is a need for sophisticated algorithms that utilize the semantic data to intelligently hand pick information tailored to match the users’ interests and profile. Without this ability, any natural language solution or semantic database will become a not so useful utility, and will not really solve the problem of information overflow.”

Hmmm… That sounds like a conversation in itself. What algorithms are necessary, and what applications do you see being developed, or needing to be developed, NEXT? What are the blockers? I’d be particularly interested to hear what other startups and news events you’ve seen lately, and what you think these startups mean for Semantic Web technologists (both established and emerging).

Right, I’m running out of HotSpot credit, and I’m loath to pay a well-known mobile broadband supplier any more, so I’ll end there. Over-arching message of this post? Lots of startups, some of them catching media attention (and I don’t know why?), others seem to be sleeping potential giant-killers. Lots of “calls to arms”… what do you see?

Sean Martin talks with Talis about Cambridge Semantics

In our latest podcast I talk to Sean Martin, President and CTO of Cambridge (MA)-based semantic technology startup Cambridge Semantics.

We discuss Sean’s background with IBM, before turning to consider the work he’s currently involved in: building a sustainable business.

 

During the conversation, we refer to the following resources;

This conversation was conducted using Skype on Friday 5 September, recorded with Ecamm Network’s Call Recorder for Skype, and edited on a Mac with Garageband.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

This Week’s Semantic Web

Selected links related to Semantic Web technologies for the week ending 2008-09-08, all weeks. Also available in RDF as linked data or via GRDDL.

A day later than planned, and somewhat shorter than usual (blame Chrome and Ubiquity!), but hopefully there’ll be something to catch your eye.

In the Media

Docs

Software News

Events etc.

Miscellany

~

Sources include Planet RDF, various other blogs, Semantic Web Interest Group IRC Chatlogs & Scratchpad, ESW Wiki, SemWebCentral, Sweet Tools, W3C Semantic Web Activity, mailing lists, personal emails etc etc. If you see anything suitable this coming week, please mail me or use the del.icio.us tag “TWSW” - thanks!

Ubiquity

A couple of days ago, Michael Hausenblas suggested I look at something called Ubiquity, and sent me a link. Because it came in the middle of editing the current version of Nodalities Magazine, I did what I often do with interesting concepts: I opened a new Firefox tab and left it there for two days, hoping I would notice it before Firefox crashed with all tabs on board. Well, since then, there has certainly been a lot of discussion about Ubiquity, both around the office and on the web. To introduce it, I could do worse than point at their video and introduction page, and just say that it’s a Mozilla Labs project and a Firefox plug-in.

However, what makes it interesting to me is that it possibly introduces a new metaphor for interacting with web content and, by extension, linked data. The thought process behind it is that whenever we want to “do something” online, we are generally forced into round-about processes. Say, for example, I want to email a friend to tell him about a new restaurant I went to, maybe even invite him to meet me there for lunch. To accomplish this task, I’d typically open up three or four tabs in Firefox, and maybe open iCal and Mail application windows too: I’d google the restaurant, find its phone number and address from yell.com; map its address using maps.google.com or similar; I’d check which date I’m free; and finally email him the info, copying and pasting links and map images between multiple tabs, and (if I’m not using Gmail’s web interface) into other applications as well. If you followed that last sentence, you’re doing well: it’s long, and complex (technically complex-compound, but we won’t get pedantic here), and it reflects the process.

Although it’s a beta, and many of its functions are very much less than polished, it offers a glimpse of a possible interaction future, with drastically simpler processes to complete tasks. What it creates is the ability to interact with content more directly, so you can select some content and start telling the application to DO stuff TO the content, by typing. So, I can select a physical address and type “map this” into Ubiquity, and it’ll pull up Google Maps for that address (at the moment, it’s having trouble with some UK addresses because it’s using google.com and therefore not contextualising through the .co.uk, which works better for addresses here). I can then use that information on the same screen. I can “yell florists in birmingham” and have a list of flower vendors in Birmingham from Yell.com (a yellow pages service), which I can then drop into an email or whatever.

Very quickly, I ran into a conceptual problem with Ubiquity’s idea of natural-language interaction, however. Their strapline is: “An experiment into connecting the Web with language.” The idea being that you can “tell” the computer to “do something with/to this information” or “command” for something to happen, changing the basic interaction metaphor from a visual click/drag/drop/open-window process to a linguistic “I’m telling the computer what I want and it happens” framework. My immediate reaction was: “This isn’t linguistic, it’s command-line”, and was instantly transported to trying to learn Linux without a technical background, with all the frustration of a non-technical user trying to interact with software using a command-line.

You see, from my perspective as a linguist, I often feel frustrated with the computing community’s view of what language actually is. Without exploring propositionality, conceptual metaphor framework or anything else, it’s sufficient to say that language is both simpler and more complex than anything we’ve got software to emulate yet. What Ubiquity actually is, is a very simplified command-line which is “aware” of the information you’re already interacting with. From that perspective, it seems to work very well, with a more streamlined set of commands and more “natural language” feel to the words you actually type.

The upshot of this is that users have to learn a set of commands to interact with their applications, but that these commands are intended to be transparent in meaning. So, you “map this” or “help” or “add 1PM lunch with Dave”. After reading some of the reasoning behind this from one of the designers, Aza Raskin, I started to appreciate it more and more. The current contrasting model to this “Linguistic Command Line” is menus and windows. Menus and windows are inefficient, if you think about it. You have to select text, or images or whatever, and physically move your cursor to a menu somewhere at the extreme side of a window on your screen, finding and selecting the command from a drop-down list, which means remembering the path to each command. The problem accelerates when you incorporate windows and applications into this. So if I were to incorporate some text from one window into another, linking to the original, and maybe dropping in a customised image too, I’d have to open multiple windows, executing menu commands or application-specific keyboard “short-cuts” at each stage.

But, I already know what I want to do with the stuff, right? Why not just activate a single keyboard shortcut and begin typing your instructions to the system: send link to <email>? Ubiquity allows this. In this framework, Firefox becomes a bit of a microcosm of the operating system (with tabs being windows, and sites and web-apps being desktop applications). As you type, it short-lists commands, so you don’t even type the full thing: typing “t r a n” brings up “translate”, so you can skip the rest of the word and begin typing “to eng”, and it will offer you “translate text to English”.
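The short-listing behaviour is easy to picture as plain prefix matching. The Python sketch below is only my guess at the general idea, not Ubiquity’s actual code (which is a Firefox extension); the command list is simply the handful of verbs mentioned in this post, and the function names are invented for the example.

```python
# Hypothetical command vocabulary, taken from the verbs mentioned in this post.
COMMANDS = ["add", "help", "map", "send", "translate", "yell"]


def shortlist(typed, commands=COMMANDS):
    """Return the commands whose names start with what has been typed so far."""
    prefix = typed.strip().lower()
    return [name for name in commands if name.startswith(prefix)]


def complete(line):
    """Expand the first word of a line when it matches exactly one command."""
    verb, _, rest = line.strip().partition(" ")
    matches = shortlist(verb)
    if len(matches) == 1:
        return (matches[0] + " " + rest).strip()
    # Ambiguous or unknown verb: leave the line as the user typed it.
    return line.strip()


print(shortlist("tran"))
print(complete("tran to eng"))
```

With this vocabulary, “tran” narrows to a single candidate, so the rest of the line can be typed straight away. An ambiguous or unknown prefix is left alone, which is roughly where a real interface would show the short-list and let the user pick.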

Now, imagine having this ability with any form of Linked Data. Imagine if that bit of text were automatically recognised as a date, or co-ordinates, or a person. Imagine selecting a picture of a restaurant and typing: “invite fred for lunch at 3PM on monday, enter”. The system could automatically know that the picture was of a restaurant (whose profile could include co-ordinates, contact info, and even a hypothetical automatic table-reservation system for invites from the web), that Fred is your colleague (whose FOAF profile includes email or instant messenger preferences), that lunch is an email subject and a social event, and that 3PM on Monday is a date (in your calendar and in Fred’s calendar once the message is sent) which corresponds with your name + su. All of that information is being used in several processes (copy/paste, look up restaurant profile, map location, look up email or IM, create iCal event, create email or IM message, send) but all you’re really doing is “inviting Fred for lunch at 3PM on Monday.”

This is incredibly intriguing, because it begins to show how some systems can begin to scale up to the immensity of the Web. We, as people, know what we want to accomplish, and if we could just tell our computers that, we’d be much happier. I think this could be a first step, and while I’m not completely convinced by the command-line metaphor, I can see this as a definite step, and a different perspective. My new copy of Aza’s father’s book, The Humane Interface, arrived this morning to supplement this, and I’ll be blogging more about that, if it’s ever returned to my desk.
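As a thought experiment, the lunch invitation can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the dictionary stands in for linked data that a real system would fetch (a FOAF-style profile for Fred, a profile for the selected restaurant), the names and addresses are made up, and the pattern only understands this one phrasing of the command.

```python
import re

# Hypothetical stand-in for a FOAF-style profile lookup (not a real data source).
PEOPLE = {"fred": {"name": "Fred", "mbox": "mailto:fred@example.org"}}

# Deliberately narrow pattern: it handles only the one command used in the post.
INVITE = re.compile(
    r"invite (?P<who>\w+) for (?P<what>\w+) "
    r"at (?P<time>\d{1,2}(?:AM|PM)) on (?P<day>\w+)",
    re.IGNORECASE,
)


def plan_invite(command, selection):
    """Combine one typed command with the selected resource into an event."""
    match = INVITE.fullmatch(command.strip())
    if match is None:
        return None
    person = PEOPLE.get(match["who"].lower())
    if person is None:
        return None
    return {
        "summary": f"{match['what'].capitalize()} at {selection['name']}",
        "attendee": person["mbox"],
        "when": f"{match['day'].capitalize()} {match['time'].upper()}",
        "location": selection["address"],
    }


# Hypothetical profile for the restaurant in the selected picture.
restaurant = {"name": "Chez Example", "address": "1 Example Street, Birmingham"}
print(plan_invite("invite fred for lunch at 3PM on monday", restaurant))
```

The point of the sketch is that the typed sentence carries very little; the recipient’s address and the venue’s location come from data that already exists elsewhere, which is exactly the part Linked Data would supply.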
