31 March 2008

This Week's Semantic Web

Selected links related to Semantic Web technologies for the week ending 2008-03-31, all weeks. Also available in RDF as linked data or via GRDDL.

DataPortability features highly this week - though intended more as a principle, and an umbrella for all kinds of technologies, the significant overlap with the Semantic Web vision hasn't gone unnoticed.

Microsoft have been experimenting with a triplestore/rdb hybrid - "...they are coming around, albeit gradually :-)" says Kingsley.

There seems to be a lot happening around the various mailing lists at the moment. Despite the growth of other communication channels, this still seems to be a big one - it certainly remains the best option for permathreads and bickering. For comparison: Wiki collaboration leads to happiness (hmm, Word & Outlook are somewhat wild variables in that diagram).

In the Media

Docs

Software News

Most Productive SemWeb Coder of the Week

[a trial/one-off category, as suggested - feel free to make nominations for next week]

Events etc.

Miscellany

Quote of the Week

This is the year we finish the job, I reckon...

-danbri, via email

~

Sources include Planet RDF, various other blogs, Semantic Web Interest Group IRC Chatlogs & Scratchpad, ESW Wiki, SemWebCentral, Sweet Tools, W3C Semantic Web Activity, mailing lists, personal emails etc etc. If you see anything suitable this coming week, please mail me or use the del.icio.us tags "semweb weekly" - thanks!

Posted by Danny Ayers at 09:07 PM

26 March 2008

Nitpicking Alex's Semantic Web Patterns

Alex Iskold just published quite a lengthy blog post called Semantic Web Patterns: A Guide to Semantic Technologies. Overall it's good stuff, and Alex has been doing a great job of promoting the Semantic Web over on Read/WriteWeb and elsewhere. He's also one of the Semantic Gang featuring in the latest podcast series from oor Paul. (I've not listened to that yet - I'll try it with a dogwalk shortly).

Because of all this I feel a little disloyal in being critical, but without clarification some of the points in Alex's post could lead to misconceptions, the bane of Semantic Web outreach. One thing I can't disagree with Alex about is the way the Semantic Web means different things to different people (cue elephant analogy). So with that proviso and all due respect etc, here we go:

1. Bottom-Up and Top-Down
Alex says:


"The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically."

Ok, while one could (and I will) quibble with the content of these definitions, they do make a pretty clear distinction. The only thing is, the phrases "bottom-up"/"top-down" have already been used fairly extensively in the Semantic Web context to describe at least two different (but related) distinctions.

The first of these is with regard to decision-making, in the same sense as within the management hierarchy of an organization. The naive stereotype for this distinction would give, say, top-down = "those in power in standards orgs call the shots" versus bottom-up = "grassroots developers determine the direction". Given that specifications can appear as authoritative rules, it's easy to see how this perception might emerge. (This is a naive distinction, because it fails to consider the community input that goes into defining specifications, and the community's role in determining which ones survive the natural selection of deployment in the wild).

The second usage of "bottom-up"/"top-down" is more technical, in regard to how you arrive at your world/domain model. Top-down would mean starting your model at a generalized level and working towards more specific levels; bottom-up, the reverse. Clearly if there's to be global interoperability, taking the top-down approach would imply there's one true model that everyone follows. In the past this has led to some awful misconceptions around RDF, where people have assumed that the models (i.e. vocabularies, RDF Schemas, ontologies) are created on high - probably by the W3C. Quite the opposite is true. While RDF is a framework (and hence might be viewed as a top-level language), it's essentially neutral on who creates domain models, and where and how. Because things, classes of things, relationships between things and so on are identified using URIs, anyone can create their own vocabularies. This retains a base level of global interop, and enables web-scale independent development. (I once saw a list email containing a line like "the namespace begins with http://purl.org, so it must be something to do with RSS 1.0 people at the W3C" - no, no, no!).
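To make the point about URIs concrete, here's a minimal Python sketch. The two namespaces and the `expand` helper are hypothetical, invented for illustration: two publishers independently define a term with the local name "title", and because each term is a full URI under its owner's namespace, the two never collide and nobody needed permission from a central authority.

```python
# Two hypothetical vocabularies, minted independently by different publishers.
# Each publisher controls its own namespace URI; no central registry is involved.
PREFIXES = {
    "books": "http://example.org/books/vocab#",  # hypothetical namespace
    "films": "http://example.net/film-terms#",   # hypothetical namespace
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'books:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


book_title = expand("books:title")
film_title = expand("films:title")

# Same local name, different URIs: both terms can sit in one graph without clashing.
print(book_title)
print(film_title)
print(book_title == film_title)
```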

So basically while Alex's "bottom-up"/"top-down" may be internally consistent, it's a little idiosyncratic.

2. Annotation Technologies: RDF, Microformats, and Meta Headers
There's quite a bit I could quibble with in this section, but I'll stick to the one point I think is most significant. It can be very misleading to think of RDF merely as an annotation and/or metadata tool. While it can be, and very often is, used for annotation (typically descriptions of documents) and metadata (descriptions of data) purposes, it is also used to talk about things directly. Alex provides an example: "Alex IS the father of Alice, Lilly, and Sofia". This is plain old data. The same data could be expressed in a database table called "fatherOf", with "Alex" appearing three times in the left-hand column and the right-hand column containing "Alice", "Lilly" and "Sofia". RDF is a data technology. One big difference from traditional RDBMSs is that relations (tables, properties, "fatherOf") can only have two values - the subject and object of the relation (2 columns, "fathers"/"children"). Another big difference is that both things and the relationships between things are generally identified using URIs, which enables the Web part of the Semantic Web.
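A minimal sketch of that "fatherOf" example, assuming made-up `example.org` URIs for the people and the property: each row of the two-column table becomes one (subject, predicate, object) triple, with URIs standing in for the bare strings.

```python
# The "fatherOf" relation as a two-column table: (father, child) rows.
father_of_table = [
    ("Alex", "Alice"),
    ("Alex", "Lilly"),
    ("Alex", "Sofia"),
]

# Hypothetical URIs; in practice each publisher mints its own.
PEOPLE = "http://example.org/people/"
FATHER_OF = "http://example.org/family#fatherOf"


def rows_to_triples(rows, predicate):
    """Turn each (subject, object) row into a (subject, predicate, object) triple,
    identifying the people with URIs rather than bare strings."""
    return {(PEOPLE + s, predicate, PEOPLE + o) for s, o in rows}


triples = rows_to_triples(father_of_table, FATHER_OF)

# Query: who is Alex the father of? (local names recovered from the URIs)
children = sorted(
    o.rsplit("/", 1)[1]
    for s, p, o in triples
    if s == PEOPLE + "Alex" and p == FATHER_OF
)
print(children)  # ['Alice', 'Lilly', 'Sofia']
```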

3. Consumer and Enterprise
I think it's good that Alex highlights consumer/enterprise and vertical/horizontal aspects of the Semantic Web, they are worthy of discussion. But regarding the "killer app" of the Semantic Web - one might equally well ask "what is the killer app of the Web?" (this is Tim Berners-Lee's own response in the 2001 Sci Am article).

There's another source of misconceptions in this section: "RDF offers a way to communicate using XML-based language...". While strictly speaking that's probably correct, it gives the impression that RDF is XML-based, which it isn't. RDF is a data model, an abstract language. Formats and serializations (of which there are several, both XML and non-XML) are secondary. Given the recent work around GRDDL, it'd be more accurate to say "XML offers a way to communicate using RDF-based language...".
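To illustrate the data model/serialization split, here's a rough Python sketch (the URIs are hypothetical): one abstract graph, held as a set of tuples, written out both as N-Triples (plain text, no XML at all) and as RDF/XML. For brevity it only handles URI objects and predicates whose namespace ends in "#"; it's a toy, not a conforming serializer.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)

# One abstract graph: a set of (subject, predicate, object) URI triples (hypothetical).
graph = {
    ("http://example.org/people/Alex",
     "http://example.org/family#fatherOf",
     "http://example.org/people/Alice"),
}


def to_ntriples(triples):
    """Non-XML serialization: one '<s> <p> <o> .' line per triple."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def to_rdfxml(triples):
    """XML serialization of the same graph. Assumes URI objects and
    predicates of the form '<namespace>#<localname>'."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for s, p, o in sorted(triples):
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": s})
        ns, local = p.rsplit("#", 1)
        ET.SubElement(desc, f"{{{ns}#}}{local}", {f"{{{RDF_NS}}}resource": o})
    return ET.tostring(root, encoding="unicode")


print(to_ntriples(graph))
print(to_rdfxml(graph))
```

Either output can be parsed back into the same set of triples; which syntax you pick is a secondary, practical choice.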

This confusion around XML messes up Alex's arguments on scalability somewhat - I'm sure someone somewhere is using an XML DB for RDF, but most stores I've seen are either built on top of RDBMSs or are RDF-native. (Non-generic, domain-specific data can be stored pretty much any way you like - if semweb interfaces were exposed I suppose you could call it an RDF store of sorts...). Also, while RDF storage technology isn't anywhere near as mature as that of RDBMSs, it draws on essentially the same foundations - and sometimes the same people - so the picture isn't as bad as one might imagine. Genuinely large RDF stores are starting to appear, and even then it's worth remembering (as Alex points out) that the aim is for the big database to be the Web itself. (My own standard line on this is that triplestores are just local caches of chunks of the Semantic Web).
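As a rough illustration of the "built on top of RDBMSs" approach, here's a toy triplestore using Python's built-in sqlite3: a single three-column table plus a pattern-matching helper in which `None` acts as a wildcard. The table layout and the `add`/`match` helpers are assumptions for this sketch; real stores use far more elaborate schemas, indexing and query planning.

```python
import sqlite3

# A toy triplestore layered on a relational database: one table, three columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT, PRIMARY KEY (s, p, o))")
conn.execute("CREATE INDEX idx_po ON triples (p, o)")  # helps lookups by predicate/object


def add(s, p, o):
    """Insert a triple; duplicates are silently ignored (a graph is a set)."""
    conn.execute("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for col, val in (("s", s), ("p", p), ("o", o)):
        if val is not None:
            clauses.append(f"{col} = ?")
            params.append(val)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = f"SELECT s, p, o FROM triples{where} ORDER BY s, p, o"
    return conn.execute(sql, params).fetchall()


# Caching a small chunk of (hypothetical) Web data locally.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
add("http://example.org/id/alice", FOAF_KNOWS, "http://example.org/id/bob")
add("http://example.org/id/alice", FOAF_KNOWS, "http://example.org/id/carol")
add("http://example.org/id/alice", FOAF_KNOWS, "http://example.org/id/bob")  # duplicate

print(match(s="http://example.org/id/alice", p=FOAF_KNOWS))
```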

4. Semantic APIs
As Paul Downey put it, Web APIs Are Just Web Sites - the same goes for the Semantic Web. Alex talks about some of the online APIs for extracting RDF from natural language. While these are nifty, potentially any Web site or service could with appropriate tweaking be a Semantic API. The original RSS was a Semantic API - descriptions of news-like items delivered using RDF over HTTP. While the latest syndication format, Atom, might not be RDF, it's good Web-friendly data that can be mapped to RDF (work is in progress on conventions for that).
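A minimal sketch of such a mapping, assuming (purely for illustration) that Atom elements are mapped onto Dublin Core properties and that each entry's `atom:id` serves as its subject URI. The sample feed is invented, and this isn't the convention being worked on, just one plausible shape for it.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed mapping target: Dublin Core elements. Other conventions are possible.
DC = "http://purl.org/dc/elements/1.1/"

# A hypothetical one-entry feed.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <id>http://example.org/posts/1</id>
    <title>Hello, Semantic Web</title>
    <updated>2008-03-26T12:00:00Z</updated>
    <author><name>Example Author</name></author>
  </entry>
</feed>"""


def atom_to_triples(xml_text):
    """Map each Atom entry to triples, using the entry's atom:id as its subject.
    Objects here are plain literal strings."""
    root = ET.fromstring(xml_text)
    triples = []
    for entry in root.iter(ATOM + "entry"):
        subject = entry.findtext(ATOM + "id")
        triples.append((subject, DC + "title", entry.findtext(ATOM + "title")))
        triples.append((subject, DC + "date", entry.findtext(ATOM + "updated")))
        for name in entry.iterfind(f"{ATOM}author/{ATOM}name"):
            triples.append((subject, DC + "creator", name.text))
    return triples


for triple in atom_to_triples(FEED):
    print(triple)
```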

Semantic Web technologies also have an ace card up their sleeves here, in the form of SPARQL. RDF stores and (with the appropriate wiring) any online RDF can be queried using a straightforward SQL-like language, operating over standard HTTP. A seriously powerful addition to the Web API toolkit.
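For a flavour of "SQL-like, over standard HTTP", here's a Python sketch that builds (but doesn't send) a SPARQL Protocol query as an HTTP GET: the query goes URL-encoded in the `query` parameter, and the Accept header asks for the SPARQL Query Results XML Format. The endpoint URL is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint; any service implementing the SPARQL Protocol would do.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""


def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query via HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)

# The server sees the original query text after URL-decoding:
print(parse_qs(urlparse(req.full_url).query)["query"][0])

# To actually run it against a live endpoint: urllib.request.urlopen(req).read()
```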

Right now the ability to make mashups (client- or server-side) is limited by the effort needed to integrate across different APIs (the n-squared thing). RDF can make integration trivial. Even without RDF/SPARQL being available, a lot of the pain of integration can be alleviated if the data is mapped to RDF then integrated.
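A toy illustration of why shared URIs make integration cheap, using two hypothetical services whose data has already been mapped to RDF: merging is just set union, and a question neither source can answer alone falls out of a simple join. The last two functions spell out the rough arithmetic behind "the n-squared thing" (one adapter per pair of APIs versus one mapping per API).

```python
# Two hypothetical services, each already mapped to RDF triples.
# Both use the same URI for the band, so no reconciliation step is needed.
BAND = "http://example.org/id/band/42"   # hypothetical shared identifier
VENUE = "http://example.org/id/venue/7"  # hypothetical
EX = "http://example.org/vocab#"         # hypothetical vocabulary

music_service = {
    (BAND, EX + "name", "The Example Band"),
    (BAND, EX + "genre", "Post-rock"),
}
gig_service = {
    (BAND, EX + "playsAt", VENUE),
    (VENUE, EX + "city", "Birmingham"),
}

merged = music_service | gig_service  # integration = set union


def objects(graph, s, p):
    """All objects of triples with the given subject and predicate."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


# A question neither source can answer alone: which cities does this band play in?
cities = {city
          for venue in objects(merged, BAND, EX + "playsAt")
          for city in objects(merged, venue, EX + "city")}
print(cities)  # {'Birmingham'}


def pairwise_adapters(n):
    """Point-to-point integration: one adapter per pair of APIs."""
    return n * (n - 1) // 2


def rdf_mappings(n):
    """Map each API to RDF once; merging is then generic."""
    return n


print(pairwise_adapters(10), rdf_mappings(10))  # 45 10
```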

I don't think we'll ever see every single service offering Semantic Web-friendly APIs. But for Web 2.0-style sites, the Web is a competitive environment. Services which do support RDF and/or SPARQL will benefit from the lowered integration barrier, and over time will increasingly tend to have a commercial advantage over services which don't. The ball is rolling and the field is wide open.

5. Search Technologies
"Perhaps the first significant blow to the Semantic Web has been the inability thus far to improve search." - er, well, no. Search, at least as we know and love it today, is an artifact of the document Web. Success for the Semantic Web wouldn't be improving search, but marginalizing it.

The information carried by the document Web, the stuff we're interested in, is generally expressed in human-readable text inside the documents. There's a semantic air gap between the protocols and languages of the current Web (HTTP, HTML...) and the information that's being conveyed. Search engines bridge that gap through the use of heuristics based around string matching on queries and indexed documents. Semantic Web technologies offer a couple of ways of minimizing the gap. Through the increased use of metadata, more explicit matching can be made. Before anyone throws the metacrap arguments at me, consider the improvements already brought by metadata-rich syndication feeds and folksonomy tagging.

The other way of reducing the gap that comes to mind is...not to create gaps in the first place. Take an online train timetable. Right now it'll likely be contained in a database somewhere, exposed through HTML with a form or two. To access the data we are at the mercy of whatever specific front-end the service provider has offered. To make a mashup with it we'd be making site-specific calls, at best through a RESTful API. But if the data was also available without the document Web-oriented intermediation, say as RDF/XML documents, or perhaps better still a SPARQL endpoint, mashups would be trivial.
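Here's a rough sketch of publishing such a timetable directly, assuming a hypothetical URI space, vocabulary and sample rows: every record gets its own HTTP URI, each field becomes a triple, and a consumer can then query the data without going through an HTML form.

```python
# Hypothetical rows, as they might sit in the operator's database.
rows = [
    {"id": "1A23", "from": "Leeds", "to": "York", "departs": "09:15"},
    {"id": "1A25", "from": "Leeds", "to": "York", "departs": "10:15"},
]

BASE = "http://trains.example.org/service/"  # hypothetical URI space
TT = "http://trains.example.org/vocab#"      # hypothetical vocabulary


def publish(rows):
    """Give every record its own HTTP URI and emit one triple per field."""
    triples = []
    for row in rows:
        subject = BASE + row["id"]
        for field in ("from", "to", "departs"):
            triples.append((subject, TT + field, row[field]))
    return triples


def departures(triples, origin):
    """Consumer-side query over the published triples: departure times from `origin`."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p[len(TT):]] = o
    return sorted(r["departs"] for r in by_subject.values() if r["from"] == origin)


print(departures(publish(rows), "Leeds"))  # ['09:15', '10:15']
```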

Incidentally, I remember the train timetable scenario coming up on the microformats list a while back. At the time it seemed nonsensical to me to follow the suggestion made there of having, e.g., one microformatted HTML page for each record in the database. In retrospect I think that was potentially a very good solution - assuming the microformat followed best practices (using a profile etc.), this would be equivalent to publishing all the data as linked RDF. A GRDDL-aware consumer would in fact see it that way. The bonus advantage is having the (inherently in sync) HTML material available too.

Anyhow, back to search. The current Web does contain one notable kind of explicit, machine-readable semantics: the link. This page is related to that page. I don't think it's coincidence that the most successful search heuristic to date - Google's PageRank - is based on this data source.
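For the curious, here's the textbook simplified PageRank (power iteration with a damping factor) in Python. It's the published idea in its simplest form, not Google's production algorithm, and the three-page link graph is invented; the point is that the ranking is computed purely from link structure.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outbound links (dangling nodes) spread their rank evenly over all pages,
    so the ranks always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * rank[page] / len(targets)
        rank = new
    return rank


# Hypothetical link graph: both a and b link to c; c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```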

My standard line on search is "search engines act as indexes of the Web, the Semantic Web is its own index", or more succinctly "the best way to find things is not to lose them in the first place".

6. Contextual Technologies
I don't really disagree with what Alex says in this section, but would add that Semantic Web languages make it much easier to deal with contexts - which can be expressed directly, without the need for interpreting natural language. There are already a few pretty neat faceted browsing tools around, and I reckon these things are going to get a lot neater over the next few years.
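A minimal sketch of how explicit properties make faceted browsing straightforward, using hypothetical photo data and helper names: selecting a facet value narrows the set of resources, and the counts for the remaining facets are simple tallies over the triples.

```python
from collections import Counter

EX = "http://example.org/vocab#"  # hypothetical vocabulary

# Hypothetical resources whose context (place, year) is explicit in the data.
graph = [
    ("http://example.org/photo/1", EX + "place", "Lisbon"),
    ("http://example.org/photo/1", EX + "year", "2007"),
    ("http://example.org/photo/2", EX + "place", "Lisbon"),
    ("http://example.org/photo/2", EX + "year", "2008"),
    ("http://example.org/photo/3", EX + "place", "Bristol"),
    ("http://example.org/photo/3", EX + "year", "2008"),
]


def select(graph, constraints):
    """Subjects satisfying every (property, value) facet chosen so far."""
    subjects = {s for s, _, _ in graph}
    for prop, value in constraints:
        subjects &= {s for s, p, o in graph if p == prop and o == value}
    return subjects


def facet_counts(graph, subjects, prop):
    """How many of the selected subjects have each value of `prop`."""
    return Counter(o for s, p, o in graph if s in subjects and p == prop)


chosen = select(graph, [(EX + "year", "2008")])
print(sorted(chosen))
print(facet_counts(graph, chosen, EX + "place"))
```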

7. Semantic Databases
See above about triplestores in Consumer and Enterprise.

Twine and Freebase are really nice applications, although I believe Freebase's connection to the rest of the (Semantic) Web is still pretty suboptimal. Twine's still in beta, but has already come an awful long way (I put it in my open-in-tabs-regularly bookmarks). What they both demonstrate is that something which looks to the end user like a regular shiny Web 2.0 application can be built at a significant scale using RDF/RDF-like technologies. Where these things have an opportunity to get much more interesting than similar traditional products is in exploiting the Semantic Web angle. I do hope they hook up to the Linking Open Data cloud soon.

Conclusion
The Semantic Web does mean different things to different people, and maybe I'm being overly orthodox in seeing RDF+HTTP as the distinguishing features of these particular Semantic Technologies. But I'm glad I got that off my chest. Now for that dogwalk with Semantic Gang.

Posted by Danny Ayers at 12:13 PM

25 March 2008

Introducing the Semantic Web Gang

Today we're launching a further series of podcasts to add to the range currently available to you. This new Semantic Web Gang will be a regular monthly show, tapping into insights on the news of the moment from some of those at the forefront in bringing the Semantic Web vision to reality.

Gang members for the first show were:

We shall be adding to the Gang in the coming months, as well as introducing the occasional special guest from time to time.

Listen Now

Download MP3 [60 mins, 55MB]

See this post on ZDNet's blog, The Semantic Web, for more discussion.

During the conversation, we refer to the following resources:

This conversation was conducted on 20 March 2008.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 09:40 PM

24 March 2008

This Week's Semantic Web

Selected links related to Semantic Web technologies for the week ending 2008-03-24, all weeks. Also available in RDF as linked data or via GRDDL.

No obvious themes this week, but still lots of activity in diverse areas. So instead of introductory blurb, here's a seasonal picture:

[Image: the duck-rabbit illusion]

"If it walks like a duck and quacks like a duck, I would call it a duck" - but what if it also hops like a bunny?

(source - public domain)

In the Media

Docs

Software News

Events etc.

Miscellany

Quote of the Week

Hi, I’m Web Developer Barbie. Pull my string and I say, “Standards are tough! Let’s go shopping!”

-Mark Pilgrim translates Joel Spolsky

~

Sources include Planet RDF, various other blogs, Semantic Web Interest Group IRC Chatlogs & Scratchpad, ESW Wiki, SemWebCentral, Sweet Tools, W3C Semantic Web Activity, mailing lists, personal emails etc etc. If you see anything suitable this coming week, please mail me or use the del.icio.us tags "semweb weekly" - thanks!

Posted by Danny Ayers at 06:40 PM

20 March 2008

Jim Hendler talks about the Semantic Web and Artificial Intelligence

Jim Hendler
In our latest podcast I talk with Jim Hendler, Tetherless World Senior Constellation Professor at Rensselaer Polytechnic Institute in Troy, New York. We discuss Jim's early experience in Artificial Intelligence (AI) research, before digging into some of his observations on competing interpretations of the Semantic Web and exploring the relevance of Semantic Web ideas to users of today's Web 2.0 applications.

Listen Now

Download MP3 [50 mins, 24MB]

See this post on ZDNet's blog, The Semantic Web, for more discussion.

During the conversation, we refer to the following resources:

This conversation was conducted using Skype on Friday 14 March, recorded with Ecamm Network's Call Recorder for Skype, and edited on a Mac with Garageband.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 09:31 AM

17 March 2008

This Week's Semantic Web

Selected links related to Semantic Web technologies for the week ending 2008-03-17, all weeks. Also available in RDF as linked data or via GRDDL.

Big news this week regarding Yahoo! and the Semantic Web - an update about the Yahoo! Search open platform describes significant support for Semantic Web technologies: RDF (with several key vocabularies), microformats, RDFa and eRDF. While it's been no secret that Yahoo! has been quietly developing with RDF for a while, the surprise here is the level of integration with their most visible application, search.

Speaking of RDFa, "microformats all grown up", its momentum continues to grow, and the announcements of a new RDFa Wiki and mailing list for developers and publishers are well timed to catch this wave.

While there's no major announcement right now from the W3C Protocol for Web Description Resources (POWDER) WG, they deserve a special mention for maintaining a regularly updated blog of meetings, issues and decisions as they happen - openness beyond the call of duty!

btw, I got a Twine beta invite this week, and several of the links below came from there - thanks Nova!

In the Media

For more semweb-related podcasts see talk.talis.com.

Docs

Software News

Events etc.

Miscellany

~

Sources include Planet RDF, various other blogs, Semantic Web Interest Group IRC Chatlogs & Scratchpad, ESW Wiki, SemWebCentral, Sweet Tools, W3C Semantic Web Activity, mailing lists, personal emails etc etc. If you see anything suitable this coming week, please mail me or use the del.icio.us tags "semweb weekly" - thanks!

Posted by Danny Ayers at 07:37 PM

16 March 2008

Eric Miller talks about Zepheira and semantically enabling systems for the Semantic Technology conference

Eric Miller

In our latest podcast I talk with Eric Miller, President of Zepheira. In this follow-up to our original podcast last year, we discuss a project Zepheira has been undertaking to simplify conference management and enrich the delegate experience at this year's Semantic Technology conference. The systems they have developed demonstrate some of the ways in which semantic web technologies can be integrated with existing processes in order to deliver increased value and functionality.

Listen Now

Download MP3 [39 mins, 19MB]

During the conversation, we refer to the following resources:

This conversation was conducted using Skype on Friday 14 March, recorded with Ecamm Network's Call Recorder for Skype, and edited on a Mac with Garageband.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 12:40 PM

Norman Gray Talks with Talis about Astronomy and the Semantic Web

In our latest Talking with Talis podcast I talk with Norman Gray of the European Virtual Observatory's Technology Centre (Euro-VOTech). We talk about Astronomy, and some of the ways in which Semantic Web ideas and methods are beginning to play a role.

Listen Now

Download MP3 [67 mins, 32MB]

During the conversation, we refer to the following resources:

This conversation was conducted using iChat on Monday 10 March, recorded with Ecamm Network's Conference Recorder, and edited on a Mac with Garageband.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 11:45 AM

13 March 2008

A Chat with Richard Cyganiak

Our latest recording on technical matters is a chat with Richard Cyganiak, who's currently working on the Sindice Semantic Web search engine, though he is probably best known for his leading role in the Linking Open Data project, where he maintains the cloud diagram :-)

In the podcast Richard describes various technical details of these projects, and talks about the nature of data on the Web in the wild, as RDF, microformats and increasingly RDFa. He also discusses some of the practical issues in mapping existing databases to the Semantic Web (the kind of techniques Tim Berners-Lee mentioned in his podcast with Paul a few weeks ago).

Richard naturally mentions the principles of Linked Data:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.
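Principles 3 and 4 together give the "follow your nose" style of consumption. The Python sketch below simulates it against an in-memory dictionary standing in for the Web (the URIs, data and helper names are hypothetical). A real client would instead do an HTTP GET on each URI, typically with an Accept header such as `application/rdf+xml`, and parse the RDF in the response.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"

# Stand-in for the Web: what looking up each URI might return, as triples.
FAKE_WEB = {
    "http://example.org/id/alice": [
        ("http://example.org/id/alice", FOAF + "name", "Alice"),
        ("http://example.org/id/alice", FOAF + "knows", "http://example.net/id/bob"),
    ],
    "http://example.net/id/bob": [
        ("http://example.net/id/bob", FOAF + "name", "Bob"),
        ("http://example.net/id/bob", FOAF + "knows", "http://example.org/id/alice"),
    ],
}


def dereference(uri):
    """Principle 3: looking up a URI yields useful information (simulated here)."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start, max_lookups=100):
    """Principle 4: discover more things by following HTTP URIs found in the data."""
    seen, queue, graph = {start}, deque([start]), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for triple in dereference(uri):
            graph.append(triple)
            obj = triple[2]
            if obj.startswith("http://") and obj not in seen:  # only follow HTTP URIs
                seen.add(obj)
                queue.append(obj)
    return graph


for triple in follow_your_nose("http://example.org/id/alice"):
    print(triple)
```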

Listen Now

Download MP3 [47 mins, 44MB]

Posted by Danny Ayers at 06:03 PM

12 March 2008

Semantic Technology Conference, San Jose

It looks like quite a few of us from Talis will be making the trip over to San Jose in May, for this year's Semantic Technology Conference.

Our CEO, Dave Errington, is on a panel of senior executives with Radar Networks' Nova Spivack and others. I look forward to seeing past the usual vapourware demonstrations to actually hear what these CEOs think, what makes them tick, and where they think this sector is headed.

Our CTO, Ian Davis, will be sharing some of our internal rationale in a paper on the 'Semantic Web as a Blue Ocean Opportunity' (if you don't get the reference, read the book), and he and Danny Ayers will also be offering a half-day workshop for those who want to get hands on with the Talis Platform.

I hope to be capturing proceedings, and sharing my impressions here or on ZDNet as appropriate. I shall also be securing a number of podcast interviews with some of the more interesting speakers in the run up to the event itself.

For those who plan to attend, why not check out last month's Talis Platform News for a discount code that you can use when registering for the conference? And while you're at it, get in touch if you fancy meeting up.


Posted by Paul Miller at 01:48 PM

11 March 2008

Barak Pridor Talks about ClearForest, Calais, Reuters and the Semantic Web

Barak Pridor
In our latest podcast I talk with ClearForest CEO, Barak Pridor. We discuss the changing business model in Reuters' core markets, and consider ways in which their acquisition of ClearForest in 2007 helps Reuters position itself for the future. We also consider the technical and business decisions behind Reuters' recent announcement of an open API for the new Calais Web Service, built upon ClearForest's technology.

For more analysis of the conversation, see ZDNet's blog, The Semantic Web.

Listen Now

Download MP3 [35 mins, 16MB]

During the conversation, we refer to the following resources:

This conversation was conducted using Skype on Thursday 21 February, recorded with Ecamm Network's Call Recorder for Skype, and edited on a Mac with Garageband.

For further Talking with Talis podcasts on the emerging Web of Data, see here.


Posted by Paul Miller at 03:53 PM

Drupal calling Semantic Web..!

Arto Bendiken just posted a really useful mail to the Semantic Web Education and Outreach group giving some background on RDF developments around Drupal, as well as a list of possible ways SWEO could help. The list makes interesting reading for anyone looking to evangelize to developers. Here's a minimal summary:

  1. RDF myths debunked - Arto mentions the legacy of early RDF/XML experience, and suggests promoting Turtle and RDFa, "the ultimate microformat" (the SW FAQ may help here)
  2. External validation - convincing the Drupal community-at-large that Drupal 7.0 adopting RDF wouldn't be taking place in a vacuum (nice hat-tip to Talis, thanks!)
  3. Endorsement and adoption - "Tim BL blogs using Drupal"
  4. Mentorship and participation - input from Semantic Web folks into the Drupal community
  5. RDF Schema for Drupal - immediate action item that could benefit from the RDF expertise
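By way of illustration only, here's what a first cut at item 5 might look like, written out as Turtle by a small Python helper. The namespace and the class/property names are invented for this sketch and are not any actual Drupal schema; the `;` grouping in the output also shows why Turtle tends to read more easily than RDF/XML.

```python
# A hypothetical RDF Schema for a couple of core Drupal concepts.
# Namespace, classes and properties are invented for illustration.
DRUPAL = "http://example.org/drupal/ns#"

schema = [
    ("drupal:Node", "rdf:type", "rdfs:Class"),
    ("drupal:User", "rdf:type", "rdfs:Class"),
    ("drupal:author", "rdf:type", "rdf:Property"),
    ("drupal:author", "rdfs:domain", "drupal:Node"),
    ("drupal:author", "rdfs:range", "drupal:User"),
]

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "drupal": DRUPAL,
}


def to_turtle(triples, prefixes):
    """Serialize prefixed-name triples as Turtle, grouping by subject with ';'."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items()]
    lines.append("")
    subjects = []
    for s, _, _ in triples:  # keep subjects in first-seen order
        if s not in subjects:
            subjects.append(s)
    for s in subjects:
        props = [f"{p} {o}" for s2, p, o in triples if s2 == s]
        lines.append(f"{s} " + " ;\n    ".join(props) + " .")
    return "\n".join(lines)


print(to_turtle(schema, PREFIXES))
```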

(The recommended tutorials material quoted in the mail, on the ESW Wiki and on Engage is on my to-do list; I hope to get back to that this week.)

Posted by Danny Ayers at 10:12 AM

10 March 2008

This Week's Semantic Web

Selected links related to Semantic Web technologies for the week ending 2008-03-10, all weeks. Also available in RDF as linked data or via GRDDL.

Big news this week was Drupal and the opportunity of RDF - the popular content management system looking seriously toward the Semantic Web.

PR-watchers might note the announcement of the Google Contact Data API arrived in the same few days as Why data matters and How Google keeps your information secure.

[Running late - only minimal notes for now]

In the Media

Docs

Software News

Events etc.

Miscellany

Quote of the Week

For the record, my site is valid HTML 5, except the parts that aren’t. My therapist says I shouldn’t rely so much on external validation.

-Mark Pilgrim

~

Sources include Planet RDF, various other blogs, Semantic Web Interest Group IRC Chatlogs & Scratchpad, ESW Wiki, SemWebCentral, Sweet Tools, W3C Semantic Web Activity, mailing lists, personal emails etc etc. If you see anything suitable this coming week, please mail me or use the del.icio.us tags "semweb weekly" - thanks!

Posted by Danny Ayers at 09:15 PM

8 March 2008

A Chat with Tom Morris

Today's verbal delight features Semantic Web hacker (and philosopher) Tom Morris, initially talking about using XML to describe real-world things, mentioning the advantages of RDF. He then describes his experiences with the Ruby programming language, and offers thoughts on practical aspects of working in the distributed environment of the Web. Tom tells of ideas he has around using Bluetooth with RDF, before giving his opinion of platforms like Facebook, and related novel aspects of online gaming. He concludes by talking about his recent experience of organizing SemanticCamp London, and encouraging other people to try the BarCamp approach to conferences.

Listen Now

Download MP3 [52 mins, 48MB]

During the conversation, we refer to the following resources:

Posted by Danny Ayers at 12:55 PM

5 March 2008

The Best is Yet to Come

The next generation of the web, this Semantic Web, is in its infancy but already we're seeing some fantastic glimpses of its potential.

We saw some of that potential recently at DrupalCon 2008 where Dries Buytaert used his keynote to share a vision of the future... one that is built on RDF (read more on our sister blog).

Imagine every Drupal installation as a Linked Data source. Wow!

This would be a massive step towards the Semantic Web's maturation and I hope the Drupal people can pull it off. My advice would be to remember that these are still early days and to tackle it with pragmatic baby steps. Just like the early days of the Web there'll be plenty of stop energy trying to drag you back, but hold your nerve and see it through.

Adoption of the technologies by significant projects like Drupal really shows that we're entering a new generation