Nitpicking Alex’s Semantic Web Patterns
Alex Iskold just published quite a lengthy blog post called Semantic Web Patterns: A Guide to Semantic Technologies. Overall it’s good stuff, and Alex has been doing a great job of promoting the Semantic Web over on Read/WriteWeb and elsewhere. He’s also one of the Semantic Gang featuring in the latest podcast series from oor Paul. (I’ve not listened to that yet - I’ll try it with a dogwalk shortly).
Because of all this I feel a little disloyal in being critical, but without clarification some of the points in Alex’s post could lead to misconceptions, the bane of Semantic Web outreach. One thing I can’t disagree with Alex about is the way the Semantic Web means different things to different people (cue elephant analogy). So with that proviso and all due respect etc, here we go:
1. Bottom-Up and Top-Down
Alex says:
“The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically.”
Ok, while one could (and I will) quibble with the content of these definitions, they do make a pretty clear distinction. The only thing is, the phrases “bottom-up”/“top-down” have already been used fairly extensively in the Semantic Web context to describe at least two different (but related) distinctions.
The first of these is with regard to decision-making, in the same sense as within the management hierarchy of an organization. The naive stereotype for this distinction would give, say, top-down = “those in power in standards orgs call the shots” versus bottom-up = “grassroots developers determine the direction”. Given that specifications can appear as authoritative rules, it’s easy to see how this perception might emerge. (This is a naive distinction, because it fails to consider the influence of the community that goes into defining specifications and in determining which survive the natural selection of deployment in the wild).
The second usage of “bottom-up”/“top-down” is more technical, in regard to how you arrive at your world/domain model. Top-down would be starting your model from a generalized level and working towards more specific levels, bottom-up the reverse. Clearly if there’s to be global interoperability, taking the top-down approach would imply there’s one true model that everyone follows. In the past this has led to some awful misconceptions around RDF, where people have assumed that the models (i.e. vocabularies, RDF Schemas, ontologies) are created on high - probably by the W3C. Quite the opposite is true. While RDF is a framework (and hence might be viewed as a top-level language), it’s essentially neutral on who, where and how domain models are created. Because things, classes of things, relationships between things and so on are identified using URIs, anyone can create their own vocabularies. This retains a base level of global interop, and enables web-scale independent development. (I once saw a list email containing a line like “the namespace begins with http://purl.org, so it must be something to do with RSS 1.0 people at the W3C” - no, no, no!).
So basically while Alex’s “bottom-up”/“top-down” may be internally consistent, it’s a little idiosyncratic.
2. Annotation Technologies: RDF, Microformats, and Meta Headers
There’s quite a bit I could quibble with in this section, but I’ll stick to the one point I think is most significant. It can be very misleading to think of RDF merely as an annotation and/or metadata tool. While it can be, and very often is, used for annotation (typically descriptions of documents) and metadata (descriptions of data) purposes, it is also used to talk about things directly. Alex provides an example: “Alex IS the father of Alice, Lilly, and Sofia”. This is plain old data. The same data could be expressed in a database table called “fatherOf” with “Alex” appearing three times in the left-hand column and the right-hand column containing “Alice”, “Lilly”, “Sofia”. RDF is a data technology. One big difference from traditional RDBMSs is that relations (tables, properties, “fatherOf”) can only have two values - the subject and object of the relation (2 columns, “fathers”/“children”). Another big difference is that both things and the relationships between things are generally identified using URIs, which enables the Web part of the Semantic Web.
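To make the table comparison concrete, here is a minimal sketch in plain Python (no RDF library). The names come from Alex's example, but the example.org namespace, the fatherOf URI and the as_table helper are assumptions made up for illustration, not anything from his post.

```python
# Hypothetical namespace; only the people's names come from the example.
EX = "http://example.org/family#"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (EX + "Alex", EX + "fatherOf", EX + "Alice"),
    (EX + "Alex", EX + "fatherOf", EX + "Lilly"),
    (EX + "Alex", EX + "fatherOf", EX + "Sofia"),
}

def as_table(graph, predicate):
    """Project one property out as a two-column table of (subject, object) rows."""
    return sorted((s, o) for s, p, o in graph if p == predicate)

# The "fatherOf" table: Alex three times on the left, one child per row on the right.
father_of = as_table(triples, EX + "fatherOf")
for father, child in father_of:
    print(father, "->", child)
```

Each property behaves like its own two-column table, and the URIs are what let rows from different sources refer to the same thing.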
3. Consumer and Enterprise
I think it’s good that Alex highlights consumer/enterprise and vertical/horizontal aspects of the Semantic Web; they are worthy of discussion. But regarding the “killer app” of the Semantic Web - one might equally well ask “what is the killer app of the Web?” (this is Tim Berners-Lee’s own response in the 2001 Sci Am article).
There’s another source of misconceptions in this section: “RDF offers a way to communicate using XML-based language…”. While strictly speaking that’s probably correct, it gives the impression that RDF is XML-based, which it isn’t. RDF is a data model, an abstract language. Formats and serializations (of which there are several, both XML and non-XML) are secondary. Given the recent work around GRDDL, it’d be more accurate to say “XML offers a way to communicate using RDF-based language…”.
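A rough sketch of the model-versus-syntax point: the same set of triples written out in two different text syntaxes, neither of them XML. The serializers below only handle URI-only triples and are simplified stand-ins written for this illustration (not a full N-Triples or Turtle implementation); the namespace is hypothetical.

```python
import re

EX = "http://example.org/family#"  # hypothetical namespace

# The abstract model: just a set of (subject, predicate, object) triples.
graph = {
    (EX + "Alex", EX + "fatherOf", EX + "Alice"),
    (EX + "Alex", EX + "fatherOf", EX + "Lilly"),
}

def to_ntriples(graph):
    """One '<s> <p> <o> .' line per triple (URI-only subset of N-Triples)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))

def to_turtle(graph, prefix, namespace):
    """Turtle-style output; assumes every URI starts with the given namespace."""
    def short(uri):
        return f"{prefix}:{uri[len(namespace):]}"
    lines = [f"@prefix {prefix}: <{namespace}> ."]
    lines += [f"{short(s)} {short(p)} {short(o)} ." for s, p, o in sorted(graph)]
    return "\n".join(lines)

def from_ntriples(text):
    """Parse the URI-only lines produced by to_ntriples back into a set."""
    pattern = re.compile(r"<([^>]*)> <([^>]*)> <([^>]*)> \.")
    matches = (pattern.fullmatch(line) for line in text.splitlines())
    return {m.groups() for m in matches if m}

nt = to_ntriples(graph)
ttl = to_turtle(graph, "ex", EX)
print(nt)
print(ttl)
```

The two strings look nothing alike, yet parsing the first one back gives exactly the original set: the triples are the data, and the syntax is just packaging.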
This confusion around XML messes up Alex’s arguments on scalability somewhat - I’m sure someone somewhere is using an XML DB for RDF, but most I’ve seen are either built on top of RDBMSs or are RDF-native. (Non-generic, domain-specific data can be stored pretty much any way you like - if semweb interfaces were exposed I suppose you could call it an RDF store of sorts…). Also, while RDF storage technology isn’t anywhere near as mature as that of RDBMSs, it does draw on essentially the same foundations - and sometimes the same people - so the picture isn’t as bad as one might imagine. Genuinely large RDF stores are starting to appear, and even then it’s worth remembering (as Alex points out) the aim is for the big database to be the Web itself. (My own standard line on this is that triplestores are just local caches of chunks of the Semantic Web).
4. Semantic APIs
As Paul Downey put it, Web APIs Are Just Web Sites - the same goes for the Semantic Web. Alex talks about some of the online APIs for extracting RDF from natural language. While these are nifty, potentially any Web site or service could with appropriate tweaking be a Semantic API. The original RSS was a Semantic API - descriptions of news-like items delivered using RDF over HTTP. While the latest syndication format, Atom, might not be RDF, it’s good Web-friendly data that can be mapped to RDF (work is in progress on conventions for that).
Semantic Web technologies also have an ace card up their sleeves here, in the form of SPARQL. RDF stores and (with the appropriate wiring) any online RDF can be queried using a straightforward SQL-like language, operating over standard HTTP. A seriously powerful addition to the Web API toolkit.
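As a hedged sketch of what “SQL-like, over standard HTTP” looks like in practice: in the SPARQL protocol a query can be sent as an ordinary GET request with the query text in a `query` parameter. The endpoint URL and the vocabulary below are hypothetical, and the code only builds the request URL rather than calling a real service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

# A SELECT query against a made-up vocabulary.
QUERY = """
PREFIX ex: <http://example.org/family#>
SELECT ?child WHERE { ex:Alex ex:fatherOf ?child }
"""

def sparql_get_url(endpoint, query):
    """Build a GET URL carrying the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})

url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Any HTTP client can fetch a URL like that, which is the point: the query interface is just another Web resource.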
Right now the ability to make mashups (client- or server-side) is limited by the effort needed to integrate across different APIs (the n-squared thing). RDF can make integration trivial. Even without RDF/SPARQL being available, a lot of the pain of integration can be alleviated if the data is mapped to RDF then integrated.
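A toy illustration of “map to RDF, then integrate”, under the assumption that both sources can be mapped to the same identifiers. The two services, their record shapes and the vocabulary are all invented for the example; triples are modelled as plain Python tuples.

```python
EX = "http://example.org/vocab#"        # hypothetical vocabulary
PEOPLE = "http://example.org/people/"   # hypothetical identifiers

# Two made-up services returning differently shaped records.
service_a = [{"user": "alex", "blog": "http://example.org/blog/alex"}]
service_b = [("alex", "Alex"), ("danny", "Danny")]

def map_a(records):
    """Map service A's dicts to triples."""
    return {(PEOPLE + r["user"], EX + "weblog", r["blog"]) for r in records}

def map_b(rows):
    """Map service B's tuples to triples."""
    return {(PEOPLE + uid, EX + "name", name) for uid, name in rows}

# Once both sources are triples, integration is just set union.
merged = map_a(service_a) | map_b(service_b)

def describe(graph, subject):
    """Collect property -> value for one subject (keeps one value per property)."""
    return {p: o for s, p, o in graph if s == subject}

alex = describe(merged, PEOPLE + "alex")
print(alex)
```

The per-service work is one mapping function each, rather than one adapter per pair of services, which is where the n-squared cost goes away.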
I don’t think we’ll ever see every single service offering Semantic Web-friendly APIs. But to the Web 2.0 style sites, the Web is a competitive environment. Services which do support RDF and/or SPARQL will be able to benefit from the lowering of the integration barrier, and over time increasingly tend to have a commercial advantage over services which don’t. The ball is rolling and the field is wide open.
5. Search Technologies
“Perhaps the first significant blow to the Semantic Web has been the inability thus far to improve search.” - er, well, no. Search, at least as we know and love it today, is an artifact of the document Web. Success for the Semantic Web wouldn’t be improving search, but marginalizing it.
The information carried by the document Web, the stuff we’re interested in, is generally expressed in human-readable text inside the documents. There’s a semantic air gap between the protocols and languages of the current Web (HTTP, HTML…) and the information that’s being conveyed. Search engines bridge that gap through the use of heuristics based around string matching on queries and indexed documents. Semantic Web technologies offer a couple of ways of minimizing the gap. Through the increased use of metadata, more explicit matching can be made. Before anyone throws the metacrap arguments at me, consider the improvements already brought by metadata-rich syndication feeds and folksonomy tagging.
The other way of reducing the gap that comes to mind is…not to create gaps in the first place. Take an online train timetable. Right now it’ll likely be contained in a database somewhere, exposed through HTML with a form or two. To access the data we are at the mercy of whatever specific front-end the service provider has offered. To make a mashup with it we’d be making site-specific calls, at best through a RESTful API. But if the data was also available without the document Web-oriented intermediation, say as RDF/XML documents, or perhaps better still a SPARQL endpoint, mashups would be trivial.
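To give a feel for what querying such timetable data directly might involve, here is a small triple-pattern matcher in plain Python. It is only a sketch of the idea behind a SPARQL basic graph pattern, not SPARQL itself, and the stations, departures and vocabulary are all hypothetical.

```python
EX = "http://example.org/rail#"  # hypothetical vocabulary and identifiers

timetable = {
    (EX + "dep1", EX + "from", EX + "Alpha"),
    (EX + "dep1", EX + "to", EX + "Beta"),
    (EX + "dep1", EX + "departs", "09:10"),
    (EX + "dep2", EX + "from", EX + "Alpha"),
    (EX + "dep2", EX + "to", EX + "Gamma"),
    (EX + "dep2", EX + "departs", "09:40"),
}

def match(graph, patterns, binding=None):
    """Yield variable bindings satisfying every pattern; variables start with '?'."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(graph, rest, candidate)

# "When do departures from Alpha to Gamma leave?"
query = [
    ("?d", EX + "from", EX + "Alpha"),
    ("?d", EX + "to", EX + "Gamma"),
    ("?d", EX + "departs", "?time"),
]
times = sorted(b["?time"] for b in match(timetable, query))
print(times)
```

No site-specific form or API call is involved; the question is asked of the data itself.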
Incidentally, I remember the train timetable scenario coming up on the microformats list a while back. At the time it seemed nonsensical to me to follow the suggestion over there of having e.g. one microformatted-HTML page for each record in the database. In retrospect I think that was potentially a very good solution - assuming the microformat followed best practices, using a profile etc, then this would be equivalent to publishing all the data as linked RDF. A GRDDL-aware consumer would in fact see it that way. The bonus advantage is having the (inherently in sync) HTML material available too.
Anyhow, back to search. The current Web does contain one notable kind of explicit, machine-readable semantics: the link. This page is related to that page. I don’t think it’s coincidence that the most successful search heuristic to date - Google’s PageRank - is based on this data source.
My standard line on search is “search engines act as indexes of the Web, the Semantic Web is its own index”, or more succinctly “the best way to find things is not to lose them in the first place”.
6. Contextual Technologies
I don’t really disagree with what Alex says in this section, but would add that Semantic Web languages make it much easier to deal with contexts - which can be expressed directly, without the need for interpreting natural language. There are already a few pretty neat faceted browsing tools around, and I reckon these things are going to get a lot neater over the next few years.
7. Semantic Databases
See above about triplestores in Consumer and Enterprise.
Twine and Freebase are really nice applications, although I believe Freebase’s connection to the rest of the (Semantic) Web is still pretty suboptimal. Twine’s still in beta, but has already come an awful long way (I put it in my open-in-tabs-regularly bookmarks). What they both demonstrate is that something which looks to the end user like a regular shiny Web 2.0 application can be built at a significant scale using RDF/RDF-like technologies. Where these things have an opportunity to get much more interesting than similar traditional products is in exploiting the Semantic Web angle. I do hope they hook up to the Linking Open Data cloud soon.
Conclusion
The Semantic Web does mean different things to different people, and maybe I’m being overly orthodox in seeing RDF+HTTP as the distinguishing features of these particular Semantic Technologies. But I’m glad I got that off my chest. Now for that dogwalk with Semantic Gang.

March 26th, 2008 at 2:37 pm
Danny,
Thanks, I was hoping to see a knowledgeable voice comment on this stuff. It seems to me we are at a dangerous juncture where much misinformation can be propagated exactly at the time when broader public attention is coming to the semantic Web.
I think you may have a hard time restraining yourself, as well, after your dog walk (and all the cats, too, while you cool off).
March 27th, 2008 at 4:49 pm
As I mentioned somewhere (FriendFeed, I think), this was quite a good and rather polite beatdown. Should make for an interesting SemWeb Gang #2.