PS. John has just posted a *video*: DataPortability and me, JB
John Breslin has just posted Semantic Web for Dummies, suggesting (after Stefan Marti):
- XML customised tags, like:
- <dog>Nena</dog>
- + RDF relations, in triples, like:
- (Nena) (is_dog_of) (Kimiko/Stefan)
- + Ontologies / hierarchies of concepts, like:
- mammal -> canine -> Coton de Tulear -> Nena
- + Inference rules like:
- If (person) (owns) (dog), then (person) (cares_for) (dog)
- = Semantic Web!
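To make the ingredients in that list concrete, here is a minimal sketch in plain Python, not a real RDF toolkit. The triples are ordinary tuples, the hierarchy is a chain of `subclass_of` facts, and the inference rule is the one quoted above. The `owns`, `is_a` and `subclass_of` predicate names and both helper functions are assumptions made for illustration.

```python
# Toy model only: plain tuples stand in for RDF triples.
# Predicate names (owns, is_a, subclass_of) are illustrative assumptions.
triples = {
    ("Stefan", "owns", "Nena"),
    ("Kimiko", "owns", "Nena"),
    ("Nena", "is_a", "Coton de Tulear"),
    ("Coton de Tulear", "subclass_of", "canine"),
    ("canine", "subclass_of", "mammal"),
}


def ancestors(cls, facts):
    """Walk subclass_of links upward from cls, nearest ancestor first."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in facts:
            if s == current and p == "subclass_of" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


def apply_owner_rule(facts):
    """If (person) (owns) (dog), then (person) (cares_for) (dog)."""
    inferred = {(s, "cares_for", o) for s, p, o in facts if p == "owns"}
    return facts | inferred


if __name__ == "__main__":
    print(ancestors("Coton de Tulear", triples))
    for fact in sorted(apply_owner_rule(triples) - triples):
        print(fact)
```

Note that nothing here touches the network: every name is a bare local string, which is exactly the gap the rest of this post is about.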
I have a bone to pick with this. While it’s a very nice summary of the Semantic, where’s the Web?
It would be seriously unfair of me to pick on John too much, especially since the list above is probably still the world view shared by most in the Semantic Web community (and I admit that’s how I saw things myself, until relatively recently). It also corresponds to the traditional layer cake representation of the Semantic Web stack of technologies. I should also mention that John’s been a driving force behind the SIOC (Semantically-Interlinked Online Communities) project, which is seriously Web, and is addressing an area now very much in focus in the Web community at large: social data. (I’m also really pleased John was able to attend the DataPortability telecon; that initiative really needs semweb folks to explain the work that’s already been done in the field… and I’m afraid a 6am start is simply beyond me.)
It’s often stated that the Semantic Web is an extension of the existing Web. What isn’t always clear is the correspondence between the two. The current Web is built of documents and hyperlinks between them. If we generalize ‘documents’ to ‘things’ and ‘hyperlinks’ to ‘relationships’, we get Tim Berners-Lee’s Giant Global Graph (I went into a bit more detail on this in Evolving the Link). This abstract model has a lot in common with various conventional ideas from software architecture: object-orientation, entity-relationship modeling and even the relational model behind most databases. There are plenty of differences in detail between these models, but the biggest difference of all in the Semantic Web perspective is that the model is overlaid onto the Web.
So as a first pass at bringing the Web back into the picture, try the following as a Semantic Web 101:
1. A uniform naming scheme for every kind of thing: documents, people, real-world objects, concepts etc.
2. A data model which allows you to express relationships between named things
3. Formats and other data structures which allow you to express information in this data model
4. A protocol which enables related data to be discovered
5. User tools which support the above
#1 is Uniform Resource Identifiers, the most significant subset of which is HTTP URLs of the Web.
#2 is RDF, augmented as necessary with ontological and/or rule-oriented techniques.
#3 is pretty much anything in which data can be expressed: obviously RDF formats like RDF/XML and Turtle, but also HTML through the use of RDFa, microformats and Embedded RDF; virtually any XML can be transparently interpreted as RDF through GRDDL; custom translators are available for formats like iCalendar or even CSV data; mapping tools are available for relational databases and systems like LDAP. Basically, it usually isn’t necessary to rewrite any application to take advantage of Semantic Web techniques.
#4 is the HTTP protocol, and what Tim Berners-Lee has called “the basic follow-your-nose way the Web works”.
#5 is a side that’s taken a back seat while developer tools like APIs have been in development. Existing applications can usually be made Semantic Web-aware, but there’s a whole lot more that can be done in this area with regard to tools for manipulating generic data, and the development of new applications that would be difficult or even impossible without the (Semantic) Web and its technologies to draw upon.
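As a rough illustration of the first three points working together, the sketch below names things with HTTP URIs, states relationships as triples, and writes them out in N-Triples, one of the simpler RDF formats. It uses only the Python standard library rather than an RDF toolkit. The `example.org` URIs, the `Literal` class and both helper functions are hypothetical; the FOAF namespace is the real one.

```python
from dataclasses import dataclass

FOAF = "http://xmlns.com/foaf/0.1/"  # real FOAF vocabulary namespace

# Hypothetical URIs naming two people (not documents about them).
ALICE = "http://example.org/people/alice#me"
BOB = "http://example.org/people/bob#me"


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a URI-named thing."""
    value: str


# The data model: a list of (subject, predicate, object) triples.
GRAPH = [
    (ALICE, FOAF + "name", Literal("Alice")),
    (ALICE, FOAF + "knows", BOB),
]


def nt_term(term):
    """Render one term in N-Triples syntax: <uri> or "literal"."""
    if isinstance(term, Literal):
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


if __name__ == "__main__":
    print(to_ntriples(GRAPH))
```

The point of the exercise is that the same list of triples could just as well have come from RDFa in a Web page, a CSV translator or a database mapping; the format is a detail, the URIs and the model are what carry over.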
I think it would be fair to say that Semantic Web evangelism has had its share of wrong turns. Way too much time has been spent in arguments over data formats, and the relative complexity of the layers further up the stack has no doubt caused many to reject the technologies after a cursory review. While the stack does have an appealing consistency, there’s little obvious relevance for regular developers. When thinking in terms of ontologies and so on, it’s hard not to slip into designing things top-down and schema-first, which is pretty well the opposite of what is emerging as a more effective approach. The RDF model makes it straightforward to design systems data-first, and when working with an existing, deployed Web, this has definite advantages in terms of allowing incremental development and all-round flexibility.
Anyhow, the Web has been rediscovered in this context – the realisation that what we’re talking about, first and foremost, is Linked Data. Whether that data is concerned with the documents of the traditional Web, the people of social networks or whichever aspect of the world drifts into the limelight next, the same standard technologies can be used to look after it.
Come to think of it, all the above is effectively summarised in Tim Berners-Lee’s Linked Data rules:
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information.
- Include links to other URIs, so that they can discover more things.
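The rules above translate fairly directly into client behaviour. The following sketch, which uses only the Python standard library, prepares an HTTP lookup for a URI (asking for Turtle via the `Accept` header) and then works out which linked URIs could be looked up next. No request is actually sent, the `example.org` URIs are hypothetical, and `lookup_request` and `links_to_follow` are illustrative helpers rather than part of any library.

```python
from urllib.parse import urldefrag
from urllib.request import Request

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"  # real FOAF property

# Hypothetical URIs used as names for things (rules 1 and 2).
ALICE = "http://example.org/people/alice#me"
BOB = "http://example.org/people/bob#me"


def lookup_request(uri, accept="text/turtle"):
    """Build (but do not send) an HTTP request for data about uri.

    The fragment (#me) is dropped, since only the document part of
    the URI is sent to the server.
    """
    document_url = urldefrag(uri).url
    return Request(document_url, headers={"Accept": accept})


def links_to_follow(triples, seen):
    """Return document URLs mentioned as objects that are not yet seen."""
    targets = {
        urldefrag(o).url
        for _, _, o in triples
        if o.startswith(("http://", "https://"))
    }
    return sorted(targets - seen)


if __name__ == "__main__":
    request = lookup_request(ALICE)
    print(request.full_url, request.get_header("Accept"))

    # Pretend this is the useful information that came back (rule 3),
    # including a link to another URI (rule 4).
    received = [(ALICE, FOAF_KNOWS, BOB)]
    print(links_to_follow(received, seen={request.full_url}))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; repeating the two steps for each newly discovered URL is the follow-your-nose pattern in miniature.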