Nodalities

From Semantic Web to Web of Data

License

Creative Commons License

Archive for the 'Tech Talk' Category

A Chat with Uldis Bojars

On Friday I had the pleasure of a chat with Uldis Bojars (also known as CaptSolo), who has recently been developing social network-oriented applications using Semantic Web technologies. His main area of work over the past few years has been the SIOC project (Semantically-Interlinked Online Communities), and in the podcast he discusses how the SIOC project anticipated (and fulfils many of the requirements of) the DataPortability initiative. As you can see from the list of links below, there’s a lot happening in this space.

Uldis used two phrases I don’t recall hearing before, but expect to hear a lot more in future. When discussing the recently updated Semantic Radar Firefox plugin he described how it passes data along the “Semantic Web food chain”, and then regarding SIOC Explorer, mining social networks based on “object-centered sociality” (this is described in the paper). Good stuff.

Listen Now

Download MP3
[49 mins, 46Mb]

During the conversation, we refer to the following resources:

Semantic Web…in a nutshell?

PS. John has just posted a *video* : DataPortability and me, JB

John Breslin has just posted Semantic Web for Dummies, suggesting (after Stefan Marti):

XML customised tags, like:
<dog>Nena</dog>
+ RDF relations, in triples, like:
(Nena) (is_dog_of) (Kimiko/Stefan)
+ Ontologies / hierarchies of concepts, like:
mammal -> canine -> Cotton de Tulear -> Nena
+ Inference rules like:
If (person) (owns) (dog), then (person) (cares_for) (dog)
= Semantic Web!
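
For readers who think better in code, here is a minimal sketch of the last three ingredients of that list. It is a toy, not how any RDF toolkit works: plain Python tuples stand in for triples, a dict stands in for the concept hierarchy, and the `owns` facts are my own assumption, added only so that the rule has something to fire on.

```python
# Toy illustration only: tuples stand in for RDF triples.
triples = {
    ("Nena", "is_dog_of", "Kimiko"),
    ("Nena", "is_dog_of", "Stefan"),
    # Assumed facts (not in the quoted summary), so the rule below can fire.
    ("Kimiko", "owns", "Nena"),
    ("Stefan", "owns", "Nena"),
}

# The concept hierarchy from the quoted list, stored as child -> parent.
parent_of = {
    "Nena": "Cotton de Tulear",
    "Cotton de Tulear": "canine",
    "canine": "mammal",
}


def ancestors(concept, hierarchy):
    """Walk up the hierarchy and collect every broader concept, nearest first."""
    found = []
    while concept in hierarchy:
        concept = hierarchy[concept]
        found.append(concept)
    return found


def apply_owns_rule(facts):
    """If (person) (owns) (dog), then (person) (cares_for) (dog)."""
    inferred = {(s, "cares_for", o) for (s, p, o) in facts if p == "owns"}
    return facts | inferred


all_facts = apply_owns_rule(triples)
print(ancestors("Nena", parent_of))
print(sorted(t for t in all_facts if t[1] == "cares_for"))
```

Note that every name in it is a bare string, meaningful only inside this one program.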

I have a bone to pick with this. While it’s a very nice summary of the Semantic, where’s the Web?

It would be seriously unfair of me to pick on John too much, especially since the list above probably still is the world view shared by most in the Semantic Web community (and I admit that’s how I saw things myself, until relatively recently). It also corresponds to the traditional layer cake representation of the Semantic Web stack of technologies. I should also mention that John’s been a driving force behind the SIOC (Semantically-Interlinked Online Communities) project, which is seriously Web, and is addressing an area now very much in focus in the Web community at large, social data. (I’m also really pleased John was able to attend the DataPortability telecon, that initiative really needs semweb folks to explain the work that’s already been done in the field…and I’m afraid a 6am start is simply beyond me).

It’s often stated that the Semantic Web is an extension of the existing Web. What isn’t always clear is the correspondence between the two. The current Web is built of documents and hyperlinks between them. If we generalize ‘documents’ to ‘things’ and ‘hyperlinks’ to ‘relationships’, we get Tim Berners-Lee’s Giant Global Graph (I went into a bit more detail on this in Evolving the Link). This abstract model has a lot in common with various conventional ideas from software architecture: object-orientation, entity-relationship modeling and even the relational model behind most databases. There are plenty of differences in detail between these models, but the biggest difference of all in the Semantic Web perspective is that the model is overlaid onto the Web.

So as a first pass at bringing the Web back into the picture, try the following as a Semantic Web 101:

  1. A uniform naming scheme for every kind of thing: documents, people, real-world objects, concepts etc.
  2. A data model which allows you to express relationships between named things
  3. Formats and other data structures which allow you to express information in this data model
  4. A protocol which enables related data to be discovered
  5. User tools which support the above

#1 is Uniform Resource Identifiers, the most significant subset of which is HTTP URLs of the Web.

#2 is RDF, augmented as necessary with ontological and/or rule-oriented techniques.

#3 is pretty much anything in which data can be expressed: obviously RDF formats like RDF/XML and Turtle, but also HTML through the use of RDFa, microformats and Embedded RDF; virtually any XML can be transparently interpreted as RDF through GRDDL; custom translators are available for formats like iCalendar or even CSV data; mapping tools are available for relational databases and systems like LDAP. Basically, it usually isn’t necessary to rewrite an application to take advantage of Semantic Web techniques.

#4 is the HTTP protocol, and what Tim Berners-Lee has called “the basic follow-your-nose way the Web works”.

#5 is a side that’s taken a back seat while developer tools like APIs have been in development. Existing applications can usually be made Semantic Web-aware, but there’s a whole lot more that can be done in this area, both in tools for manipulating generic data and in new applications that would be difficult or even impossible without the (Semantic) Web and its technologies to draw upon.
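
To make #1 to #3 a little more concrete, here is a rough sketch of the kind of custom translator mentioned under #3: it turns a few rows of CSV into triples whose names are URIs, then writes them out as N-Triples. The `http://example.org/` namespace, the column names and the `name`/`knows` properties are all invented for illustration; a real translator would use an established vocabulary and an RDF library rather than hand-rolled strings.

```python
import csv
import io
from typing import NamedTuple

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

# Hypothetical input data: one person per row.
CSV_TEXT = """id,name,knows
alice,Alice,bob
bob,Bob,
"""


class Literal(NamedTuple):
    """Marks an object as a plain text value rather than a URI."""
    value: str


def csv_to_triples(text, base=BASE):
    """Map each CSV row to (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = base + "people/" + row["id"]
        triples.append((subject, base + "vocab/name", Literal(row["name"])))
        if row["knows"]:  # an empty cell means no relationship to state
            friend = base + "people/" + row["knows"]
            triples.append((subject, base + "vocab/knows", friend))
    return triples


def to_ntriples(triples):
    """Serialise triples in N-Triples syntax, one statement per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


people = csv_to_triples(CSV_TEXT)
print(to_ntriples(people))
```

The application that produced the CSV doesn’t need to change at all; the mapping sits alongside it.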

I think it would be fair to say that Semantic Web evangelism has had its share of wrong turns. Way too much time has been spent in arguments over data formats, and the relative complexity of the layers further up the stack has no doubt caused many to reject the technologies after a cursory review. While the stack does have an appealing consistency, there’s little obvious relevance for regular developers. When thinking in terms of ontologies and so on, it’s hard not to slip into designing things top-down and schema-first, which is pretty well the opposite of what is emerging as a more effective approach. The RDF model makes it straightforward to design systems data-first, and when working with an existing, deployed Web, this has definite advantages in terms of allowing incremental development and all-round flexibility.

Anyhow, the Web has been rediscovered in this context – the realisation that what we’re talking about, first and foremost, is Linked Data. Whether that data is concerned with the documents of the traditional Web, the people of social networks or whichever aspect of the world drifts into the limelight next, the same standard technologies can be used to look after it.

Come to think of it, all the above is effectively summarised in Tim Berners-Lee’s Linked Data rules:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information.
  4. Include links to other URIs, so that they can discover more things.
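
The four rules above can be sketched in a few lines. This is an illustration under loud assumptions: the “Web” here is just an in-memory dict mapping made-up `example.org` URIs to the triples a server might return, and `look_up` stands in for an HTTP GET. A real client would make actual requests and parse whatever RDF format came back.

```python
from collections import deque

KNOWS = "http://example.org/vocab/knows"  # hypothetical property

# Hypothetical stand-in for the Web: URI -> triples served at that URI.
FAKE_WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice", KNOWS, "http://example.org/bob"),
    ],
    "http://example.org/bob": [
        ("http://example.org/bob", KNOWS, "http://example.org/carol"),
        ("http://example.org/bob", KNOWS, "http://example.org/alice"),
    ],
    "http://example.org/carol": [],
}


def look_up(uri):
    """Stand-in for dereferencing a URI over HTTP (rules 2 and 3)."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start, max_lookups=10):
    """Breadth-first discovery: look up a URI, then the URIs it links to (rule 4)."""
    seen = set()
    queue = deque([start])
    graph = set()
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in look_up(uri):
            graph.add((s, p, o))
            if o.startswith("http://") and o not in seen:
                queue.append(o)
    return graph, seen


graph, visited = follow_your_nose("http://example.org/alice")
print(len(graph), "triples from", len(visited), "lookups")
```

Starting from a single name, the client ends up with data it was never explicitly pointed at, which is really the whole idea.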

A Chat with Benjamin Nowack

Just before the weekend I (Danny Ayers) had the pleasure of a telephone chat with Benjamin Nowack, in which he described his views on rapid Semantic Web development in PHP, providing a Unique Selling Proposition to web design agencies and his ARC RDF Classes for PHP, among a few other things (see the list below). Particularly timely was his reference to using an ARC-based scutter (RDF crawler) with WordPress for gathering blog commenters’ social graphs.

Here’s the recording (mp3, 37 minutes, 34MB).

Notes:

Nodalities now on the Semantic Web


GRDDL

Last week Masahide Kanzaki kindly provided the code to allow This Week’s Semantic Web to be interpreted as Semantic Web data. The code was XSLT, which is supported by GRDDL, the new W3C specification for Gleaning Resource Descriptions from Dialects of Languages. Following a minor addition to this blog system’s template, the posts here can now be automatically interpreted as RDF by GRDDL-aware agents. These currently include Tabulator and the OpenLink RDF Browser (here showing the discovery of 556 triples from a single blog post).

As an experiment, I’ve also created a GRDDL profile which provides RDF statements for every link in an HTML document. When defined as a profile, only a very small addition is needed to an HTML document (which should be valid XHTML) for it to be GRDDLable. The addition for this particular profile looks like this:


...
<head profile="http://purl.org/stuff/glink/">
...

Now all the posts here expose explicit data on the Semantic Web.
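
To give a feel for the kind of data a link-extracting profile produces, here is a rough Python equivalent. To be clear, this is not GRDDL and not the transformation behind the profile: GRDDL works by applying XSLT, whereas this sketch uses the standard library’s HTML parser. The `linksTo` property and the `example.org` page address are placeholders I have made up, not the terms the profile actually emits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

LINKS_TO = "http://example.org/vocab/linksTo"  # hypothetical property


class LinkCollector(HTMLParser):
    """Collects one (page, linksTo, target) triple per <a href> element."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page's own URI.
            target = urljoin(self.base_uri, href)
            self.triples.append((self.base_uri, LINKS_TO, target))


def links_as_triples(html, base_uri):
    collector = LinkCollector(base_uri)
    collector.feed(html)
    collector.close()
    return collector.triples


SAMPLE = (
    '<html><head profile="http://purl.org/stuff/glink/"></head><body>'
    '<a href="/about">About</a> '
    '<a href="http://www.w3.org/TR/grddl/">GRDDL</a>'
    "</body></html>"
)

for triple in links_as_triples(SAMPLE, "http://example.org/blog/post"):
    print(triple)
```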

Other profiles and transformations (several corresponding to popular microformats) can be found at CustomRdfDialects on the ESW Wiki.

Future of Web Apps (FoWA)

I’ve just got back to work after two great days down at “Future of Web Apps” (FoWA) in London. This year was heavily focussed around social networking sites and social content but there was a reasonable amount of discussion on startups and funding also.

There were some really interesting talks, including Steve Souders (Yahoo) on “High Performance Web Sites” and Daniel Burka on “Interpreting Feedback”, after creating three startups in three years, including Digg and Pownce. I’ll be expanding on most of the talks I went to on my own personal blog over the weekend and will link back to them from Nodalities when they are done.

The semantic web didn’t get much of a mention, unfortunately, and when it did it was in a bit of a negative light. John Resig, in the Q&A session after his talk on “The Future of Firefox (FF3) and JavaScript”, dropped out a comment that “I don’t think the semantic web has as much legs as it used to” – I’m sure many at Talis would have something to say about that!

Eran Shir and Jon Aizen also talked about Dapper; allowing users to gather data from pretty much any web site on the web. They claimed that RDF+OWL “doesn’t work” but did advocate Microformats to give semantic meaning to data on a web page. It may be harsh but, with my limited knowledge of what Dapper is doing, it does seem very much like data scraping rather than data sharing. This cropped up in the Q&A section and they did say that their mechanism for harvesting the data was fairly resilient but if the site owner made major structural changes then there would be fragility in the solution.

All in all, it was a very enjoyable event which was very well attended – now I just need to write up all the notes I took!


Rules for a Realistic Semantic Web?

In one of my linkblog entries earlier this week I made the following claim:

IMHO OWL isn’t part of the petatriple future of the semweb. Nor is SPARQL…

A recent post by Chimezie touched on this too:

I’ve been spending quite a bit of time on FuXi mainly because I am interested in empirical evidence which supports a school of thought which claims that Description Logic based inference (Tableaux-based inference) will never scale as well the Logic Programming equivalent – at least for certain expressive fragments of Description Logic (I say expressive because even given the things you cannot express in this subset of OWL-DL there is much more in Horn Normal Form (and Datalog) that you cannot express even in the underlying DL for OWL 1.1). The genesis of this is a paper I read, which lays out the theory, but there was no practice to support the claims at the time (at least that I knew of). If you are interested in the details, the paper is “Description Logic Programs: Combining Logic Programs with Description Logic” and written by many people who are working in the Rule Interchange Format Working Group.

It is not light reading, but is complementary to some of Bijan’s recent posts about DL-safe rules and SWRL.

A follow-up is a paper called “A Realistic Architecture for the Semantic Web” which builds on the DLP paper and makes claims that the current OWL (Description Logic-based) Semantic Web inference stack is problematic and should instead be stacked ontop of Logic Programming since Logic Programming algorithm has a much richer and pervasively deployed history (all modern relational databases, prolog, etc..)

I’m not a DL expert but, based on my research, it seems to me that DL-based inference for OWL isn’t going to deliver for the semantic web any time soon. Of course, by this I mean it’s not going to scale in such a way that makes real-time inferencing over petatriples viable. Besides, OWL and its variations are still very limited in their expressivity and not particularly useful for many classes of applications. Maybe rule systems can deliver instead?
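
For anyone unfamiliar with the rule-based style being discussed, here is a toy sketch of Datalog-like forward chaining over triples. It is a naive fixpoint loop written for readability, so it says nothing either way about scaling; it isn’t FuXi’s algorithm or anything from the papers above, and the two rules and the facts are simply my own example (terms starting with `?` are variables).

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, fact, bindings):
    """Unify one triple pattern with one fact; return extended bindings or None."""
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if bindings.setdefault(p, f) != f:
                return None  # variable already bound to something else
        elif p != f:
            return None
    return bindings


def solve(body, facts, bindings=None):
    """Yield every set of variable bindings that satisfies all body patterns."""
    bindings = bindings or {}
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = match(body[0], fact, bindings)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts are derived."""
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for b in solve(body, facts):
                derived = tuple(b.get(term, term) for term in head)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


# Each rule is (head, body): the head holds whenever every body pattern matches.
RULES = [
    (("?a", "subClassOf", "?c"), [("?a", "subClassOf", "?b"), ("?b", "subClassOf", "?c")]),
    (("?x", "type", "?c"), [("?x", "type", "?b"), ("?b", "subClassOf", "?c")]),
]

FACTS = {
    ("Dog", "subClassOf", "Canine"),
    ("Canine", "subClassOf", "Mammal"),
    ("Nena", "type", "Dog"),
}

closure = forward_chain(FACTS, RULES)
print(("Nena", "type", "Mammal") in closure)
```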