Nodalities

From Semantic Web to Web of Data

Author Archive

Web Application Authentication

Google just launched their Account Authentication mechanism:

Google Accounts authentication for web-based applications allows the application to access a Google service protected by a user’s Google account. To maintain a high level of security, the Authentication Proxy interface, AuthSub, enables the application to get an authentication token without ever handling the user’s account login information. Using the proxy, the user of the web application logs into their account through a Google-supplied login page and consents to grant limited access to the web application.

This comes while a post from Dare Obasanjo was fresh in my mind:

The devil is in the details when talking about authentication, authorization and Web APIs. When I first heard about the Yahoo’s proposed authentication model for Web APIs at their ETech 2006 talk entitled Building a Participation Platform: Yahoo! Web Services Past, Present, and Future, I thought it sounded similar to the model used by Passport Windows Live ID. In both approaches instead of applications prompting users for their credentials (username/password combo), the user signs in to the primary service which then returns an opaque token to the target application that identifies the user and gives the application permission to access the user’s data. However, having a fine grained access that can give applications access only specific services and can revoke permission given to specific applications seems to be richer than what I’ve seen offered by Passport Windows Live ID. This is nice but it’s to be seen how easy this will be for users to understand or for applications to manage.

Dare then goes on to define two characteristics of web application authentication that he sees as essential:

User credentials are sacred and must be protected at all costs: A security mechanism is only as strong as its weakest link. This means that it is extremely unwise to build an authentication model that has applications built on your APIs to request username/passwords or other credentials from users directly

and

Do not discriminate against any platform or any device: In todays world, end users interact with online services using a variety of devices and platforms. Each device and platform has different strengths and limitations but is important in its own right.

As far as I can tell, Google’s authentication appears to satisfy both points, provided you read Dare’s words as meaning “don’t discriminate so long as the platform or device can speak HTTP”. The Google approach is almost identical to the established Flickr authentication API, the only functional difference being that Flickr returns the login page and consent form in two steps rather than Google’s single step. Google also supports secure access using certificates which is a welcome addition.

The Google site includes this diagram of the interactions which at first glance would suggest that the web application somehow asks the Google service to contact the user directly, which of course is unlikely in the web architecture:

[Figure: Google’s AuthSub interaction diagram (Authsub_sml.png)]

I drew my own diagram of the interactions taking place which I think clarifies the situation. The web application redirects the user to Google’s service, passing along the URI that it wants Google to send the user back to once they’ve been authenticated. In this final redirection of the user’s request Google includes a one-off token which the application can use to get a longer duration session key for use with other Google services. This is exactly the same model as Flickr’s, who call the initial token a “frob”.
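To make the three steps concrete, here is a minimal sketch of the redirect-and-token pattern described above. It simulates the provider in memory; the endpoint URL, the `next` and `token` parameter names, and every class and function name are hypothetical and are not taken from Google's or Flickr's actual APIs.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint, for illustration only; not a real Google or Flickr URL.
PROVIDER_LOGIN_URL = "https://accounts.example.com/login"


class Provider:
    """In-memory stand-in for the service that owns the user's account."""

    def __init__(self):
        self._one_off = {}   # one-off token -> user id
        self._sessions = {}  # session key -> user id

    def login_and_consent(self, user, return_to):
        """Step 2: the user logs in on the provider's page and is sent back."""
        token = secrets.token_urlsafe(16)
        self._one_off[token] = user
        separator = "&" if urlparse(return_to).query else "?"
        return return_to + separator + urlencode({"token": token})

    def exchange(self, token):
        """Step 4: swap a one-off token for a session key, once only."""
        user = self._one_off.pop(token)  # KeyError if reused or unknown
        session_key = secrets.token_urlsafe(32)
        self._sessions[session_key] = user
        return session_key

    def user_for(self, session_key):
        return self._sessions[session_key]


def build_login_redirect(return_to):
    """Step 1: the application sends the user's browser to the provider."""
    return PROVIDER_LOGIN_URL + "?" + urlencode({"next": return_to})


def token_from_callback(callback_url):
    """Step 3: the application reads the token from the redirected request."""
    return parse_qs(urlparse(callback_url).query)["token"][0]


provider = Provider()
return_to = "https://app.example.org/callback"
redirect = build_login_redirect(return_to)
callback = provider.login_and_consent("alice", return_to)
session_key = provider.exchange(token_from_callback(callback))
```

The application never sees the user's password: it only ever handles the one-off token and the session key derived from it.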

I’m following development in this space very closely and I’m very encouraged to see two almost identical authentication procedures adopted by these companies. All we need now is a third and we’ve probably got enough for a de-facto standard, which with a bit of will and wrangling could become a nice little IETF draft.

Update: in the time I spent thinking about and writing this post Google appear to have pulled the API completely. Hopefully it’ll return shortly.

Sparql Clipboard

Benjamin Nowack has produced an intriguing demonstration of a live web clipboard with a twist. The twist is that the data to be copied isn’t embedded in the web page; instead there’s a reference to a Sparql server from which that data can be obtained. When you copy a snippet using your browser’s normal clipboard function, a unique identifier for the snippet and a link to the Sparql service are copied. When you subsequently paste, the code passes the identifier to the service and it’s the results of that lookup that are pasted. This really is a novel idea and would certainly work extremely well with our directory, which provides a Sparql lookup for each resource listed. Even better, Benjamin’s demo uses embedded RDF to describe the copyable snippets.
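As a rough sketch of the copy-by-reference idea (not Benjamin's actual code), the clipboard can hold just an identifier and an endpoint, with the lookup deferred until paste time. The function names, the example URIs and the choice of a DESCRIBE query are all assumptions; the only standard piece is passing the query in a `query` parameter, as the Sparql protocol specifies.

```python
from urllib.parse import urlencode


def copy_snippet(snippet_uri, endpoint):
    """What lands on the clipboard: a reference, not the data itself."""
    return {"snippet": snippet_uri, "endpoint": endpoint}


def paste_request_url(clip):
    """Build the lookup request that would be issued at paste time."""
    # Assumption: the snippet is identified by a URI and fetched with DESCRIBE.
    query = "DESCRIBE <%s>" % clip["snippet"]
    return clip["endpoint"] + "?" + urlencode({"query": query})


clip = copy_snippet("http://example.org/page#snippet-1",
                    "http://example.org/sparql")
lookup_url = paste_request_url(clip)
```

Because the data is fetched when you paste rather than when you copy, the pasted result reflects whatever the service holds at that moment.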

Embedded RDF

The past couple of weeks have seen a burst of activity around Embedded RDF (eRDF), our method for embedding a subset of RDF into web pages. Earlier this month, I presented eRDF to an audience at XTech 2006 (slides here), aiming to explain more clearly the benefits of the eRDF approach. I’m quite pleased with how it went, especially because that afternoon I encountered Leigh Dodds busy building eRDF support into his XML Army Knife service. Now Sparql queries run through his site can be targeted at HTML pages containing Embedded RDF. For example, here’s a query that lists the blogs I write for. However, instead of the query operating on some separately published RDF, it uses the same HTML page that you see when you visit my home page. This is quite an awesome view of the next generation web of data, where the web we know and love becomes a friendly place for machines too.
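To give a flavour of how triples can be read straight out of ordinary HTML, here is a deliberately simplified sketch. It handles only the Dublin Core-style head metadata (a `schema.` prefix declared on a `link` element plus prefixed `meta` names) and ignores everything else a real eRDF parser must handle, such as the profile declaration and `class`/`rel` attributes in the body. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collects schema.* prefix declarations and prefixed <meta> names."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}    # prefix -> namespace URI
        self.statements = []  # (prefix, local name, literal value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").startswith("schema."):
            self.prefixes[a["rel"][len("schema."):]] = a.get("href", "")
        elif tag == "meta" and "." in (a.get("name") or ""):
            prefix, local = a["name"].split(".", 1)
            self.statements.append((prefix, local, a.get("content", "")))


def extract_triples(html, page_uri):
    """Return (subject, property URI, literal) triples about the page itself."""
    parser = HeadMetadataParser()
    parser.feed(html)
    return [
        (page_uri, parser.prefixes[prefix] + local, value)
        for prefix, local, value in parser.statements
        if prefix in parser.prefixes  # skip names with undeclared prefixes
    ]


sample = """
<html><head>
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<meta name="dc.title" content="Home page" />
<meta name="viewport" content="width=device-width" />
</head><body></body></html>
"""
triples = extract_triples(sample, "http://example.org/")
```

Once a page has been reduced to triples like these, it can be loaded into any RDF store and queried with Sparql like any other RDF source.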

Then, over the weekend, Benjamin Nowack announced an eRDF parser written in PHP. I’ve had some great feedback from Benjamin over the past few weeks as he first started learning the ins and outs of eRDF, then began implementing it. I have a number of errata to incorporate into the main specification and I also want to start exploring some of the new ideas Benjamin has around using owl:sameAs to enable eRDF to embed metadata about other documents.

Semantic Web 2.0

I’m on my way to the XTech conference in Amsterdam. The schedule looks to be packed full of fascinating topics around open data, semantic web and web 2.0 – all areas of extreme importance to me. It’s good to see some strong RDF work being showcased, such as Ingenta’s huge data store and the BBC’s new programme catalogue.

Some of the presentations I hope to see include:

What an amazing line-up of speakers! I’m honoured to have been given the opportunity to present the work I’ve been doing around embedding RDF-encoded metadata into web pages using idiomatic HTML. It’s a solution that works now and is backwards compatible with long-standing Dublin Core conventions for adding metadata to pages. It also plays nicely with upcoming technologies such as GRDDL and co-exists happily with microformats in the same page. I’m also chairing sessions on Search engines for Semantic Web knowledge, Building the Semantic Web at NASA, The End of the Open Internet?: Network Service and Security in Web 2.0 and Semantics Through the Tag, the last by Dave Beckett, who gave me a sneak preview at the Jena User Conference last week. He’s doing some innovative stuff around the semantics of tagging. Cool to see a big Silicon Valley company taking on RDF and the Semantic Web.

But, before that, I’ll be taking part in the Ajax lightning demo session tonight. I’m going to show our Library 2.0 demonstrator, which illustrates how applications can be simply composed of diverse web services. I rarely link to it because it makes a lot more sense with the narrative that I use when demoing it. So, if you’re at XTech this year, I encourage you to come along tonight to see it being demoed. If you can’t, then have a play anyway. The best search terms are ones that result in books with ISBNs, which generally means words invented in the past twenty years: javascript, google, george w. bush.

Word 2007 To Support Atom

Joe Friend writes:

In Beta 2 we support MSN Spaces, SharePoint 2007 (of course), Blogger, and Community Server (which is used for blogs.msdn.com). You can also set up a custom account with services that support the metaweblog API or the ATOM API. All the blog providers seems to interpret these APIs a bit different so there kinks we’re still working out. But the basics should work in Beta 2. We hope to add a few more services to the list before we ship. The Word blog authoring feature is extensible and we will publish information so that blog providers can insure that their systems work with Word.

The Atom Publishing Protocol is going to be the most important specification of the decade.

The Right Tool

Anne Thomas Manes writes:

I’m a huge proponent of the KISS principle. So I don’t recommend using WS-* for all service interactions. If an application doesn’t require enterprisey infrastructure semantics, then it’s much more appropriate to use a simpler middleware system, such as “plain old XML” (POX) over HTTP. In fact, for applications that require Internet scalability (e.g., mass consumer-oriented services), POX is a much better solution than WS-*

I agree wholeheartedly, but then I would, wouldn’t I? What’s interesting is her opening statement:

I’m one of the folks responsible for mixing the Kool-Aid. I presented at the W3C Workshop on Web Services (representing Sun). I participated in numerous standardization efforts at W3C, OASIS, WS-I, uddi.org, and JCP. I have a vested interest in making sure that WS-* succeeds.

Given this context, the earlier quote is rather puzzling. Here’s my interpretation: the enterprise web service stack doesn’t scale and there’s some fundamental problem we can’t fix that means it never will. So, if you want your application to scale, we recommend you use an alternative service technology.

If the fundamental problem could be fixed then I would expect the recommendation to be that we wait for that to be sorted out, as we have waited through the various versions of the 60+ WS specifications. But instead we’re told that the WS-* stack doesn’t suit Internet-scale applications, only enterprise applications.

Where should the line be drawn between enterprise and Internet-scale? How do you even characterise Internet-scale? I think the only sensible scale characterisation is in terms of volume of transactions. Mass consumer-oriented services suggests large numbers of concurrent transactions involving a large number of participants. But as organisations deal more directly with their customers, trade effectively with growing numbers of partners and coordinate activity between employees across the globe, aren’t the systems needed to support these processes becoming more and more Internet-scale? In fact, when the enterprise term first emerged it was used to characterise very high volume systems to support things like EDI with trading partners. Now the reality is that WS-*, designed for the enterprise, isn’t up to the job.

I think this is a case of a technology finding its niche as I pointed out last year. Java went through a series of repositionings from its origins as a set-top box technology, through applets and the desktop before finally maturing into a place that suits its characteristics – building maintainable back-ends for web applications.

What’s the niche for WS-*? The vendors say it’s the enterprise which excludes organisations without a dedicated IT department. It also now excludes all organisations that wish to develop Internet-scale, high transaction volume applications. Its niche is those organisations that wish to build only applications that support their own processes, so long as those organisations are not themselves Internet-scale. So, if you’re a medium level organisation, not too small and not too large and you don’t deal with the public over the Internet then the 60+ specifications of the WS-* stack would appear to be suitable for you.

By the way, there’s a gem of a quote in the comments too:

WS-* makes J2EE look like a toy in terms of complexity.

Made me chuckle on the train this morning.

Atom Support

I’ve been a longtime supporter and promoter of RSS, especially RSS 1.0 given my involvement in its creation and the work we’re doing with the semantic web. But for the Silkworm Directory we chose to support Atom instead.

While it’s true today that RSS has wider adoption by content publishers and appears to have the endorsement of Microsoft, there are a number of telling signs that lead me to believe that the days of RSS are numbered. For a start, Atom is very well specified by a respected standards body, which in and of itself affects adoption by organisations concerned about supportability and future-proofing. Also, Microsoft has backed away from its initial embrace of RSS and now prefers the generic term “web feed”, which encompasses Atom too. It’s significant that virtually all feed readers support both Atom and RSS, and the ones that don’t probably account for less than a thousandth of a percent of all feed usage.

However, there are also a number of compelling technical reasons for choosing Atom. The primary one is that Atom clearly specifies the rules for escaping content, something that RSS has traditionally been very bad at. The secondary reason, but the one that has the most interesting implications, is that Atom can be used to syndicate other content as a sort of payload. To be fair, RSS has a partial solution and can include a link to remote content (this is the foundation for podcasting), but Atom also lets you put content right inside the feed along with the usual metadata of title, link and short summary. The feed then assumes the role of a packaging format for transporting chronological data. The key to this working is Atom’s summary element, which the feed reader can use if it does not understand the payload’s content type. In this way the feed remains human readable but can also carry machine-targeted data, allowing applications to subscribe to updates.
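The payload-plus-summary arrangement can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the entry omits elements a complete feed would need (such as an author), and the reader logic is just one plausible way to implement the fallback described above.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def tag(name):
    """Qualify an element name with the Atom namespace."""
    return "{%s}%s" % (ATOM, name)


def build_entry(entry_id, title, updated, summary, payload_xml, payload_type):
    """Build an entry carrying both a readable summary and an XML payload."""
    entry = ET.Element(tag("entry"))
    for name, text in (("id", entry_id), ("title", title),
                       ("updated", updated), ("summary", summary)):
        ET.SubElement(entry, tag(name)).text = text
    content = ET.SubElement(entry, tag("content"), {"type": payload_type})
    content.append(ET.fromstring(payload_xml))  # XML payload as a child element
    return entry


def read_entry(entry, understood_types):
    """Return the payload element if its type is understood, else the summary."""
    content = entry.find(tag("content"))
    if (content is not None and len(content)
            and content.get("type") in understood_types):
        return content[0]
    return entry.findtext(tag("summary"))


entry = build_entry(
    "urn:example:entry-1", "Title changed", "2006-06-01T10:00:00Z",
    "The title of the collection was changed.",
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
    "application/rdf+xml",
)
for_humans = read_entry(entry, understood_types=set())
for_machines = read_entry(entry, understood_types={"application/rdf+xml"})
```

An ordinary feed reader falls through to the summary, while an application that understands the payload type gets the data itself from the same entry.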

This fits our use case extremely well. To understand why, I have to delve a little into the underpinnings of Silkworm. As I mentioned yesterday, the Silkworm Directory uses an RDF triple store to manage all of its data. While the RDF model eliminates many traditional database problems around data modelling and evolution, it introduces new ones of its own. Since the database is generally unconstrained, any resource can have any number of any property. Sometimes this is exactly what you want, so our collection descriptions have multiple identifiers or services that can be used to search them. However, sometimes you want to constrain the occurrence of certain properties to make the system more manageable. For example, we only want one title in each language for our collections. Out of the box, RDF doesn’t give you a way to prevent multiple titles from being added to a resource. The normal mode of operation is simply to add RDF to a store. If you want to remove triples or replace existing ones then it has to be controlled by the application itself rather than being a generic task supported by the store. (You can use OWL to define a class of collections to be those things with only one title, but that doesn’t stop them being added, it just means that the things with multiple titles aren’t collections!)

We solved the problem by introducing the notion of changesets. I’ll leave the deep explanation of how they work for another posting, but the concept is simple: like a UNIX diff, a changeset consists of a list of triples that need to be either added to or removed from the store. We constrain changesets so that they only ever apply to a single resource. (More details here). A changeset is itself RDF so we store those in the directory as well in a linked list which represents all the changes applied to a resource description from its creation. When you view the change history of a resource you’re looking at its list of changesets, a chronological list of changes to a resource. This is where Atom comes in.
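Here is a minimal sketch of the diff-like idea, assuming triples are plain tuples and literals are `(text, language)` pairs. The `ChangeSet` class, the `retitle` helper and the rule that removed triples must already be present are illustrative assumptions, not the directory's actual implementation.

```python
from dataclasses import dataclass, field

TITLE = "http://purl.org/dc/elements/1.1/title"


@dataclass(frozen=True)
class ChangeSet:
    subject: str
    removals: frozenset = field(default_factory=frozenset)
    additions: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # A changeset only ever applies to a single resource.
        for s, _, _ in self.removals | self.additions:
            if s != self.subject:
                raise ValueError("a changeset may only touch one resource")


def apply_changeset(store, cs):
    """Return a new set of triples with the changeset applied."""
    # Assumption: like a diff, removals must match what is in the store.
    if cs.removals - store:
        raise ValueError("cannot remove triples that are not in the store")
    return (store - cs.removals) | cs.additions


def retitle(store, subject, text, lang):
    """Build a changeset that leaves exactly one title in the given language."""
    old = frozenset(
        (s, p, o) for (s, p, o) in store
        if s == subject and p == TITLE and o[1] == lang
    )
    new = frozenset({(subject, TITLE, (text, lang))})
    return ChangeSet(subject, removals=old, additions=new)


store = {
    ("urn:example:c1", TITLE, ("Old title", "en")),
    ("urn:example:c1", TITLE, ("Ancien titre", "fr")),
}
change = retitle(store, "urn:example:c1", "New title", "en")
updated = apply_changeset(store, change)
```

The one-title-per-language rule lives in the application code that builds the changeset, while the store itself only needs to know how to add and remove triples.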

We use Atom to package the changesets relating to a resource, each changeset being embedded as RDF content within the feed accompanied by a human-readable version of the particular change applied. For an example, look at this history page, its underlying RDF and the equivalent as an Atom feed. The Atom feed is built simply by applying a stylesheet to the underlying RDF. We still have some work to do on generating better human-readable summaries but the principle is sound.

What’s the practical benefit? Syndicating changesets over Atom gives us a lightweight and web-friendly synchronisation mechanism for data stores. Each store can subscribe to the others’ feeds and apply the embedded changesets as they arrive. This is pretty compelling to anyone dealing with distributed data management and I think it represents a significant advance over anything else out there in the RDF world. One immediate use case for us is offline archival of changes. We may decide to limit the number of changes kept live in the directory depending on performance characteristics, but we intend to keep all the changes archived out of the main database. Our archiving could be based on a simple subscription to the live directory.
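A subscriber's side of this can be sketched as a replay loop. The entry layout below (a dict with `updated`, `remove` and `add` keys) is a hypothetical stand-in for a parsed feed entry, and the sketch assumes every `updated` value is a UTC timestamp in the same ISO 8601 format so that string comparison matches chronological order.

```python
def replay(store, entries, last_seen=None):
    """Apply feed entries newer than last_seen, oldest first.

    Returns the new set of triples and the timestamp to remember
    for the next poll of the feed.
    """
    pending = sorted(
        (e for e in entries if last_seen is None or e["updated"] > last_seen),
        key=lambda e: e["updated"],
    )
    for e in pending:
        store = (store - e["remove"]) | e["add"]
        last_seen = e["updated"]
    return store, last_seen


# Feeds usually list the newest entry first, so replay() sorts before applying.
feed = [
    {"updated": "2006-06-02T10:00:00Z",
     "remove": {("s", "p", "a")}, "add": {("s", "p", "b")}},
    {"updated": "2006-06-01T10:00:00Z",
     "remove": set(), "add": {("s", "p", "a")}},
]
replica, checkpoint = replay(set(), feed)
replica_again, checkpoint_again = replay(replica, feed, checkpoint)
```

Remembering the last timestamp applied means a store can poll the same feed repeatedly without applying a change twice.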

We’re working on a fuller description of the changeset mechanism and API which will appear on the TDN in due course. And, for those watching closely, keep your eyes open because you might even spot the first public glimmers of Bigfoot.

Jena Conference

Sam and I are off down to Bristol to participate in the first Jena user conference tomorrow. Jena is the RDF framework that we’re using to power the directory and since it contains what is essentially the reference implementation of Sparql we use it to enable rich querying over the directory data. I’m looking forward to meeting many of the usual suspects and hopefully many new ones too. Come and say hi if you’re around.

Web Of Data

Since this is my first post to Nodalities I thought I would write about some of the themes I plan to cover in the coming months.

One of my key interests is the fostering of a web of data built on the foundations of the document-oriented web that we have today. The Web 2.0 movement of the past couple of years has made great strides in exposing some of the social and technical requirements that are necessary for the web of data to thrive: social models for data creation that exploit the network effect; intentional data exposure for recombination and consumption; agreement on common data formats; addressability of data; pervasive networking.

The most successful Web 2.0 services exhibit all of these characteristics but I think there are some missing pieces that would make the web of data closer to reality. There needs to be agreement, not only on the format of the data, but on the common entities used in the data. This doesn’t mean adoption of a single schema across all systems, but there needs to be a way to cross-reference one system’s concept of a user with another’s. I think one way to do that is for RDF to provide the linkage of concepts, using it to declaratively state how different data from different systems can be mapped. We’re doing this already in our Silkworm Directory where we associate interfaces used by search services to one another despite their different syntaxes. This lets us provide a uniform search interface across dozens of different types of search software.
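As a small illustration of declarative mapping, the sketch below treats statements as plain tuples and uses `owl:equivalentClass` to say that two systems' notions of a user are the same thing. The choice of that property, the example URIs and the `equivalents` function are assumptions made for the example; they are not how the Silkworm Directory necessarily does it.

```python
EQUIVALENT = "http://www.w3.org/2002/07/owl#equivalentClass"


def equivalents(triples, concept):
    """Follow equivalence statements in both directions from a concept."""
    seen, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != EQUIVALENT:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen - {concept}


mappings = [
    ("http://a.example/schema#User", EQUIVALENT, "http://b.example/terms#Member"),
    ("http://c.example/vocab#Account", EQUIVALENT, "http://b.example/terms#Member"),
]
same_as_user = equivalents(mappings, "http://a.example/schema#User")
```

Each system keeps its own schema; only the mapping statements need to be shared for data to be cross-referenced.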

Another missing piece of the puzzle is licensing of the data. A quick survey of the existing Web 2.0 services shows a distinct lack of clarity around the terms of use of the data, even though the efforts of Creative Commons have done wonders for the availability of creative works. We’re addressing that too, but there is still much to do.

The Web 2.0 companies are at the bleeding edge of innovation and moving ahead rapidly. For the web of data to become a reality we need to help more mainstream companies and institutions catch the slipstream and adopt these practices. However, there are huge technical barriers to overcome for these organisations in the areas of data storage, management and distribution. Most understand their requirements for managing data when the audience is limited to their own employees or members, but opening more than a few web pages up to the general public is an unknown quantity. Tasks ranging from serving thousands of concurrent requests rather than low single figures to managing security and integrity of hugely participative systems are complex, expensive and foreign territory for most IT departments. Bringing the benefits of the web of data to these organisations is a huge challenge. We think the best way is to offer a platform for data sharing and to lead by example, gradually demonstrating the advantages of breaking open silos of data and of enabling users to participate deeply in the process. There’s a long, winding road ahead but it’s going to be a huge amount of fun walking it!
