Atom Support
I’ve been a longtime supporter and promoter of RSS, especially RSS 1.0 given my involvement in its creation and the work we’re doing with the semantic web. But for the Silkworm Directory we chose to support Atom instead.
While it’s true today that RSS has wider adoption by content publishers and appears to have the endorsement of Microsoft, there are a number of telling signs that lead me to believe that the days of RSS are numbered. For a start, Atom is very well specified by a respected standards body, which in and of itself affects adoption by organisations concerned about supportability and future-proofing. Also, Microsoft has backed away from its initial embrace of RSS and now prefers the generic term “web feed”, which encompasses Atom too. It’s significant that virtually all feed readers support both Atom and RSS, and the ones that don’t probably account for less than a thousandth of a percent of all feed usage.
However, there are also a number of compelling technical reasons for choosing Atom. The primary one is that Atom clearly specifies the rules for escaping content, something that RSS has traditionally been very bad at. The secondary reason, but the one that has the most interesting implications, is that Atom can be used to syndicate other content as a sort of payload. To be fair, RSS has a partial solution and can include a link to remote content (this is the foundation for podcasting), but Atom also lets you put content right inside the feed along with the usual metadata of title, link and short summary. The feed then assumes the role of a packaging format for transporting chronological data. The key to this working is Atom’s summary element, which the feed reader can use if it does not understand the payload’s content type. In this way the feed remains human-readable but can also carry machine-targeted data, allowing applications to subscribe to updates.
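To make the payload idea concrete, here is a minimal sketch in Python using only the standard library. It builds an Atom entry whose content element carries an RDF/XML payload, and shows how a reader that does not understand that content type could fall back to the summary. The function names (build_entry, readable_text), the example identifiers and the set of "understood" types are my own assumptions for illustration, not anything taken from the Silkworm code.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_entry(entry_id, title, updated, summary, rdf_payload):
    """Build an Atom entry carrying an RDF/XML payload (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # The human-readable fallback for readers that cannot use the payload.
    ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = summary
    # An XML media type on atom:content means the payload is inline XML.
    content = ET.SubElement(
        entry, f"{{{ATOM_NS}}}content", {"type": "application/rdf+xml"}
    )
    content.append(rdf_payload)
    return entry


def readable_text(entry, understood_types=("text", "html", "xhtml")):
    """Return the content if the reader understands its type, else the summary."""
    content = entry.find(f"{{{ATOM_NS}}}content")
    if content is not None and content.get("type", "text") in understood_types:
        return "".join(content.itertext())
    summary = entry.find(f"{{{ATOM_NS}}}summary")
    return summary.text if summary is not None else ""


# Illustrative values only; the identifiers below are made up.
payload = ET.Element(f"{{{RDF_NS}}}RDF")
entry = build_entry(
    "urn:example:change-1",
    "Title changed",
    "2006-01-01T12:00:00Z",
    "The title of the collection was changed.",
    payload,
)
print(readable_text(entry))
```

An ordinary feed reader takes the summary path, while an application that knows about application/rdf+xml can pull the payload out of the content element instead.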
This fits our use case extremely well. To understand why, I have to delve a little into the underpinnings of Silkworm. As I mentioned yesterday, the Silkworm Directory uses an RDF triple store to manage all of its data. While the RDF model eliminates many traditional database problems around data modelling and evolution, it introduces new ones of its own. Since the database is generally unconstrained, any resource can have any number of any property. Sometimes this is exactly what you want, so our collection descriptions have multiple identifiers or services that can be used to search them. However, sometimes you want to constrain the occurrence of certain properties to make the system more manageable. For example, we only want one title in each language for our collections. Out of the box, RDF doesn’t give you a way to prevent multiple titles from being added to a resource. The normal mode of operation is simply to add RDF to a store. If you want to remove triples or replace existing ones then it has to be controlled by the application itself rather than being a generic task supported by the store. (You can use OWL to define a class of collections to be those things with only one title, but that doesn’t stop extra titles being added; it just means that the things with multiple titles aren’t collections!)
We solved the problem by introducing the notion of changesets. I’ll leave the deep explanation of how they work for another posting, but the concept is simple: like a UNIX diff, a changeset consists of a list of triples that need to be either added to or removed from the store. We constrain changesets so that they only ever apply to a single resource. (More details here). A changeset is itself RDF, so we store changesets in the directory as well, in a linked list which represents all the changes applied to a resource description since its creation. When you view the change history of a resource you’re looking at its list of changesets, a chronological list of changes to that resource. This is where Atom comes in.
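As a rough illustration of the concept, and not of our actual implementation, a changeset can be modelled in Python as a pair of triple sets tied to one subject. Everything here is an assumption made for the sketch: the Changeset class, the apply_changeset function, triples as plain tuples, and the choice to reject a changeset whose removals are not present in the store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """A diff-like set of removals and additions for one resource (sketch)."""
    subject: str
    removals: frozenset   # triples as (subject, predicate, object) tuples
    additions: frozenset

    def __post_init__(self):
        # Mirror the constraint that a changeset touches a single resource.
        for s, _, _ in self.removals | self.additions:
            if s != self.subject:
                raise ValueError(f"triple about {s!r} is outside {self.subject!r}")


def apply_changeset(store, changeset):
    """Return a new set of triples with the changeset applied.

    Assumption: removing a triple that is not in the store is an error,
    much as a patch fails when its context does not match.
    """
    missing = changeset.removals - store
    if missing:
        raise ValueError(f"cannot remove absent triples: {sorted(missing)}")
    return (store - changeset.removals) | changeset.additions


# Replacing a title is a removal plus an addition, so the resource never
# ends up with two titles in the same language.
store = {("coll:1", "dc:title", "Old title"), ("coll:1", "dc:identifier", "abc")}
change = Changeset(
    subject="coll:1",
    removals=frozenset({("coll:1", "dc:title", "Old title")}),
    additions=frozenset({("coll:1", "dc:title", "New title")}),
)
store = apply_changeset(store, change)
```

Because the replacement is expressed as one changeset, the application stays in control of the "one title per language" rule rather than relying on the store to enforce it.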
We use Atom to package the changesets relating to a resource, each changeset being embedded as RDF content within the feed, accompanied by a human-readable version of the particular change applied. For an example, look at this history page, its underlying RDF and the equivalent Atom feed. The Atom feed is built simply by applying a stylesheet to the underlying RDF. We still have some work to do on generating better human-readable summaries, but the principle is sound.
What’s the practical benefit? Syndicating changesets over Atom gives us a lightweight and web-friendly synchronisation mechanism for data stores. Each store can subscribe to the others’ feeds and apply the embedded changesets as they arrive. This is pretty compelling to anyone dealing with distributed data management, and I think it represents a significant advance over anything else out there in the RDF world. One immediate use case for us is offline archival of changes. We may decide to limit the number of changes kept live in the directory depending on performance characteristics, but we intend to keep all the changes archived outside the main database. Our archiving could be based on a simple subscription to the live directory.
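A subscribing store might work something like the following Python sketch. It assumes the feed has already been fetched and parsed into plain dictionaries, and that each entry carries an id, an updated timestamp, and the triples to remove and add. The Replica class, the dictionary keys and the example identifiers are all hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
class Replica:
    """A store that follows another store's changeset feed (sketch only)."""

    def __init__(self):
        self.triples = set()
        self.applied = set()  # ids of feed entries already applied

    def sync(self, entries):
        """Apply unseen entries oldest first; return how many were applied."""
        applied_now = 0
        for entry in sorted(entries, key=lambda e: e["updated"]):
            if entry["id"] in self.applied:
                continue  # feeds repeat old entries on every poll
            self.triples -= set(entry["removals"])
            self.triples |= set(entry["additions"])
            self.applied.add(entry["id"])
            applied_now += 1
        return applied_now


# Two polls of the same (made-up) feed: the second poll changes nothing.
feed = [
    {
        "id": "urn:example:change-2",
        "updated": "2006-01-02T09:00:00Z",
        "removals": [("coll:1", "dc:title", "Old title")],
        "additions": [("coll:1", "dc:title", "New title")],
    },
    {
        "id": "urn:example:change-1",
        "updated": "2006-01-01T09:00:00Z",
        "removals": [],
        "additions": [("coll:1", "dc:title", "Old title")],
    },
]
replica = Replica()
first = replica.sync(feed)
second = replica.sync(feed)
```

Tracking the ids of applied entries is what makes repeated polling safe, and an offline archive could be little more than a replica of this kind that also writes each entry to disk.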
We’re working on a fuller description of the changeset mechanism and API which will appear on the TDN in due course. And, for those watching closely, keep your eyes open because you might even spot the first public glimmers of Bigfoot.
Technorati Tags: talis, changesets, rdf, atom, silkworm, rss



