Author Archive

Moriarty DataTables: Active Record for RDF

DataTables are a new addition to the Moriarty PHP library. They are an implementation of the ActiveRecord pattern for use with RDF data in Talis Platform stores, and they draw inspiration from the active record implementation in CodeIgniter.

The intention is to allow querying of RDF data in a way that feels natural to most PHP coders. For example:

$dt->select('firstname')->from('person')->where('surname','Evans');
$dt->get();

In a relational database that kind of code would select the firstname column for every record in the person table that has a surname column with a value of Evans. With RDF we have two problems:

  1. there are no columns or tables, instead we have properties and classes.
  2. URIs are used to name things and URIs are long, ugly and easy to get wrong.

Moriarty’s DataTable class attempts to solve these two problems. It solves the first by treating properties as columns and classes as tables. The second problem it solves by allowing the user to specify short names for URIs. So we can write:

$dt->map('http://xmlns.com/foaf/0.1/firstName', 'firstname');
$dt->map('http://xmlns.com/foaf/0.1/surname', 'surname');
$dt->map('http://xmlns.com/foaf/0.1/Person', 'person');
$dt->select('firstname')->from('person')->where('surname','Evans');
$dt->get();

We can read that as selecting all values of the foaf:firstName property for resources of type foaf:Person that also have a foaf:surname property with a value of Evans. The DataTable class converts that into a SPARQL select query behind the scenes.
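To make that translation a little more concrete, here is a rough sketch in Python of how short names might be expanded into a SPARQL query. It is not Moriarty's code: the function name build_select and the exact shape of the generated query are assumptions made for illustration, and the real DataTable class may well produce different SPARQL.

```python
# Illustrative sketch only: this is not Moriarty's query builder, and
# build_select is a hypothetical name.

def build_select(mappings, select, from_type=None, where=None, limit=None):
    """Build a SPARQL SELECT string from short names."""
    patterns = []
    if from_type is not None:
        # A "table" is a class, so from() becomes an rdf:type pattern.
        patterns.append("?s a <%s> ." % mappings[from_type])
    for field in select:
        # A "column" is a property; bind its value to a variable of the same name.
        patterns.append("?s <%s> ?%s ." % (mappings[field], field))
    for field, value in (where or {}).items():
        # No escaping is done here; a real implementation must escape literals.
        patterns.append('?s <%s> "%s" .' % (mappings[field], value))
    variables = " ".join("?" + field for field in select)
    query = "SELECT %s WHERE { %s }" % (variables, " ".join(patterns))
    if limit is not None:
        query += " LIMIT %d" % limit
    return query


mappings = {
    "firstname": "http://xmlns.com/foaf/0.1/firstName",
    "surname": "http://xmlns.com/foaf/0.1/surname",
    "person": "http://xmlns.com/foaf/0.1/Person",
}
query = build_select(mappings, ["firstname"], from_type="person",
                     where={"surname": "Evans"})
print(query)
```

The point of the sketch is only that the class becomes an rdf:type pattern and each mapped property becomes one triple pattern on the same subject.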

This means you can very simply query and use RDF data from a Talis Platform store. To get the first 10 names and nicknames from a store:

$dt = new DataTable('http://api.talis.com/stores/mystore');
$dt->map('http://xmlns.com/foaf/0.1/name', 'name');
$dt->map('http://xmlns.com/foaf/0.1/nick', 'nick');

$dt->select('name,nick')->limit(10);
$res = $dt->get();

foreach ($res->result() as $row) {
   echo $row->name;
   echo $row->nick;
}

I’ve written up a collection of example queries based on the education data held in the data.gov.uk service.

When I was thinking about how to map the ideas from active record into RDF I was stumped at how to implement table joins. This bothered me because if there is one thing RDF excels at it’s links between resources. Here’s an example of how CodeIgniter implements the join syntax:

$this->db->select('firstname, blog.title');
$this->db->from('person');
$this->db->join('blog', 'person.id = blog.id');

It turned out that the answer was incredibly simple and elegant: you don’t need them! The whole concept of the join method in most active record implementations is to compensate for the fact that relational databases don’t name their relationships (some do but it is very rarely used in practice and not commonly supported in SQL). If you think about the RDF equivalent of that query it becomes clearer: select the name of each resource of type person and for each of their blogs select its title. That join is just the property relating the person resource to the blog resource, probably foaf:weblog.

When you use a DataTable you specify a join simply by including a dotted property path in the select method, e.g. blog.title where blog and title both map to properties. That lets us write our query like this:

$dt->map('http://xmlns.com/foaf/0.1/firstName', 'firstname');
$dt->map('http://xmlns.com/foaf/0.1/weblog', 'blog');
$dt->map('http://purl.org/dc/elements/1.1/title', 'title');
$dt->map('http://xmlns.com/foaf/0.1/Person', 'person');

$dt->select('firstname, blog.title');
$dt->from('person');

Ignoring the mappings, this is much simpler than the relational database equivalent! Here’s a good example of using these joinless queries.
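One way to picture what a dotted path means is to expand it into a chain of triple patterns, where each step hands its object on as the subject of the next. The Python below is only a sketch under that assumption; expand_path and the variable naming scheme are hypothetical and are not taken from Moriarty.

```python
# Illustrative sketch only: expand_path is a hypothetical helper, not part
# of Moriarty, and the variable names it generates are an assumption.

def expand_path(mappings, path, subject="?s"):
    """Turn a dotted short-name path such as 'blog.title' into triple patterns."""
    names = path.split(".")
    patterns = []
    current = subject
    for i, name in enumerate(names):
        # Each step binds a new variable named after the path walked so far,
        # and that variable becomes the subject of the following step.
        target = "?" + "_".join(names[: i + 1])
        patterns.append("%s <%s> %s ." % (current, mappings[name], target))
        current = target
    return patterns


mappings = {
    "firstname": "http://xmlns.com/foaf/0.1/firstName",
    "blog": "http://xmlns.com/foaf/0.1/weblog",
    "title": "http://purl.org/dc/elements/1.1/title",
}
for pattern in expand_path(mappings, "blog.title"):
    print(pattern)
```

The "join" is nothing more than the shared ?blog variable: it is the object of the foaf:weblog pattern and the subject of the title pattern.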

DataTables aren’t just for querying. They also support insert and update. To insert a new description of a resource:

$dt = new DataTable('http://api.talis.com/stores/mystore');
$dt->map('http://xmlns.com/foaf/0.1/name', 'name');
$dt->map('http://xmlns.com/foaf/0.1/Person', 'person');
$dt->set('name', 'scooby');
$response = $dt->insert('person');

This translates to submitting a description of a blank node, with a foaf:name property having a value of scooby and an rdf:type of foaf:Person. If you want to submit a description about a resource with a known URI, then you need to set the special _uri field like this:

$dt->set('_uri', 'http://example.com/people/1');
$dt->set('name', 'scooby');
$response = $dt->insert('person');
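As a rough picture of the description that gets submitted in each case, the Python sketch below writes it out as N-Triples. The helper name describe, the blank node label and the choice of N-Triples are all assumptions for illustration; the source does not say which serialisation Moriarty actually sends.

```python
# Illustrative sketch only: describe is a hypothetical helper and the
# serialisation Moriarty really uses may differ.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(mappings, fields, type_name):
    """Return N-Triples describing one resource built from short names."""
    uri = fields.get("_uri")
    # Without a _uri the resource is described as a blank node.
    subject = "<%s>" % uri if uri else "_:b1"
    lines = ["%s <%s> <%s> ." % (subject, RDF_TYPE, mappings[type_name])]
    for name, value in fields.items():
        if name == "_uri":
            continue
        # Values are treated as plain literals and are not escaped here.
        lines.append('%s <%s> "%s" .' % (subject, mappings[name], value))
    return "\n".join(lines)


mappings = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "person": "http://xmlns.com/foaf/0.1/Person",
}
print(describe(mappings, {"name": "scooby"}, "person"))
print(describe(mappings, {"_uri": "http://example.com/people/1",
                          "name": "scooby"}, "person"))
```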

Behind the scenes the insert method generates the RDF and POSTs it into the store’s metabox. Updates work in a similar way:

$dt->set('_uri', 'http://example.com/people/1');
$dt->set('name', 'scooby');
$dt->where('nick', 'scoob');
$response = $dt->update();

Here the update method queries the store for the current value of the name property for the specified resource and generates a changeset which it then submits to the store’s metabox. This also works for multiple resources, so to update the resource description for anything with a name of shaggy to have a name of scooby:

$dt->set('_uri', 'http://example.com/people/1');
$dt->set('name', 'scooby');
$dt->where('name', 'shaggy');
$response = $dt->update();
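The essence of an update like these is a comparison between what the store currently holds and what you have asked for. The Python sketch below shows that comparison for a single property; build_changeset and the dictionary it returns are hypothetical, and the real changeset that Moriarty submits is an RDF document rather than a Python structure.

```python
# Illustrative sketch only: build_changeset is a hypothetical helper and the
# returned dictionary merely stands in for a real changeset document.

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def build_changeset(subject, prop, current_values, new_value):
    """Work out which statements to remove and add so that `subject`
    ends up with exactly one value, `new_value`, for `prop`."""
    # Every current value that differs from the new one has to go.
    removals = [(subject, prop, v) for v in current_values if v != new_value]
    additions = []
    # Only add the new value if the store does not already hold it.
    if new_value not in current_values:
        additions.append((subject, prop, new_value))
    return {"subject": subject, "removals": removals, "additions": additions}


change = build_changeset("http://example.com/people/1", FOAF_NAME,
                         ["shaggy"], "scooby")
print(change["removals"])
print(change["additions"])
```

If the store already holds the requested value, both lists come back empty and there is nothing to submit.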

Full documentation can be found here: DataTable and DataTableResult

About Moriarty… Moriarty is a simple PHP library for accessing the Talis Platform. It follows the Platform API very closely and wraps up many common tasks into convenient classes while remaining very lightweight. It also provides some simple RDF classes that are based on the excellent ARC2 class library. Moriarty is being developed by a small community of developers and is in continual beta, subject to a slow stream of updates. To find out more visit its Google Code project.

Introducing Pynappl

Over the summer I spent some time working on a Python library for working with the Talis Platform. I’ve spent a lot of time developing the PHP-based Moriarty library and I’ve been wanting to apply that experience to other languages. Leigh has made good progress on the Ruby front with Pho and we have a nascent Java-based client: Penry. Considering Python’s excellent RDF support it seemed the natural choice to tackle next.

Pynappl is the resulting library. It’s still very early in its evolution and so has lots of rough edges, gaps and rather dubious design choices. So far Pynappl’s feature set has been driven by the real applications I have been working with so there is a distinct bias towards data loading and management of stores. The Store class is the workhorse of the library and contains methods for loading RDF, running SPARQL queries, scheduling jobs and reading and writing of field/predicate maps and query profiles.

In keeping with my general philosophy for building RESTful applications, the HTTP-based methods on the Store class make it very obvious that you are working with a fallible network by returning a tuple containing the HTTP response and the body of the response. It’s up to you to use or ignore the response as you see fit for your application. Many methods attempt to interpret the results of the method call but this can be switched off using an argument called “raw”. For example this code takes advantage of the interpretation and parsing of the SPARQL results:

import pynappl

# store_uri, username and password identify your store and credentials
store = pynappl.Store(store_uri, username, password)
(response, body) = store.select("select * where {?s a ?o} limit 10")
(sparql_header, results) = body
for result in results:
    print("%s (a %s)" % (str(result['s']), str(result['o'])))

This can be switched off to get at the raw response body:

(response, body) = store.select("select * where {?s a ?o} limit 10", True)
print(body)
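Because every HTTP-based method hands back the response alongside the body, the caller decides how strict to be about failures. The sketch below shows one possible policy: unwrap the tuple and raise if the status is not a success. It runs against a stand-in response object rather than Pynappl itself, and the assumption that the response exposes a numeric status attribute is mine, so check it against the real object before relying on it.

```python
# Illustrative sketch only: FakeResponse stands in for whatever response
# object Pynappl returns, and its `status` attribute is an assumption.
from collections import namedtuple

FakeResponse = namedtuple("FakeResponse", ["status"])


def body_or_raise(result):
    """Unpack a (response, body) tuple, raising if the request did not succeed."""
    response, body = result
    if not 200 <= response.status < 300:
        raise IOError("request failed with HTTP status %d" % response.status)
    return body


# A successful call simply yields the body.
print(body_or_raise((FakeResponse(200), "some results")))

# A failed call surfaces as an exception the application can handle.
try:
    body_or_raise((FakeResponse(503), ""))
except IOError as error:
    print(error)
```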

Also included is a command line application called tstore that wraps up a lot of these operations, including waiting for batch operations to complete. For example, to reset a store and load data into it takes just two lines:

./tstore --store mystore --user username --password xxxx reset --wait
./tstore --store mystore --user username --password xxxx store -f data.rdf

Please take a look at Pynappl and let me know what you think or if you’d like to get involved and help out.

About Pynappl… Pynappl is a simple open source Python library for working with the Talis Platform. Currently it is focussed mainly on managing data loading and manipulation of Talis Platform stores. Pynappl is an early alpha and is substantially incomplete (we’re looking for interested contributors). You can read more about Pynappl at its Google Code project page.

Using Moriarty for Serving Linked Data

Although Moriarty is a general purpose library for building applications with the Talis Platform (and tried and tested in Talis Prism and Talis Aspire), one of the most common uses is simply to provide a browsable interface for linked data held in a Talis Platform store. Typically these scripts take the URI sent by the web browser and use a SPARQL query or the Talis Platform’s describe service to fetch linked data about that URI. They then style that as HTML or send it directly back as RDF. There is a series of technical details they all need to deal with: 303 redirects, content negotiation, converting RDF to HTML, etc.

I’ve worked on more comprehensive libraries (e.g. Paget) to manage this kind of publishing but I thought the simple case of fetching and styling the data would make a good example of how to use Moriarty. I spent a bit of time this afternoon putting an example script together based on several I’ve written in the past. You can find the result in the dataspace subdirectory of the examples folder. That subdirectory contains four files:

  • dataspace.php — this is the example script. It contains the logic to fetch the relevant description, negotiate the best output format and style the result appropriately. It’s not designed to be called directly, but to be included from a configuration file…
  • index.php — this is an example configuration file. It is designed to be dropped into a web server directory and then intercept all requests beginning with that URI. It contains the configuration describing which Talis Platform store to use, where cache files can be written and where to find Moriarty and ARC2. The last thing it does is to load dataspace.php which then handles the browser request.
  • sample.htaccess — this is a sample .htaccess file for Apache webservers. It redirects all requests via index.php.
  • plain.tmpl.html — this is the default template used to render the HTML views. This can be overridden in the configuration.
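Content negotiation is the part of this that people most often get wrong, so here is a small sketch of the idea in Python. It is not how dataspace.php does it; the function names and the list of available formats are assumptions, and a stricter server would answer 406 rather than fall back to a default when nothing matches.

```python
# Illustrative sketch only: not taken from dataspace.php. The function names
# and the AVAILABLE list are assumptions.

AVAILABLE = ["text/html", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    choices = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client does not state it
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        choices.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order.
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def negotiate(header, available, default="text/html"):
    """Pick the best available media type for the given Accept header."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    # Nothing matched: this sketch falls back to the default format.
    return default


print(negotiate("application/rdf+xml", AVAILABLE))
print(negotiate("text/html,application/xml;q=0.9,*/*;q=0.8", AVAILABLE))
```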

Using the example script is simple: just copy index.php to the root directory of your linked data space. If you’re using Apache then you need to copy sample.htaccess into the same directory and rename it to .htaccess. Edit index.php so it refers to your store and your URIs and that’s it! You can see it in action with the default template on my own linked data space.

About Moriarty… Moriarty is a simple PHP library for accessing the Talis Platform. It follows the Platform API very closely and wraps up many common tasks into convenient classes while remaining very lightweight. It also provides some simple RDF classes that are based on the excellent ARC2 class library. Moriarty is being developed by a small community of developers and is in continual beta, subject to a slow stream of updates. You can read more about Moriarty on the n² wiki or visit its Google Code project.

Moriarty Progress Report

It’s been a while since I wrote about Moriarty, the PHP library I created for working with the Talis Platform. That’s not to say that there have been no changes: on the contrary, there have been lots of improvements and some major new areas of functionality. I’m going to summarise them in this post and then, time permitting, follow up with more detailed posts on particular areas.

  • Fresnel Selector Language — This is a major new addition to Moriarty. A new class called GraphPath has been added which implements almost all of the Fresnel Selector Language specification. I’ve been interested in RDF path languages for a long time and FSL now appears to be the strongest contender. Currently this is a stand-alone class, but after a few more cycles of testing it would be nice to add a convenience method to SimpleGraph to allow selection of resources using paths.
  • Zend-compatible caching — I did a substantial refactoring a while back to convert Moriarty’s HTTP caching implementation to be compatible with Zend’s cache interfaces. Whereas before you were limited to caching HTTP responses to disk, you can now supply any of Zend’s built-in cache classes to enable caching in databases, memcached and many other systems. You do this by creating an instance of HttpRequestFactory with your cache class and then passing the factory to the Store class.
  • JSON usage — Release 24 of the Talis Platform introduced RDF/JSON serialisation for describe and construct SPARQL queries. Moriarty now requests this format where it can, and because the SimpleGraph class uses an RDF/JSON structure as its index, often there is no result parsing involved at all.
  • OAI Service Support — OAIService is a new class that represents a store’s OAI-PMH Service. It provides simple access to the OAI service allowing all resources in a store to be listed.
  • Automated Builds — We have added Moriarty to a Hudson server which monitors the subversion repository and runs the unit test suite after every checkin. It’s not ideal because the server is not accessible to users outside of Talis (come on Google Code – we need Hudson support!). However it adds an extra level of confidence to checkins because test failures are emailed out to moriarty-dev@googlegroups.com which is open for anyone to join. A few times in the past we have run the unit tests locally and then forgotten to check in some critical dependency so the subversion trunk contains a broken build. Hudson will alert us to these kinds of errors much more quickly.
  • Extended Describes — The Sparql classes now accept an extra parameter to their describe methods that allows you to specify the type of description you want. By default you get the Platform’s default graph which is a list of triples that have the subject you specify (no bnodes remember!). Moriarty allows you to easily request other types such as symmetric bounded description (triples where the URI being described is subject or object), labelled bounded description (like the default description plus the addition of label properties for URIs in the description set) and symmetric labelled bounded description (a combination of the previous two). See the Bounded Description page on the n² wiki for more information.
  • Richer Store Interface — Up until now Moriarty has used an object model that closely follows the Platform’s separation of services. That tended to make code using Moriarty quite verbose. We’re now gradually introducing convenience methods onto the Store class so common operations can be accessed with less code.
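The description types mentioned under Extended Describes are easier to compare side by side. The Python sketch below applies each definition to a small in-memory list of triples. It says nothing about how the Platform computes them: the function names are hypothetical, triples are plain tuples of strings, and treating rdfs:label as the only label property is an assumption.

```python
# Illustrative sketch only: these helpers are hypothetical and operate on an
# in-memory list of (subject, predicate, object) tuples of strings.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def default_description(triples, uri):
    """Triples that have the given URI as their subject."""
    return [t for t in triples if t[0] == uri]


def symmetric_description(triples, uri):
    """Triples where the given URI is either subject or object."""
    return [t for t in triples if t[0] == uri or t[2] == uri]


def labelled_description(triples, uri):
    """The default description plus labels for resources it points at."""
    description = default_description(triples, uri)
    mentioned = {t[2] for t in description}
    labels = [t for t in triples
              if t[0] in mentioned and t[1] == RDFS_LABEL
              and t not in description]
    return description + labels


KNOWS = "http://xmlns.com/foaf/0.1/knows"
A, B, C = "http://example.com/a", "http://example.com/b", "http://example.com/c"
triples = [
    (A, KNOWS, B),
    (A, RDFS_LABEL, "Alice"),
    (B, RDFS_LABEL, "Bob"),
    (B, KNOWS, A),
    (C, KNOWS, B),
]
print(len(default_description(triples, A)))
print(len(symmetric_description(triples, A)))
print(len(labelled_description(triples, A)))
```

The symmetric labelled variant would simply apply the labelling step to the symmetric description instead of the default one.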

The Moriarty Google Code project now has several committers, although Keith and I are still the most prolific. However, having multiple committers is one more step away from this being a personal project and towards it being community owned. Moriarty is being used in lots of small projects in and around Talis, but significantly it is also in the core of two of our most important products: Talis Prism and Talis Aspire. That’s great validation for Moriarty, although it brings a lot more responsibility in terms of quality of testing. I now consider Moriarty to be out of continual alpha and into continual beta!

Opening for a Senior Platform Developer

We have an opening for a Senior Developer at Talis in the Platform development group. Talis is a mature and solid business based in the UK and provides a unique mix of loyal customers, amazing innovation and a focus on the long term.

Our platform development group is responsible for making sure that the Talis Platform is the premier environment for developing and delivering great Semantic Web applications. We need your help in designing and building our infrastructure to support hundreds of thousands of users and their data. We’re looking for people who:

  • use their code to communicate their ideas clearly
  • are proficient in Java and comfortable in Python, PHP and other scripting languages
  • can break dependencies and decompose hard problems into simpler ones
  • never forget about scalability, performance and security
  • prefer to develop test first
  • have spent time modelling data in RDF
  • can develop solutions to problems, communicate them to the team and get them implemented quickly
  • aren’t afraid to ask questions
  • have implemented HTTP clients and servers
  • like to say “let’s try it” and “we can do that”
  • understand how to balance perfection with reality
  • are as happy to lead as to follow
  • know when to reuse and when to start afresh
  • can tell us about something new they learned this year

How to apply:

Take a look at the problems below and select two to answer. Please send us your CV and an application letter including your answers to careers@talis.com

  • The Web can be modelled as a network of nodes labelled with URLs and connected by directed arcs. Suppose we want to find all the URLs linked to and from any given URL, and all the URLs that are linked from any two given URLs. What kind of data structures might be suitable for representing and querying a network with 10^8 nodes each having between 10 and 50 arcs?
  • Discuss the different types of automated testing that are needed to maintain high quality software. What kinds of programming language are best suited to each type of testing? What techniques could be used for testing asynchronous processes and for processes that operate over large volumes of data? Are there any situations that you wouldn’t test?
  • Large-scale systems composed of many cooperating application servers often need to share and cache configuration. Suppose any server can initiate changes that need to be reflected in real time to the other application servers in the cluster. What strategies could you use for coordinating this kind of behaviour and how are they tolerant to various failure conditions?

Moriarty Release 1.1

After some nudging from the Talis development team I tagged the current trunk of Moriarty as version 1.1:

http://moriarty.googlecode.com/svn/tags/release-1.1/

This is a stable release and should be backwards compatible with 1.0. The trunk continues to be the bleeding edge.

Moriarty Documentation

I started adding some API documentation to Moriarty using the excellent PHPDoctor. The documentation is in subversion but you can also view it online.

Moriarty Development List

I noticed that I was the only one getting notifications of commits to Moriarty’s subversion. I thought the best way to fix that was to create a Google group for Moriarty and ensure the commit reports get sent there. So if you’re interested in keeping track of changes to Moriarty please sign up: moriarty-dev

Alternative to CURL in Moriarty

I just checked in a small update to Moriarty that might solve a problem some people have experienced using curl. It appears that even though curl implemented support for HTTP digest way back in 2003 with version 7.10.6, it took several more releases to iron out the bugs. The version I develop with, 7.18.0 (and the version installed on Talis application servers), works without issue, but many webhosts have much older versions. In fact my own webhost is still on 7.10.6, which means that digest authentication doesn’t work as expected. To date there has been no workaround. The latest change to Moriarty adds support for using httpclient, written by Manuel Lemos. This is a complete HTTP client written in PHP. To use digest authentication you also need sasl, which is also written by Manuel Lemos. Moriarty looks for those two classes and uses them if it finds them; otherwise it falls back to using curl as before.

To use httpclient with Moriarty you just need to ensure that http_class and sasl_interact_class are loaded before using any HTTP actions. Adding lines like the following to your index.php (or somewhere similar) should do the trick:

    require_once '/path/to/moriarty/lib/httpclient/http.php';

    require_once '/path/to/moriarty/lib/sasl/sasl.php';

About Moriarty… Moriarty is a simple PHP library for accessing the Talis Platform. It follows the Platform API very closely and wraps up many common tasks into convenient classes while remaining very lightweight. It also provides some simple RDF classes that are based on the excellent ARC2 class library. Moriarty is primarily being developed by Ian Davis and is in continual alpha, subject to occasional rapid bursts of change. You can read more about Moriarty on the n² wiki or visit its Google Code project.

Moriarty Now Hosted on Google Code

A couple of weeks ago I moved Moriarty from my playground area of the n² SVN repository to a new project at Google Code. This brings the advantage of neat issue tracking and code review capabilities as well as better management of contributors and collaborators. The new SVN repository is now http://moriarty.googlecode.com/svn/trunk/ (with an interactive view too). Just email me {at} iandavis.com if you’d like to be added to the project.