
Moriarty DataTables: Active Record for RDF

DataTables are a new addition to the Moriarty PHP library. They are an implementation of the ActiveRecord pattern for use with RDF data in Talis Platform stores, and they draw inspiration from the active record implementation in CodeIgniter.

The intention is to allow querying of RDF data in a natural way for most PHP coders. For example:

$dt->select('firstname')->from('person')->where('surname','Evans');
$dt->get();

In a relational database that kind of code would select the firstname column for every record in the person table that has a surname column with a value of Evans. With RDF we have two problems:

  1. there are no columns or tables; instead we have properties and classes.
  2. URIs are used to name things and URIs are long, ugly and easy to get wrong.

Moriarty’s DataTable class attempts to solve these two problems. It solves the first by treating properties as columns and classes as tables. The second problem it solves by allowing the user to specify short names for URIs. So we can write:

$dt->map('http://xmlns.com/foaf/0.1/firstName', 'firstname');
$dt->map('http://xmlns.com/foaf/0.1/surname', 'surname');
$dt->map('http://xmlns.com/foaf/0.1/Person', 'person');
$dt->select('firstname')->from('person')->where('surname','Evans');
$dt->get();

We can read that as selecting all values of the foaf:firstName property for resources of type foaf:Person that also have a foaf:surname property with a value of Evans. The DataTable class converts that into a SPARQL select query behind the scenes.
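To make that translation concrete, here is a minimal sketch in Python of how a short-name map plus a fluent query could be turned into a SPARQL SELECT. It is not Moriarty's code: the TinyTable class, its method names and the exact shape of the generated query are all assumptions made for illustration.

```python
# Hypothetical sketch: NOT Moriarty's implementation, only the general idea of
# mapping short names to URIs and emitting a SPARQL SELECT from them.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TinyTable:
    def __init__(self):
        self.names = {}        # short name -> full URI
        self.fields = []       # short names of the selected properties
        self.rdf_class = None  # short name of the class acting as the "table"
        self.filters = []      # (short name, literal value) pairs

    def map(self, uri, short_name):
        self.names[short_name] = uri
        return self

    def select(self, fields):
        self.fields = [f.strip() for f in fields.split(",")]
        return self

    def from_(self, short_name):  # "from" is a reserved word in Python
        self.rdf_class = short_name
        return self

    def where(self, short_name, value):
        self.filters.append((short_name, value))
        return self

    def to_sparql(self):
        patterns = []
        if self.rdf_class is not None:
            patterns.append("?s <%s> <%s> ." % (RDF_TYPE, self.names[self.rdf_class]))
        for field in self.fields:
            patterns.append("?s <%s> ?%s ." % (self.names[field], field))
        for name, value in self.filters:
            # No escaping of the literal: acceptable for a sketch only.
            patterns.append('?s <%s> "%s" .' % (self.names[name], value))
        variables = " ".join("?" + f for f in self.fields)
        return "SELECT %s WHERE {\n  %s\n}" % (variables, "\n  ".join(patterns))


dt = TinyTable()
dt.map("http://xmlns.com/foaf/0.1/firstName", "firstname")
dt.map("http://xmlns.com/foaf/0.1/surname", "surname")
dt.map("http://xmlns.com/foaf/0.1/Person", "person")
dt.select("firstname").from_("person").where("surname", "Evans")
query = dt.to_sparql()
print(query)
```

The class becomes an rdf:type pattern, each selected property becomes a pattern that binds a variable, and each where() call becomes a pattern with a fixed literal.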

This means you can very simply query and use RDF data from a Talis Platform store. To get the first 10 names and nicknames from a store:

$dt = new DataTable('http://api.talis.com/stores/mystore');
$dt->map('http://xmlns.com/foaf/0.1/name', 'name');
$dt->map('http://xmlns.com/foaf/0.1/nick', 'nick');

$dt->select('name,nick')->limit(10);
$res = $dt->get();

foreach ($res->result() as $row) {
   echo $row->name;
   echo $row->nick;
}

I’ve written up a collection of example queries based on the education data held in the data.gov.uk service.

When I was thinking about how to map the ideas from active record into RDF I was stumped at how to implement table joins. This bothered me because if there is one thing RDF excels at it’s links between resources. Here’s an example of how CodeIgniter implements the join syntax:

$this->db->select('firstname, blog.title');
$this->db->from('person');
$this->db->join('blog', 'person.id = blog.id');

It turned out that the answer was incredibly simple and elegant: you don’t need them! The whole concept of the join method in most active record implementations is to compensate for the fact that relational databases don’t name their relationships (some do but it is very rarely used in practice and not commonly supported in SQL). If you think about the RDF equivalent of that query it becomes clearer: select the name of each resource of type person and for each of their blogs select its title. That join is just the property relating the person resource to the blog resource, probably foaf:weblog.

When you use a DataTable you specify a join simply by including a dotted property path in the select method, e.g. blog.title where blog and title both map to properties. That lets us write our query like this:

$dt->map('http://xmlns.com/foaf/0.1/firstName', 'firstname');
$dt->map('http://xmlns.com/foaf/0.1/weblog', 'blog');
$dt->map('http://purl.org/dc/elements/1.1/title', 'title');
$dt->map('http://xmlns.com/foaf/0.1/Person', 'person');

$dt->select('firstname, blog.title');
$dt->from('person');

Ignoring the mappings this is much simpler than the relational database equivalent! Here’s a good example of using these joinless queries.
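One way to picture what a dotted path means is to expand it into a chain of triple patterns, where the object of one step becomes the subject of the next. The Python sketch below does that; the expand_path function and the variable naming scheme are hypothetical and are not taken from Moriarty.

```python
def expand_path(path, names, subject="?s"):
    """Expand a dotted path such as 'blog.title' into chained triple patterns.

    Returns the list of patterns and the variable that holds the final value.
    """
    patterns = []
    current = subject
    steps = path.split(".")
    for i, step in enumerate(steps):
        # Name each variable after the path walked so far: ?blog, ?blog_title
        var = "?" + "_".join(steps[: i + 1])
        patterns.append("%s <%s> %s ." % (current, names[step], var))
        current = var
    return patterns, current


names = {
    "blog": "http://xmlns.com/foaf/0.1/weblog",
    "title": "http://purl.org/dc/elements/1.1/title",
}
patterns, value_var = expand_path("blog.title", names)
for pattern in patterns:
    print(pattern)
```

The "join" is nothing more than the shared ?blog variable that links the person to the blog.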

DataTables aren’t just for querying. They also support insert and update. To insert a new description of a resource:

$dt = new DataTable('http://api.talis.com/stores/mystore');
$dt->map('http://xmlns.com/foaf/0.1/name', 'name');
$dt->map('http://xmlns.com/foaf/0.1/Person', 'person');
$dt->set('name', 'scooby');
$response = $dt->insert('person');

This translates to submitting a description of a blank node, with a foaf:name property having a value of scooby and an rdf:type of foaf:Person. If you want to submit a description about a resource with a known URI, then you need to set the special _uri field like this:

$dt->set('_uri', 'http://example.com/people/1');
$dt->set('name', 'scooby');
$response = $dt->insert('person');

Behind the scenes the insert method generates the RDF and POSTs it into the store’s metabox. Updates work in a similar way:

$dt->set('_uri', 'http://example.com/people/1');
$dt->set('name', 'scooby');
$dt->where('nick', 'scoob');
$response = $dt->update();

Here the update method queries the store for the current value of the name property for the specified resource and generates a changeset which it then submits to the store’s metabox. This also works for multiple resources, so to update the resource description for anything with a name of shaggy to have a name of scooby:

$dt->set('name', 'scooby');
$dt->where('name', 'shaggy');
$response = $dt->update();
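Going back to insert for a moment, the description it submits can be pictured as a handful of triples: one for the rdf:type and one per field, hung off either a blank node or the URI given in _uri. The Python sketch below builds those triples as N-Triples lines. The describe function is hypothetical, and N-Triples is chosen here only for readability; the serialization Moriarty actually sends is not something this sketch claims to reproduce.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(fields, rdf_class, names):
    """Return N-Triples lines describing a new resource (illustrative only)."""
    fields = dict(fields)          # copy so the caller's dict is untouched
    uri = fields.pop("_uri", None)
    # Without a _uri the resource is described as a blank node.
    subject = "<%s>" % uri if uri else "_:r1"
    lines = ["%s <%s> <%s> ." % (subject, RDF_TYPE, names[rdf_class])]
    for name, value in sorted(fields.items()):
        lines.append('%s <%s> "%s" .' % (subject, names[name], value))
    return lines


names = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "person": "http://xmlns.com/foaf/0.1/Person",
}
anonymous = describe({"name": "scooby"}, "person", names)
named = describe({"_uri": "http://example.com/people/1", "name": "scooby"}, "person", names)
print("\n".join(anonymous + named))
```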

Full documentation can be found here: DataTable and DataTableResult

About Moriarty… Moriarty is a simple PHP library for accessing the Talis Platform. It follows the Platform API very closely and wraps up many common tasks into convenient classes while remaining very lightweight. It also provides some simple RDF classes that are based on the excellent ARC2 class library. Moriarty is being developed by a small community of developers and is in continual beta, subject to a slow stream of updates. To find out more visit its Google Code project.

SPARQL Hacks: moving query logic into data

There are too many terms that mean the same thing sometimes. Take labels. rdfs:label is perhaps the most obvious choice if you want to label something in RDF, but there are a whole bunch of semantically equivalent predicates in high usage for doing the same thing. For a while, it seems, it was common practice for every vocabulary to define their own equivalent – though very few bother to rdfs:subPropertyOf rdfs:label (and some predate rdfs:label), so even if you can do some reasoning in your query engine, this might not help you much. So when you want to get the label for something, but you don’t know which predicate the data uses, you might end up doing something like this:


construct { ?s rdfs:label ?l }
where
{
?s ?p ?o
optional
{ ?s rdfs:label ?l }
optional
{ ?s foaf:name ?l }
optional
{ ?s sioc:name ?l }
optional
{ ?s dc:title ?l }
optional
{ ?s dcterms:title ?l }
}

Nasty. And maybe later you find another label predicate in the data somewhere and have to go modify your queries.

But, if I add these triples to my store:


<#a> rdfapp:labelPredicate dc:title, rdfs:label, dcterms:title, foaf:name, sioc:name .

I can instead do:


prefix rdfapp: <http://kwijibo.talis.com/vocabs/rdfapp#>
construct { ?s rdfs:label ?l }
where
{
<#a> rdfapp:labelPredicate ?labelPredicate .
?s ?labelPredicate ?l .
}
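The same idea can be sketched outside SPARQL. In the Python below, the list of label predicates lives in the data rather than in the code, so adding a new label predicate means adding a triple, not editing the lookup. The triples are plain tuples with made-up example resources; nothing here talks to a real store.

```python
LABEL_PREDICATE = "rdfapp:labelPredicate"

triples = [
    # Configuration triples: which predicates count as labels.
    ("<#a>", LABEL_PREDICATE, "dc:title"),
    ("<#a>", LABEL_PREDICATE, "rdfs:label"),
    ("<#a>", LABEL_PREDICATE, "foaf:name"),
    # Ordinary data (made-up examples).
    ("ex:book", "dc:title", "Moby Dick"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.com"),
]


def labels(triples):
    """Return (subject, label) pairs using the label predicates found in the data."""
    label_predicates = {
        o for s, p, o in triples if s == "<#a>" and p == LABEL_PREDICATE
    }
    return sorted((s, o) for s, p, o in triples if p in label_predicates)


result = labels(triples)
print(result)
```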

voiD stores and Interesting Queries

Amongst the best incentives for data authors are applications that use that data. One sort of data that especially interests me is dataset metadata, for which the voiD vocabulary was developed; I think this kind of data has the potential to enable the future generation of web apps to join together the ever-growing web of data in wild and exciting new ways. So I was pretty pleased when I saw the voiD store from RKB Explorer. This store provides a SPARQL endpoint over all the voiD descriptions RKB Explorer have produced about their datasets, plus some descriptions they’ve gathered about other datasets. It also provides a list of source documents, sample queries, and a service that takes a list of URIs, and returns a list of SPARQL endpoints that might be able to return triples about them.

This, together with a rainy weekend, prompted me to try out some simple voiD-related things I’d been thinking of. I’ve also been aggregating voiD data in one of my dev stores. This is done partly by creating templated descriptions from a list of Talis Platform stores and poking at them with some SPARQL queries. The rest of the data I found either manually, or by querying Sindice for a list of void:Dataset URIs found in the documents they’ve crawled.

The Sindice API allows you to specify triple patterns with wildcards, such as * rdf:type void:Dataset ., and returns the matches as an Atom feed. I page through the results, importing the RDF from the URIs into my store.

One of my favourite terms from voiD is void:uriRegexPattern, which can be used to indicate that if a URI matches the pattern, the dataset might contain some triples about that URI. You can find the datasets whose pattern matches a given URI with a bit of SPARQL:

    
prefix void: <http://rdfs.org/ns/void#>
DESCRIBE ?dataset {
     ?dataset void:uriRegexPattern ?regex ; void:sparqlEndpoint ?sparql ; a void:Dataset .

    FILTER(REGEX("http://example.com/my/uri", ?regex))
}

    

The novel thing here is that normally, when you use REGEX() in SPARQL, you put a variable binding in the first parameter position, and hardcode a regular expression into the query in the 2nd position. Here though, the regex is in the data, and it is the string against which it is evaluated which is hardcoded, and the variable binding contains the regex. (Unfortunately, while this works with ARQ, it doesn’t appear to work with 3Store – which is perhaps why the rkbexplorer voiD Store provides this as a separate web service).
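The inversion is easy to see in ordinary code too. In this Python sketch the patterns come from the data and the URI being tested is the fixed input. The dataset records are invented for illustration, and re.search is used because SPARQL's REGEX() also matches anywhere in the string unless the pattern is anchored.

```python
import re

# Invented voiD-style records: each dataset carries its own URI pattern.
datasets = [
    {
        "dataset": "http://example.com/void/people",
        "uriRegexPattern": r"^http://example\.com/people/.+",
        "sparqlEndpoint": "http://example.com/people/sparql",
    },
    {
        "dataset": "http://example.org/void/places",
        "uriRegexPattern": r"^http://example\.org/places/.+",
        "sparqlEndpoint": "http://example.org/places/sparql",
    },
]


def endpoints_for(uri, datasets):
    """Return the SPARQL endpoints of datasets whose pattern matches the URI."""
    return [
        d["sparqlEndpoint"]
        for d in datasets
        if re.search(d["uriRegexPattern"], uri)  # the pattern is the data, the URI is fixed
    ]


matches = endpoints_for("http://example.com/people/1", datasets)
print(matches)
```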

So, I’ve used this to create a page that will take a URI, and query my voiD store for void:sparqlEndpoints and void:uriLookupEndpoints, which it will then call to retrieve triples and render them on the page. Here is a query for the URI http://climb.dataincubator.org/dataset .

Another query that interested me, which has become possible since the Platform introduced support for the COUNT() function from SPARQL 1.1, is: which are the most commonly used vocabularies? (SIOC and FOAF so far, though this is because I generated many of these triples based on scripted prodding of endpoints with ASK queries.) But then I wanted to be able to see easily which datasets used which vocabularies, so I created some pages to let me browse datasets by vocabulary.

  1. SIOC Core Ontology Namespace (54)
  2. Friend of a Friend (FOAF) vocabulary (42)
  3. Coreference Ontology (35)
  4. http://www.aktors.org/ontology/portal# (34)
  5. http://www.aktors.org/ontology/support# (30)
  6. http://www.rkbexplorer.com/ontologies/resist# (30)
  7. void (25)
  8. http://purl.org/NET/scovo# (24)
  9. http://acm.rkbexplorer.com/ontologies/acm# (22)
  10. http://courseware.rkbexplorer.com/ontologies/courseware# (21)
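A ranking like the one above boils down to counting datasets per vocabulary, and the browse pages to indexing datasets by vocabulary. Here is a small Python sketch of both steps over made-up (dataset, vocabulary) pairs; the pairs stand in for whatever triples link a dataset to the vocabularies it uses, which is an assumption rather than a description of my store.

```python
from collections import Counter, defaultdict

# Made-up (dataset, vocabulary) pairs standing in for voiD triples.
usage = [
    ("ex:dataset1", "http://rdfs.org/sioc/ns#"),
    ("ex:dataset1", "http://xmlns.com/foaf/0.1/"),
    ("ex:dataset2", "http://xmlns.com/foaf/0.1/"),
    ("ex:dataset3", "http://rdfs.org/sioc/ns#"),
    ("ex:dataset4", "http://rdfs.org/sioc/ns#"),
]

# Equivalent of counting datasets grouped by vocabulary, most used first.
counts = Counter(vocab for _, vocab in usage)
ranking = counts.most_common()

# Index for the "browse datasets by vocabulary" pages.
by_vocabulary = defaultdict(list)
for dataset, vocab in usage:
    by_vocabulary[vocab].append(dataset)

for vocab, n in ranking:
    print("%s (%d): %s" % (vocab, n, ", ".join(by_vocabulary[vocab])))
```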

Then I made some pages to do the same thing with dct:subjects. Here, the largest category by some way, is category: online_social_networking. This is because I generated ?dataset dct:subject <http://dbpedia.org/resource/Category:Online_social_networking> . triples automatically for all the platform stores which made a certain use of terms from the SIOC ontology.

These automatically generated voiD descriptions will not, of course, present a balanced picture of what is out there, and they skew the results somewhat. The most interesting descriptions are those which are handcrafted to some extent, describing something of the nature of the dataset’s domains.

I’ve also provided a form for submitting voiD URLs. My hope is that this simple application, together with the rkbexplorer voiD Store, might encourage more people to describe their linked data datasets with voiD, or perhaps add more detail to the descriptions they already publish, in order to see their dataset come up in the appropriate queries. And I hope that this, in turn, will encourage others to build more sophisticated and exciting applications using that data.

Introducing Pynappl

Over the summer I spent some time working on a Python library for working with the Talis Platform. I’ve spent a lot of time developing the PHP-based Moriarty library and I’ve been wanting to apply that experience to other languages. Leigh has made good progress on the Ruby front with Pho and we have a nascent Java-based client: Penry. Considering Python’s excellent RDF support it seemed the natural choice to tackle next.

Pynappl is the resulting library. It’s still very early in its evolution and so has lots of rough edges, gaps and rather dubious design choices. So far Pynappl’s feature set has been driven by the real applications I have been working with so there is a distinct bias towards data loading and management of stores. The Store class is the workhorse of the library and contains methods for loading RDF, running SPARQL queries, scheduling jobs and reading and writing of field/predicate maps and query profiles.

In keeping with my general philosophy for building RESTful applications, the HTTP based methods on the Store class make it very obvious that you are working with a fallible network by returning a tuple containing the HTTP response and the body of the response. It’s up to you to use or ignore the response as you see fit for your application. Many methods attempt to interpret the results of the method call but this can be switched off using an argument called “raw”. For example this code takes advantage of the interpretation and parsing of the SPARQL results:

store = pynappl.Store(store_uri, username, password)
(response, body) = store.select("select * where {?s a ?o} limit 10")
(sparql_header, results) = body
for result in results:
    print "%s (a %s)" % (str(result['s']), str(result['o']))

This can be switched off to get at the raw response body:

(response, body) = store.select("select * where {?s a ?o} limit 10", True)
print body

Also included is a command line application called tstore that wraps up a lot of these operations, including waiting for batch operations to complete. For example, to reset a store and load data into it takes just two lines:

./tstore --store mystore --user username --password xxxx reset --wait
./tstore --store mystore --user username --password xxxx store -f data.rdf

Please take a look at Pynappl and let me know what you think or if you’d like to get involved and help out.

About Pynappl… Pynappl is a simple open source Python library for working with the Talis Platform. Currently it is focussed mainly on managing data loading and manipulation of Talis Platform stores. Pynappl is an early alpha and is substantially incomplete (we’re looking for interested contributors). You can read more about Pynappl at its Google Code project page.

Using Moriarty for Serving Linked Data

Although Moriarty is a general purpose library for building applications with the Talis Platform (and tried and tested in Talis Prism and Talis Aspire) one of the most common uses is simply to provide a browsable interface for linked data held in a Talis Platform store. Typically these scripts take the URI sent by the web browser and use a SPARQL query or the Talis Platform’s describe service to fetch linked data about that URI. They then style that as HTML or send it directly back as RDF. There are a series of technical details they all need to deal with: 303 redirects, content negotiation, converting RDF to HTML etc.
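To give a feel for two of those details, here is a simplified Python sketch of content negotiation and the 303 redirect. It is not how dataspace.php does it: the function names, the two supported formats and the .html/.rdf document suffixes are all assumptions, and the Accept parsing ignores parts of the HTTP rules (media type parameters other than q, and partial wildcards such as text/*).

```python
SUPPORTED = ["text/html", "application/rdf+xml"]  # assumed output formats


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > 0:  # q=0 means "not acceptable"
            prefs.append((-q, position, media_type))
    return [media_type for _, _, media_type in sorted(prefs)]


def choose_format(header, supported=SUPPORTED):
    """Pick the best supported media type, falling back to the first one."""
    for media_type in parse_accept(header):
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return supported[0]
    return supported[0]


def redirect_for(thing_uri, media_type):
    """303 See Other: point from the thing's URI to a document about it."""
    suffix = ".html" if media_type == "text/html" else ".rdf"
    return 303, thing_uri + suffix


browser = choose_format("text/html,application/xhtml+xml,*/*;q=0.8")
crawler = choose_format("application/rdf+xml;q=0.9, text/html;q=0.5")
print(redirect_for("http://example.com/people/1", crawler))
```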

I’ve worked on more comprehensive libraries (e.g. Paget) to manage this kind of publishing but I thought the simple case of fetching and styling the data would make a good example of how to use Moriarty. I spent a bit of time this afternoon putting an example script together based on several I’ve written in the past. You can find the result in the dataspace subdirectory of the examples folder. That subdirectory contains four files:

  • dataspace.php: this is the example script. It contains the logic to fetch the relevant description, negotiate the best output format and style the result appropriately. It’s not designed to be called directly, but to be included from a configuration file…
  • index.php: this is an example configuration file. It is designed to be dropped into a web server directory and then intercept all requests beginning with that URI. It contains the configuration describing which Talis Platform store to use, where cache files can be written and where to find Moriarty and ARC2. The last thing it does is to load dataspace.php which then handles the browser request.
  • sample.htaccess: this is a sample .htaccess file for Apache webservers. It redirects all requests via index.php.
  • plain.tmpl.html: this is the default template used to render the HTML views. This can be overridden in the configuration.

Using the example script is simple: just copy index.php to the root directory of your linked data space. If you’re using Apache then you need to copy sample.htaccess into the same directory and rename it to .htaccess. Edit index.php so it refers to your store and your URIs and that’s it! You can see it in action with the default template on my own linked data space.

About Moriarty… Moriarty is a simple PHP library for accessing the Talis Platform. It follows the Platform API very closely and wraps up many common tasks into convenient classes while remaining very lightweight. It also provides some simple RDF classes that are based on the excellent ARC2 class library. Moriarty is being developed by a small community of developers and is in continual beta, subject to a slow stream of updates. You can read more about Moriarty on the n² wiki or visit its Google Code project.

SPARQL 1.1 Early Access Features

In yesterday’s monthly Talis Platform release we started rolling out some early access support for the SPARQL 1.1 query language. We’ve been monitoring the activity around the development of SPARQL extensions for some time and have been watching the Working Group’s activity to get a feel for which new features are to be included in the forthcoming revision to the language. For those of you interested in some background, Lee Feigenbaum has a nice presentation that summarizes the Working Group’s current thinking.

One major missing feature from SPARQL 1.0 was support for aggregates, i.e. the ability to count, sum and group results. These features have already been implemented by a number of triple stores and this work will get standardised as part of SPARQL 1.1. Because of our confidence that this feature will be added to the specification, the existing implementation experience, and customer feedback, we have decided to release early access support for these specific features as an experimental enhancement to the Platform SPARQL endpoint.

The documentation on the developer wiki has been updated to start to itemize the supported SPARQL extensions.

Users should be aware that the syntax of the extensions may be subject to change as we’ll be attempting to track the progress of the working group as they clarify the specification of these features for inclusion in the standard. We’ll provide notice of any expected changes.

Users should also be aware that while the basic functionality of aggregates is supported in a number of other implementations, care should be taken if queries are intended to be portable across different triplestores and/or services. For example, the Talis Platform contains some mirrors of other datasets so queries written to use the new functionality may not be portable across other services due to the basic feature not being supported or due to minor syntactic differences.

With the warnings out of the way, here are some simple examples of the extensions in practice. The first query uses the BBC programmes and music data hosted in the platform, and asks for the number of albums released by The Prodigy. The query uses the count() function to count up the number of album titles. The results of the count are assigned to a variable called ?count in the SELECT clause using the new “SELECT expression” syntax.


#How many albums have been released by The Prodigy?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rev: <http://purl.org/stuff/rev#>
SELECT (count(?title) as ?count) WHERE {
  ?group a mo:MusicGroup;
      foaf:name "The Prodigy";
       foaf:made ?album.
   ?album dc:title ?title.
}

Results.

The second example is a variant of one of the example queries that can be used against the Edubase data. In this case the query retrieves the number of schools closed in each parliamentary constituency in 2008, ordering the results in descending order. The new GROUP BY keyword is used to group the results by the label of the constituency.


#How many schools closed in each parliamentary constituency in 2008?
#In descending order of number of closures
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label (count(?school) as ?count) WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ;
     sch-ont:closeDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label.
  FILTER (?date > "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
GROUP BY ?label
ORDER BY DESC(?count)

Results.

We can revise this query to only include those constituencies in which at least 10 schools have closed. To do this we need to filter the results to just those where the count is equal to or greater than 10. The new HAVING keyword allows an expression to be applied to the result set before it is returned:


prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label (count(?school) as ?count) WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ;
     sch-ont:closeDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label.
  FILTER (?date > "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
GROUP BY ?label
HAVING (?count >= 10)
ORDER BY DESC(?count)

Results.
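For readers who find it easier to think in code, the three clauses correspond to three familiar steps: group and count, filter on the count, then sort. The Python sketch below applies them to a few made-up (school, constituency) rows; it only illustrates what the clauses mean and says nothing about how the Platform evaluates the query.

```python
from collections import Counter

# Made-up rows: one (school, constituency label) pair per closed school.
rows = [
    ("school1", "Bath"),
    ("school2", "Bath"),
    ("school3", "Bath"),
    ("school4", "Leeds"),
    ("school5", "Leeds"),
    ("school6", "York"),
]


def closures_by_constituency(rows, minimum=1):
    # GROUP BY ?label with count(?school)
    counts = Counter(label for _, label in rows)
    # HAVING (?count >= minimum)
    kept = [(label, n) for label, n in counts.items() if n >= minimum]
    # ORDER BY DESC(?count), with the label as a tie-breaker
    return sorted(kept, key=lambda item: (-item[1], item[0]))


print(closures_by_constituency(rows))
print(closures_by_constituency(rows, minimum=2))
```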

The SPARQL extensions page includes a few more examples of the syntax and a list of the operators now supported in the extended query language. If you have any feedback or questions, please leave a comment below.

SPARQLing data.gov.uk: Transport Data

This is the second in my series of posts about using SPARQL to access the Linked Data being published from data.gov.uk. In the first article I looked at the Edubase data. In this second post I wanted to briefly look at some of the data from the Department for Transport. This dataset, which consists of around 45 million triples, provides data about traffic counts on UK roads. Jeni Tennison has previously written up how she approached the dataset conversion and published it online as part of the data.gov.uk initiative, so her blog post is a useful starting point for background on the structure and content of the dataset.

The SPARQL endpoint for the transport data in data.gov.uk is at: http://services.data.gov.uk/transport/sparql.

Each of the road traffic monitoring points in the dataset has latitude and longitude details available, so it is possible to ask for all collection points that occur on a particular road. Here’s how to do that for the M5:


#List the uri, latitude and longitude for road traffic monitoring points on the M5
PREFIX road: <http://transport.data.gov.uk/0/ontology/roads#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX geo: <http://geo.data.gov.uk/0/ontology/geo#>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?point ?lat ?long WHERE {
  ?x a road:Road.
  ?x road:number "M5"^^xsd:NCName.
  ?x geo:point ?point.
  ?point wgs84:lat ?lat.
  ?point wgs84:long ?long.
}

Results.

To modify the query to look at a different road, just change the query to refer to another road name, e.g. the B237 or the A4.

If you’d prefer not to deal with the SPARQL XML Results format, then you can add a parameter to the URL to request the results in the SPARQL JSON results format (output=json). Here are the points on the A4 as JSON.
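As a rough sketch of what that looks like from a script, the Python below builds a request URL with output=json and flattens a SPARQL JSON results document into simple rows. The query parameter name follows the SPARQL protocol; the sample document is invented for illustration (it is not real output from the endpoint) and no request is actually sent.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://services.data.gov.uk/transport/sparql"


def json_results_url(query, endpoint=ENDPOINT):
    """Build a GET URL asking the endpoint for SPARQL JSON results."""
    return endpoint + "?" + urlencode({"query": query, "output": "json"})


def rows(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]


# Invented sample in the SPARQL JSON results format.
sample = """
{"head": {"vars": ["point", "lat", "long"]},
 "results": {"bindings": [
   {"point": {"type": "uri", "value": "http://example.com/point/1"},
    "lat": {"type": "literal", "value": "51.5"},
    "long": {"type": "literal", "value": "-2.5"}}
 ]}}
"""

url = json_results_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
points = rows(sample)
print(url)
print(points)
```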

If you query further you can find all of the traffic counts associated with a particular location. Each of these has a timestamp, the direction the traffic was travelling, etc. The data is ripe for visualisation, e.g. plotting the points on a map, building an animation to show traffic changes over time, etc.

The dataset also includes identifiers for different types of road and motor vehicle. These are published as SKOS concept schemes (i.e. a category of stuff). SKOS concept schemes are hierarchical, so let’s see what schemes are in the data, and what their top concepts are:


#List SKOS concept schemes, their top concepts and labels
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?scheme ?topconcept ?label WHERE {
  ?scheme a skos:ConceptScheme;
    skos:hasTopConcept ?topconcept.
  ?topconcept skos:prefLabel ?label.
}

Results.

The above query will work on any dataset as it just uses generic SKOS vocabulary. You could run it on any SPARQL endpoint to see if it contains some SKOS concept schemes.

One of the schemes in the dataset is a categorization of roads. Let’s retrieve the concepts in that scheme:


PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?category ?label WHERE {
  ?category skos:inScheme ;
   skos:prefLabel ?label.
}

Results.

If we wanted to look at the concepts in the vehicle scheme (http://transport.data.gov.uk/0/category/vehicle), then we can just change the relevant URI in the query and retrieve the results.

Based on that information it should be possible to find traffic counts for specific types of vehicle on specific roads. I’ll leave that as an exercise for the reader!

SPARQLing data.gov.uk: Edubase Data

Last week the Cabinet Office issued a call for Open Data Developers to sign-up to get a preview of the forthcoming UK Government public data website. The site includes a directory of existing datasets plus a growing number of datasets that have been converted to RDF and which will shortly be available as Linked Data. This data is being stored in the Talis Platform providing developers with access to SPARQL endpoints as a means to query the data; we’ll also be including search and other access mechanisms at a later date.

In this series of postings I wanted to show some example SPARQL queries that can be used to access the data. If you’re new to SPARQL then you might want to look at Lee Feigenbaum’s SPARQL by Example tutorial, or my own short slide deck that covers all the basic syntax.

The first dataset I wanted to highlight is an extract of the Edubase dataset available from the Department for Children, Schools and Families. The conversion was carried out by the team at HP Labs and has been loaded into a Talis Platform store. The public facing SPARQL endpoint is available from: http://services.data.gov.uk/education/sparql.

Here are some sample SPARQL queries you can use against the data:


#1. Select the names of schools in the Administrative District of the City of London
# Ordering results by name of the school
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?name WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00AA> ;
}
ORDER BY ?name

Results


#2. Which schools in the BANES area have a nursery?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT ?name WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
     sch-ont:nurseryProvision "true"^^xsd:boolean
}
ORDER BY ?name

Results


#3. Select the names and addresses of schools in the Administrative District of the City of London
# Ordering results by name of the school
# Note: we use OPTIONAL here as not every school has an address listed in the data
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?name ?address1 ?address2 ?postcode ?town WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00AA> .

  OPTIONAL {
   ?school sch-ont:address ?address .
  ?address sch-ont:address1 ?address1 ;
      sch-ont:address2 ?address2 ;
      sch-ont:postcode ?postcode ;
      sch-ont:town ?town .
  }
}
ORDER BY ?name

Results


#4. Select the name, lowest and highest age ranges, capacity and pupil:teacher ratio
# for all schools in the Bath & North East Somerset district
# Again we use OPTIONAL to allow for missing data items.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?name ?lowage ?highage ?capacity ?ratio WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00HA> .
     OPTIONAL {
       ?school sch-ont:statutoryLowAge ?lowage ;
     }

     OPTIONAL {
       ?school sch-ont:statutoryHighAge ?highage ;
     }

     OPTIONAL {
       ?school sch-ont:schoolCapacity ?capacity ;
     }

     OPTIONAL {
       ?school sch-ont:pupilTeacherRatio ?ratio
     }
}
ORDER BY ?name

Results


#5. What is the uri, name, and opening date of the oldest school in the UK?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?date WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:openDate ?date.
}
ORDER BY ASC(?date)
LIMIT 1

Results


#6. Select the name, easting and northing for the 100 newest schools in the UK.
# Can be used to plot them on a map
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:openDate ?date ;
     sch-ont:easting ?easting ;
     sch-ont:northing ?northing .
}
ORDER BY DESC(?date)
LIMIT 100

Results


#7. Select the uri, name, easting and northing for all schools opened in 2008
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:openDate ?date ;
     sch-ont:easting ?easting ;
     sch-ont:northing ?northing .
  FILTER (?date > "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}

Results


#8. Select the uri, name, and the reason for closing for all schools that are currently
# scheduled for closure. The reason is a URI from a controlled vocabulary in the ontology.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?reason WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Open_but_proposed_to_close ;
     sch-ont:reasonEstablishmentClosed ?reason .
}

Results


#9. In which parliamentary constituencies did schools close in 2008?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cons ?label WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ;
     sch-ont:closeDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label.
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?cons

Results


#10. In which parliamentary constituencies did schools open in 2008?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cons ?label WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:openDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label.
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?cons

Results

Hopefully that’s enough to get you started. If you want a bit more background on the modelling and a look at the ontology, then read this posting to the uk-government-data mailing list by Stuart Williams.

note: updated 16 Nov 2009 to reflect changes to the EduBase data. The first version of this dataset was created before the proposed guidelines for public sector URIs were published. The school ontology used in that first dataset had a URI of http://education.data.gov.uk/ontology/school# which has now been replaced with http://education.data.gov.uk/def/school/. Also, the URIs for administrative districts were temporary placeholders containing the phrase “placeholder-id” in their path. These have now been updated to URIs based on the Office for National Statistics district codes, for example http://statistics.data.gov.uk/id/local-authority-district/00AA

Vocamp Glasgow 2009

This week saw the first Vocamp in Scotland, held at the University of Strathclyde, Glasgow.

Attendees came from a wide range of different and interesting problem-spaces and domains and gave a lot of great presentations on their work. The range was too broad, perhaps, for us to find enough commonality to collaborate on creating/fixing any vocabularies (the focus of the previous vocamps I’ve attended), but it was great to have together so many people with an interest in the semantic web in the locality, and the presentations were all really good.

Jeff Pan and Edward Thomas from Aberdeen University presented some great tutorials that covered a lot of ground, from RDFa and OWL 2 to data-modelling methodology with Protégé.
Jeff Pan on OWL 2. (I especially liked the slide explaining how machines understand markup.)

Norman Gray and Stuart Chalmers presented their work on creating SKOS mappings between astronomy vocabularies.

Norman Gray on vocabulary mapping with SKOS

Jenny Ure from Edinburgh University talked about some of her work on the socio-technical aspects of collaborative ontologies and knowledge systems.

Jenny Ure

Peter Winstanley talked about some of the data curated by the Scottish Government, and showcased Semantic Mediawiki for ontology development, and some different options for ontology visualisation.

Peter also pointed to the Communities Of Practice for local Government Scottish Group: Shared Representation using Semantic Technologies, inviting anyone with an interest in Semantic technologies to join and contribute to the discussion forums.

Peter Winstanley on Ontology visualisation and Scottish Gov Data

Serge Boucher from Brussels talked about some of the exciting possibilities for location and context-aware semantic web services.

Serge Boucher on Location Based Semantic Services

Gordon Dunsire from the Centre for Digital Library Research presented on vocabularies, standards, and linked data in the library domain, making particular mention of the dramatic tale of the development of the Library of Congress Subject Headings Dataset.

Gordon Dunsire on Linked Data, vocabularies, and library metadata

Martin Dempster from the University of Dundee presented his research into assistive technologies that help people who have difficulty speaking to communicate. He covered his use of ontologies to manage the data in his prototype system, and how it consumes data from popular social web 2.0 sites to generate conversational choices.

Martin Dempster on Semantic enhanced Assistive Technology

The event was hosted and facilitated by Paola Di Maio from the University of Strathclyde; thanks to Paola for organising the event, the university for laying on wifi and tea and coffee, and Talis for sponsoring the lunches.

Notes on Cross-Domain Ajax

Background

I asked for a little project I could get my teeth into, and Leigh suggested something very tasty: an analytics app, along the lines of Google Analytics or the (very impressive) open source Piwik. Basically tracking things like page visits, referrers, outbound clicks and so on. The difference from the existing apps is that it takes advantage of semweb goodness, specifically a Talis Platform store as a backend.

What this required was something that would run in the browser when someone visited a given Web page and pass on relevant data to a server which would push that data into the store. A script discreetly embedded on the page of interest picks up the activity and posts it to the server-side logging system. There wasn’t really a sensible choice other than to use Javascript client-side, and to keep things reasonably portable server-side I opted for PHP. The server-side processing is relatively straightforward (although I’m not actually capturing much yet), but the browser-server comms part turned out to be a real doozy.

It’s not difficult to call an HTTP server from inside Javascript wrapped in HTML loaded in a browser. The snag is that the security model common to popular browsers blocks access to server domains other than the one that originated the page containing the Javascript. I got some code running from http://hyperdata.org that nicely delivered some basic logging of visits to pages on http://hyperdata.org (including the Wiki I have there – though it took a while to find the right template…). Problems started when I tried the same script in pages hosted under http://danny.ayers.name. Browser no likey: wrapping the server call in a try...catch block and throwing up an alert(error) always revealed Exception… “Access to restricted URI denied” code: “1012” – this is the same origin policy. What follows are the workarounds for this. Googling the titles here will provide a variety of sample code that implements the solutions. I’ve opted for Hidden Form, it being straightforward for my purposes and standards-friendly.
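To make the same origin rule concrete, here is a minimal sketch of the comparison a browser effectively makes: two URLs share an origin only if scheme, host and port all match. The function name and the /log.php path are made up for illustration; only the two host names come from the experience described above.

```javascript
// Returns true when both URLs have the same scheme, host and port.
// This mirrors the same origin check; it is not the browser's own code.
function sameOrigin(pageUrl, targetUrl) {
  return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

// Hypothetical logging endpoint on the first site.
const logger = 'http://hyperdata.org/log.php';

const fromSameSite = sameOrigin('http://hyperdata.org/wiki/', logger);
const fromOtherSite = sameOrigin('http://danny.ayers.name/', logger);
const fromOtherPort = sameOrigin('http://hyperdata.org:8080/', logger);
```

Only the first of those three calls passes the check, which matches what happened when the script moved to a page on the second domain.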

Cross-domain proxy

Conceptually the easiest, this approach uses a server-side pipeline that lives on the same domain as the delivered pages containing the Javascript. It essentially echoes calls from the delivering server to the remote server that does the work. This didn’t seem a good choice for the analytics app as every end-user would require such a proxy on their own server.

  • Pros: straightforward; independent of browser vagaries; spec friendly
  • Cons: needed for every host delivering pages with embedded scripts (if all the servers involved are yours, this is probably a good choice)
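For illustration only, here is roughly what such a proxy could look like. My server side is PHP, so this is a sketch in Javascript (Node's built-in http module) rather than code from the app, and the remote host and path are invented placeholders.

```javascript
const http = require('http');

// Hypothetical remote logging endpoint (an assumption, not a real service).
const REMOTE = { host: 'logger.example.org', port: 80, path: '/log.php' };

// Build the options for the onward request from the incoming one.
function upstreamOptions(incoming, remote) {
  return {
    host: remote.host,
    port: remote.port,
    path: remote.path,
    method: incoming.method,
    headers: {
      'content-type':
        incoming.headers['content-type'] || 'application/x-www-form-urlencoded',
    },
  };
}

// Create a server that echoes each request to the remote server and
// relays the remote response back to the browser.
function createProxy(remote) {
  return http.createServer((req, res) => {
    const upstream = http.request(upstreamOptions(req, remote), (upRes) => {
      res.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(res);
    });
    upstream.on('error', () => {
      res.writeHead(502); // remote server unreachable
      res.end();
    });
    req.pipe(upstream); // forward the request body unchanged
  });
}
```

Calling createProxy(REMOTE).listen(8080) on the host that serves the pages would give the embedded script a same-origin URL to call.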

Tag Overload Hacks

When a typical browser hits the HTML tags <script> and <img> (any others?) it will quite happily do an HTTP GET on them, irrespective of domain. There’s been a fair bit of finesse applied around the use of <script> – notably the elegant but brain-boiling JSONP (JSON with Padding), in which the server wraps (“pads”) its JSON response in a call to a callback function named in the request, so that the response executes as an ordinary script. I won’t comment further on this, except to say I understood it for about 5 minutes then lost it again when I went to make a coffee. I’m told jQuery will do this automagically if you choose dataType: "jsonp" with a GET request.
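Since JSONP is so easy to lose hold of, here is a minimal sketch of the idea. The padding step would normally run on the server, and the browser would execute the returned text via a <script> tag. Here both halves are simulated in plain Javascript, and all the function names and data are invented for the example.

```javascript
// Server side: pad the JSON payload with a call to the callback named by the client.
function padJson(callbackName, data) {
  // Only accept plain identifiers so the name cannot smuggle in extra script.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error('invalid callback name');
  }
  return callbackName + '(' + JSON.stringify(data) + ');';
}

// Client side, simulated: run the script text with the given functions in scope,
// standing in for globals on the page that requested the script.
function runAsScript(scriptText, globals) {
  const names = Object.keys(globals);
  const values = names.map((name) => globals[name]);
  return new Function(...names, scriptText)(...values);
}

const received = [];
const script = padJson('handleVisits', { page: '/index.html', visits: 3 });
runAsScript(script, { handleVisits: (data) => received.push(data) });
```

The padded response is just a function call with the data as its argument, so by the time the script has run, the page's callback has already been handed the parsed object.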

The <img> approach has been around seemingly forever – it’s also known as a Web Bug. Usually you have a 1×1 pixel image in the page of interest (probably inserted dynamically through DOM calls), and every time the page is loaded that image’s URI gets a GET. The trick for tracking is to append a bunch of query parameters to the image URI and have your server intercept the GET call. Apparently this is how Google Analytics does its stuff.

  • Pros: good library support
  • Cons: limited to GETs; rather an ugly hack
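A rough sketch of the <img> trick follows. The tracker URL, parameter names and function names are all hypothetical, and the document object is passed in as an argument so the URL-building part can be tried outside a browser.

```javascript
// Append the tracking data to the image URI as encoded query parameters.
function buildBeaconUrl(baseUrl, params) {
  const query = Object.keys(params)
    .map((key) => encodeURIComponent(key) + '=' + encodeURIComponent(params[key]))
    .join('&');
  return baseUrl + (baseUrl.indexOf('?') === -1 ? '?' : '&') + query;
}

// Insert a 1x1 image whose URI carries the data; the browser GETs it
// regardless of which domain served the page.
function insertTrackingImage(doc, baseUrl, params) {
  const img = doc.createElement('img');
  img.width = 1;
  img.height = 1;
  img.src = buildBeaconUrl(baseUrl, params);
  doc.body.appendChild(img);
  return img;
}

const beaconUrl = buildBeaconUrl('http://stats.example.org/hit.gif', {
  page: '/a b',
  ref: 'http://x.org/?q=1',
});
```

In a page you would call insertTrackingImage(document, ...) with something like location.pathname and document.referrer as the values.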

Flash Proxy

Most people suggested this when I was asking around Twitter and the jQuery mailing list. Turns out there’s a really convenient library that does all the hard work (Google “flXHR”). But I’m afraid I prefer to give Flash a miss when there are open standards available, so I didn’t investigate.

  • Pros: easy (apparently) with library support
  • Cons: uses proprietary stuff

Hidden Form

When I first saw references to this I overlooked it – it seemed to demand an iframe and ugly hackery. But then (largely thanks to this discussion of cross-domain Ajax) I realised it was almost certainly the best bet for the analytics app. Essentially you dynamically push a <form> into the HTML DOM with your data as input values, then call form.submit(). Most references to this I found did involve an iframe to receive the HTTP response – necessary if you’re doing a mashup or something, but not if you only need to POST data off to the server. In this latter circumstance you need to get the server to return a 204 No Content status code, otherwise the browser will try to load the target URI material. That’s trivial in PHP: header('HTTP/1.1 204 No Content');

  • Pros: supports POSTing to the server, and makes it very simple; standards-friendly in this context
  • Cons: gets uglier if you want a response
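The hidden form approach can be sketched as below. This is not the code from my app, just a minimal illustration. The function name, field names and logging URL are assumptions, and it relies on the server answering with 204 No Content so the browser stays on the current page.

```javascript
// Build a hidden <form> carrying the data as input values, add it to the
// page and submit it. The document object is passed in as `doc`.
function postViaHiddenForm(doc, actionUrl, fields) {
  const form = doc.createElement('form');
  form.method = 'POST';
  form.action = actionUrl;
  form.style.display = 'none'; // keep the form out of sight

  Object.keys(fields).forEach((name) => {
    const input = doc.createElement('input');
    input.type = 'hidden';
    input.name = name;
    input.value = String(fields[name]);
    form.appendChild(input);
  });

  doc.body.appendChild(form);
  form.submit(); // cross-domain POST; no iframe needed when the reply is a 204
  return form;
}
```

In a page this would be called as postViaHiddenForm(document, 'http://logger.example.org/log.php', { page: location.href }), where the URL is a placeholder for wherever the logging script lives.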

I’ve not properly doc’d my app code yet (and the functionality is a very long way from complete, let alone tidied up), but you can find it all via my latest Wiki – there’s an example of the Javascript in test.html (just before the closing </body> tag). I’ve only tested it on Firefox so far, but I reckon there’s a good chance of the LazyWeb giving me solutions to any cross-browser issues.

Many thanks for all the helpful suggestions: from this thread on the jQuery mailing list and Twitterers @rjw @flensed @gridinoc @weblivz @JeniT @jQueryHowto.

I’d love to hear of any other solutions to cross-domain Ajax. Please drop in comments, mail me or tweet me.