
SPARQLing data.gov.uk: Edubase Data

Last week the Cabinet Office issued a call for Open Data Developers to sign up for a preview of the forthcoming UK Government public data website. The site includes a directory of existing datasets plus a growing number of datasets that have been converted to RDF and will shortly be available as Linked Data. This data is being stored in the Talis Platform, which gives developers access to SPARQL endpoints for querying the data; we’ll also be adding search and other access mechanisms at a later date.

In this series of postings I wanted to show some example SPARQL queries that can be used to access the data. If you’re new to SPARQL then you might want to look at Lee Feigenbaum’s SPARQL by Example tutorial, or my own short slide deck that covers all the basic syntax.

The first dataset I wanted to highlight is an extract of the Edubase dataset available from the Department for Children, Schools and Families. The conversion was carried out by the team at HP Labs, and the resulting data has been loaded into a Talis Platform store. The public-facing SPARQL endpoint is available at: http://services.data.gov.uk/education/sparql.

Here are some sample SPARQL queries you can use against the data:


#1. Select the names of schools in the Administrative District of the City of London
# Ordering results by name of the school
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?name WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00AA> .
}
ORDER BY ?name

Results


#2. Which schools in the BANES area have a nursery?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT ?name WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
     sch-ont:nurseryProvision "true"^^xsd:boolean
}
ORDER BY ?name

Results


#3. Select the names and addresses of schools in the Administrative District of the City of London
# Ordering results by name of the school
# Note: we use OPTIONAL here as not every school has an address listed in the data
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?name ?address1 ?address2 ?postcode ?town WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00AA> .

  OPTIONAL {
    ?school sch-ont:address ?address .
    ?address sch-ont:address1 ?address1 ;
       sch-ont:address2 ?address2 ;
       sch-ont:postcode ?postcode ;
       sch-ont:town ?town .
  }
}
ORDER BY ?name

Results


#4. Select the name, lowest and highest age ranges, capacity and pupil:teacher ratio
# for all schools in the Bath & North East Somerset district
# Again we use OPTIONAL to allow for missing data items.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?name ?lowage ?highage ?capacity ?ratio WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:districtAdministrative
        <http://statistics.data.gov.uk/id/local-authority-district/00HA> .
  OPTIONAL {
    ?school sch-ont:statutoryLowAge ?lowage .
  }

  OPTIONAL {
    ?school sch-ont:statutoryHighAge ?highage .
  }

  OPTIONAL {
    ?school sch-ont:schoolCapacity ?capacity .
  }

  OPTIONAL {
    ?school sch-ont:pupilTeacherRatio ?ratio .
  }
}
ORDER BY ?name

Results


#5. What is the URI, name, and opening date of the oldest school in the UK?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?date WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:openDate ?date.
}
ORDER BY ASC(?date)
LIMIT 1

Results


#6. Select the name, easting and northing for the 100 newest schools in the UK.
# Can be used to plot them on a map
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:openDate ?date ;
     sch-ont:easting ?easting ;
     sch-ont:northing ?northing .
}
ORDER BY DESC(?date)
LIMIT 100

Results


#7. Select the URI, name, opening date, easting and northing for all schools opened in 2008
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name;
     sch-ont:openDate ?date ;
     sch-ont:easting ?easting ;
     sch-ont:northing ?northing .
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}

Results
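
According to the comments below (response 16), the easting and northing values are now published using Ordnance Survey properties, and latitude and longitude are available via the W3C WGS84 vocabulary. Here is a rough sketch of query 7 adapted to those properties. I'm assuming the values hang directly off the school resource, and the spatial: and geo: prefix names are simply my own choice. The coordinates are wrapped in OPTIONAL in case some schools lack them.

```sparql
# Sketch only: property URIs are taken from response 16 in the comments;
# their attachment to ?school is an assumption.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix spatial:  <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/>
prefix geo:      <http://www.w3.org/2003/01/geo/wgs84_pos#>
prefix xsd:      <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?date ?easting ?northing ?lat ?long WHERE {
  ?school a sch-ont:School ;
     sch-ont:establishmentName ?name ;
     sch-ont:openDate ?date .
  # National Grid coordinates, if present
  OPTIONAL {
    ?school spatial:easting ?easting ;
       spatial:northing ?northing .
  }
  # WGS84 coordinates, if present
  OPTIONAL {
    ?school geo:lat ?lat ;
       geo:long ?long .
  }
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
```

If you only want schools that can be plotted, drop the OPTIONAL wrapper around whichever coordinate pair you need.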


#8. Select the URI, name, and the reason for closing for all schools that are currently
# scheduled for closure. The reason is a URI from a controlled vocabulary in the ontology.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?reason WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Open_but_proposed_to_close ;
     sch-ont:reasonEstablishmentClosed ?reason .
}

Results


#9. In which parliamentary constituencies did schools close in 2008?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cons ?label WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ;
     sch-ont:closeDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label.
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?cons

Results
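
If you want a count per constituency rather than just a list, the same pattern can be grouped. This is a sketch that assumes the endpoint supports SPARQL 1.1 aggregates (COUNT and GROUP BY), which are not part of SPARQL 1.0, so it may not run everywhere. The ?closed variable name is my own.

```sparql
# Sketch only: requires SPARQL 1.1 aggregate support on the endpoint.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cons ?label (COUNT(DISTINCT ?school) AS ?closed) WHERE {
  ?school a sch-ont:School ;
     sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ;
     sch-ont:closeDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label .
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
GROUP BY ?cons ?label
ORDER BY DESC(?closed)
```

COUNT(DISTINCT ?school) guards against a school being counted twice if it matches the pattern more than once, for example when a constituency carries more than one label.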


#10. In which parliamentary constituencies did schools open in 2008?
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
prefix xsd:     <http://www.w3.org/2001/XMLSchema#>
prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cons ?label WHERE {
  ?school a sch-ont:School;
     sch-ont:establishmentName ?name ;
     sch-ont:openDate ?date ;
     sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label.
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?cons

Results

Hopefully that’s enough to get you started. If you want a bit more background on the modelling and a look at the ontology, then read this posting to the uk-government-data mailing list by Stuart Williams.
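
If you can't get hold of the ontology, you can also ask the store itself what it contains. As a minimal sketch, the query below lists the properties that actually appear on schools. It uses only the sch-ont:School class from the examples above. It may be slow against a large store, hence the LIMIT.

```sparql
# Sketch only: lists the distinct properties used on school resources.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT DISTINCT ?property WHERE {
  ?school a sch-ont:School ;
     ?property ?value .
}
ORDER BY ?property
LIMIT 200
```

Swapping the pattern for `?s a ?class` and selecting DISTINCT ?class gives a list of the classes in use instead.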

Note: updated 16 Nov 2009 to reflect changes to the EduBase data. The first version of this dataset was created before the proposed guidelines for public sector URIs were published. The school ontology used in that first dataset had a URI of http://education.data.gov.uk/ontology/school#, which has now been replaced with http://education.data.gov.uk/def/school/. Also, the URIs for administrative districts were temporary placeholders containing the phrase “placeholder-id” in their path. These have now been updated to URIs based on the Office for National Statistics district codes, for example http://statistics.data.gov.uk/id/local-authority-district/00AA
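
To see which district URIs the data actually uses, which is handy when adapting queries 1 to 4 to another area, something like the following sketch should work. It assumes nothing beyond the sch-ont:districtAdministrative property already used above.

```sparql
# Sketch only: lists the district URIs referenced by schools in the data.
prefix sch-ont:  <http://education.data.gov.uk/def/school/>
SELECT DISTINCT ?district WHERE {
  ?school a sch-ont:School ;
     sch-ont:districtAdministrative ?district .
}
ORDER BY ?district
LIMIT 50
```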

24 Responses

  1. Richard Masters Says:

    Hi Leigh

    Thanks for these examples. One (naive) question and one note:

    Question: These examples assume you know what nodes exist in the dataset. Can you use SPARQL to find out some basic information about the datasets that are available and the elements within them, and then create a SPARQL query from that? (Like Z39.50 which allows queries to be formulated without having to know anything about the target database.)

    Note: the posting by Stuart Williams is on a closed list.

    Thanks,
    Richard

  2. Leigh Dodds Says:

    Hi Richard,

    The Education ontology is now online here: http://education.data.gov.uk/ontology/school.rdf

    You can use SPARQL to explore a dataset, e.g. to find out what classes and properties exist. There’s a nice blog post here which describes that process:

    http://dallemang.typepad.com/my_weblog/2008/08/rdf-as-self-describing-data.html

  3. n² » Blog Archive » SPARQLing data.gov.uk: Transport Data Says:

    [...] WordPress.org « SPARQLing data.gov.uk: Edubase Data [...]

  4. Dan Brickley (danbri) 's status on Tuesday, 13-Oct-09 21:37:38 UTC - Identi.ca Says:

    [...] http://blogs.talis.com/n2/archives/818 – nice set of sparql queries from Leigh at Talis, showing UK open govt datasets in action [...]

  5. PaulZH Says:

    Leigh,

    Are there other gov ontologies online?
    I’m particularly interested in an ontology for describing governmental organizations.

    Thanks.

    Paul

  6. links for 2009-10-14 « SkunkWorks? No – GovWonks! Says:

    [...] n² » Blog Archive » SPARQLing data.gov.uk: Edubase Data Last week the Cabinet Office issued a call for Open Data Developers to sign-up to get a preview of the forthcoming UK Government public data website. The site includes a directory of existing datasets plus a growing number of datasets that have been converted to RDF and which will shortly be available as Linked Data. This data is being stored in the Talis Platform providing developers with access to SPARQL endpoints as a means to query the data; we’ll also be including search and other access mechanisms at a later date. [...]

  7. Getting Started with data.gov.uk, Triplr SPARYQL and Yahoo Pipes « OUseful.Info, the blog… Says:

    [...] So what can we do with it? Not being particularly fluent in SPARQL, I had a poke around for some examples I could cut, paste, hack and tinker with and found a few nice examples on the [n]^2 blog: SPARQLing data.gov.uk: Edubase Data [...]

  8. n² » Blog Archive » SPARQL 1.1 Early Access Features Says:

    [...] second example is a variant of one of the example queries that can be used against the Edubase data. In this case the query retrieves the number of schools closed in each parliamentary constituency [...]

  9. Leigh Dodds Says:

    Hi Paul,

    I’m not aware of other ontologies that are yet online. I do know there’s a lot of behind the scenes activity exploring how best to model various datasets. I’m not sure whether that will cover modelling governmental departments though. Could you use FOAF, e.g. foaf:Organization, as a simple way to define departments, their names, etc?

    It’s worth raising this on the government data developers list too.

    Cheers,

    L.

  10. Pezholio » Blog Archive » Adventures in SPARQL Says:

    [...] also a few more example queries on the Talis Blog, so if you’re that way inclined I heartily recommend having a [...]

  11. Raw Data Now in the UK update | alexmikro.net Says:

    [...] be queried via SPARQL. You can find some useful examples of such queries from Leigh Dodds at the Talis blog. Happy [...]

  12. Panlibus » Blog Archive » Will Linked Data mean an early end for Marc & RDA Says:

    [...] they have in mind. If you want a sneak preview of how such data is queried, take a look at some of theses examples.   In a similar vein, metadata from BBC programmes and music is being harvested in to [...]

  13. Mapping Recent School Openings and Closures « OUseful.Info, the blog… Says:

    [...] (I cribbed how to write these queries from a Talis blog: SPARQLing data.gov.uk: Edubase Data;-) [...]

  14. Nodalities » Blog Archive » data.gov.uk and the Talis Platform Says:

    [...] Platform developer blog we’ve begun showing some ways that the initial datasets, covering UK schools and traffic measurements can be queried in interesting ways. Its been exciting to see people begin [...]

  15. Dave Grosvenor Says:

    The location of the schools as northings and eastings do not seem to be present in the data anymore!
    Any explanation?

    Thanks

  16. Leigh Dodds Says:

    Hi Dave,

    The Easting and Northing values are still in the dataset, e.g:

    http://education.data.gov.uk/doc/school/120805

    The new properties are:

    http://data.ordnancesurvey.co.uk/ontology/spatialrelations/easting
    http://data.ordnancesurvey.co.uk/ontology/spatialrelations/northing

    There are also latitude and longitude values in the data now using these properties:

    http://www.w3.org/2003/01/geo/wgs84_pos#lat
    http://www.w3.org/2003/01/geo/wgs84_pos#long

    HtH,

    L.

  17. First Dabblings With Pipelinked Linked Data « OUseful.Info, the blog… Says:

    [...] to get started, let’s grab a list of schools… The Talis blog post SPARQLing data.gov.uk: Edubase Data contains several example queries over the education datastore. The query I’ll use is derived [...]

  18. Barncroft, Battins & Bondfields @ Plings Blogs Says:

    [...] including the National Transport Access Node database (read: where the bus stops all are), the EduBase list of schools now available as linked open data, and with more easily accessible information from Ordinance [...]

  19. Linked & Local Open Data – What could it mean for positive activity information? @ Plings Blogs Says:

    [...] example, central government efforts have already seen the EduBase database of schools and national transport data made available through the beta Data.gov.uk website due to go live in [...]

  20. JamesC Says:

    Hiya,

    When I try to run some of the results, like say number 7, by clicking on ‘results’ I get only the empty elements returned.

    -JamesC

  21. dave Says:

    Great article! I’m attempting to extract info for a government project, but I’m running into difficulties because I’m learning by example and can’t find an example of what I need to do!

    I’m trying to get a list of schools from a given district (in the example below, 00BA) and only return those which are NOT nurseries or preschools (identified by the Type “TypeOfEstablishment_EY_Setting”).

    I’ve managed to hack together the query below, but it’s returning duplicate results because most schools are listed under 2 Types, only one of which will be flagged as “TypeOfEstablishment_EY_Setting”.

    Am I meant to somehow re-filter what I’ve got so far, or can I limit the results returned initially?

    prefix sch-ont:
    prefix geo:
    prefix sch-type:
    SELECT ?school ?name ?date ?lat ?long ?capacity ?type WHERE {
    ?school a sch-ont:School;
    sch-ont:establishmentName ?name;
    sch-ont:openDate ?date;
    geo:lat ?lat;
    geo:long ?long;
    sch-type:type ?type;
    sch-ont:districtAdministrative
    .

    OPTIONAL {
    ?school sch-ont:schoolCapacity ?capacity
    }
    }

  22. Lecter Roux Says:

    Hi Leigh,

    Do you have any other ontology samples? I can’t access the data from Stuart Williams.
    Thanks for sharing this! This will definitely be a big help for my research.

  23. Data Entry Says:

    Question for you, what sort of hardware was/is used to run these environments? Being a bit of a server junky I am always curious on what sort of boxes these large DB’s reside on. Cheers

  24. Sam Tunnicliffe Says:

    The Platform runs across a variety of hardware, in multiple datacenters. Some pieces are hosted in our colos, on pretty much commodity hardware. We also use cloud services like Amazon EC2, S3 & SimpleDB for various tasks.
