SPARQLing data.gov.uk: Edubase Data
Last week the Cabinet Office issued a call for Open Data Developers to sign up for a preview of the forthcoming UK Government public data website. The site includes a directory of existing datasets plus a growing number of datasets that have been converted to RDF and will shortly be available as Linked Data. This data is stored in the Talis Platform, which gives developers access to SPARQL endpoints for querying it; we’ll also be including search and other access mechanisms at a later date.
In this series of postings I wanted to show some example SPARQL queries that can be used to access the data. If you’re new to SPARQL then you might want to look at Lee Feigenbaum’s SPARQL by Example tutorial, or my own short slide deck that covers all the basic syntax.
The first dataset I wanted to highlight is an extract of the Edubase dataset available from the Department for Children, Schools and Families. The conversion was carried out by the team at HP Labs and has been loaded into a Talis Platform store. The public-facing SPARQL endpoint is available from: http://services.data.gov.uk/education/sparql.
Here are some sample SPARQL queries you can use against the data:
#1. Select the names of schools in the Administrative District of the City of London
# Ordering results by name of the school
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?name WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:districtAdministrative
      <http://statistics.data.gov.uk/id/local-authority-district/00AA> .
}
ORDER BY ?name
#2. Which schools in the BANES area have a nursery?
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:districtAdministrative
      <http://statistics.data.gov.uk/id/local-authority-district/00HA> ;
    sch-ont:nurseryProvision "true"^^xsd:boolean .
}
ORDER BY ?name
#3. Select the names and addresses of schools in the Administrative District of the City of London
# Ordering results by name of the school
# Note: we use OPTIONAL here as not every school has an address listed in the data
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?name ?address1 ?address2 ?postcode ?town WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:districtAdministrative
      <http://statistics.data.gov.uk/id/local-authority-district/00AA> .
  OPTIONAL {
    ?school sch-ont:address ?address .
    ?address sch-ont:address1 ?address1 ;
      sch-ont:address2 ?address2 ;
      sch-ont:postcode ?postcode ;
      sch-ont:town ?town .
  }
}
ORDER BY ?name
#4. Select the name, lowest and highest age ranges, capacity and pupil:teacher ratio
# for all schools in the Bath & North East Somerset district
# Again we use OPTIONAL to allow for missing data items.
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?name ?lowage ?highage ?capacity ?ratio WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:districtAdministrative
      <http://statistics.data.gov.uk/id/local-authority-district/00HA> .
  OPTIONAL {
    ?school sch-ont:statutoryLowAge ?lowage .
  }
  OPTIONAL {
    ?school sch-ont:statutoryHighAge ?highage .
  }
  OPTIONAL {
    ?school sch-ont:schoolCapacity ?capacity .
  }
  OPTIONAL {
    ?school sch-ont:pupilTeacherRatio ?ratio .
  }
}
ORDER BY ?name
#5. What is the uri, name, and opening date of the oldest school in the UK?
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?date WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:openDate ?date .
}
ORDER BY ASC(?date)
LIMIT 1
#6. Select the uri, name, opening date, easting and northing for the 100 newest schools in the UK.
# Can be used to plot them on a map
prefix sch-ont: <http://education.data.gov.uk/def/school/>
SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:openDate ?date ;
    sch-ont:easting ?easting ;
    sch-ont:northing ?northing .
}
ORDER BY DESC(?date)
LIMIT 100
#7. Select the uri, name, opening date, easting and northing for all schools opened in 2008
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?date ?easting ?northing WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:openDate ?date ;
    sch-ont:easting ?easting ;
    sch-ont:northing ?northing .
  # Use >= so that schools opened on 1 January 2008 are included
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
#8. Select the uri, name, and the reason for closing for all schools that are currently
# scheduled for closure. The reason is a URI from a controlled vocabulary in the ontology.
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?reason WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Open_but_proposed_to_close ;
    sch-ont:reasonEstablishmentClosed ?reason .
}
#9. In which parliamentary constituencies did schools close in 2008?
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cons ?label WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:establishmentStatus sch-ont:EstablishmentStatus_Closed ;
    sch-ont:closeDate ?date ;
    sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label .
  # Use >= so that closures dated 1 January 2008 are included
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?cons
#10. In which parliamentary constituencies did schools open in 2008?
prefix sch-ont: <http://education.data.gov.uk/def/school/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?cons ?label WHERE {
  ?school a sch-ont:School ;
    sch-ont:establishmentName ?name ;
    sch-ont:openDate ?date ;
    sch-ont:parliamentaryConstituency ?cons .
  ?cons rdfs:label ?label .
  # Use >= so that schools opened on 1 January 2008 are included
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?cons
Hopefully that’s enough to get you started. If you want a bit more background on the modelling and a look at the ontology, then read this posting to the uk-government-data mailing list by Stuart Williams.
Note: updated 16 Nov 2009 to reflect changes to the EduBase data. The first version of this dataset was created before the proposed guidelines for public sector URIs were published. The school ontology used in that first dataset had a URI of http://education.data.gov.uk/ontology/school# which has now been replaced with http://education.data.gov.uk/def/school/. Also, the URIs for administrative districts were temporary placeholders containing the phrase “placeholder-id” in their path. These have now been updated to URIs based on the Office for National Statistics district codes, for example http://statistics.data.gov.uk/id/local-authority-district/00AA


October 6th, 2009 at 4:44 pm
Hi Leigh
Thanks for these examples. One (naive) question and one note:
Question: These examples assume you know what nodes exist in the dataset. Can you use SPARQL to find out some basic information about the datasets that are available and the elements within them, and then create a SPARQL query from that? (Like Z39.50 which allows queries to be formulated without having to know anything about the target database.)
Note: the posting by Stuart Williams is on a closed list.
Thanks,
Richard
October 10th, 2009 at 1:45 pm
Hi Richard,
The Education ontology is now online here: http://education.data.gov.uk/ontology/school.rdf
You can use SPARQL to explore a dataset, e.g. to find out what classes and properties exist. There’s a nice blog post here which describes that process:
http://dallemang.typepad.com/my_weblog/2008/08/rdf-as-self-describing-data.html
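As a rough illustration of that kind of exploration, the sketch below lists the distinct properties attached to schools. It assumes only the sch-ont:School class used in the queries in the post; the LIMIT is an arbitrary safety cap, and whether the endpoint answers such an open-ended pattern quickly is not guaranteed.

```sparql
# Sketch: discover which properties are used on school resources
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
SELECT DISTINCT ?property WHERE {
  ?school a sch-ont:School ;
          ?property ?value .
}
ORDER BY ?property
LIMIT 200
```

Replacing the pattern with `?thing a ?class` and selecting `DISTINCT ?class` gives a list of the classes in the store in the same way.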
October 14th, 2009 at 6:54 am
Leigh,
Are there other gov ontologies online?
I’m particularly interested in an ontology for describing governmental organizations.
Thanks.
Paul
October 20th, 2009 at 11:02 am
Hi Paul,
I’m not aware of other ontologies that are yet online. I do know there’s a lot of behind the scenes activity exploring how best to model various datasets. I’m not sure whether that will cover modelling governmental departments though. Could you use FOAF, e.g. foaf:Organization, as a simple way to define departments, their names, etc?
It’s worth raising this on the government data developers list too.
Cheers,
L.
November 21st, 2009 at 9:09 pm
The location of the schools as northings and eastings do not seem to be present in the data anymore!
Any explanation?
Thanks
November 23rd, 2009 at 9:58 am
Hi Dave,
The Easting and Northing values are still in the dataset, e.g:
http://education.data.gov.uk/doc/school/120805
The new properties are:
http://data.ordnancesurvey.co.uk/ontology/spatialrelations/easting
http://data.ordnancesurvey.co.uk/ontology/spatialrelations/northing
There are also latitude and longitude values in the data now using these properties:
http://www.w3.org/2003/01/geo/wgs84_pos#lat
http://www.w3.org/2003/01/geo/wgs84_pos#long
HtH,
L.
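For anyone adapting the easting and northing queries in the post, here is a sketch of query #7 rewritten against the properties listed in this reply. The prefix labels spatial: and geo: are just names chosen for this example, and it assumes that sch-ont:openDate is unchanged and that latitude and longitude may be missing for some schools (hence the OPTIONAL).

```sparql
# Sketch: schools opened in 2008, using the Ordnance Survey and WGS84 properties
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
PREFIX spatial: <http://data.ordnancesurvey.co.uk/ontology/spatialrelations/>
PREFIX geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>
SELECT ?school ?name ?date ?easting ?northing ?lat ?long WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:openDate ?date ;
          spatial:easting ?easting ;
          spatial:northing ?northing .
  # Latitude and longitude are treated as optional (an assumption)
  OPTIONAL {
    ?school geo:lat ?lat ;
            geo:long ?long .
  }
  # 1 January 2008 inclusive, 1 January 2009 exclusive
  FILTER (?date >= "2008-01-01"^^xsd:date && ?date < "2009-01-01"^^xsd:date)
}
ORDER BY ?name
```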
March 5th, 2010 at 6:25 pm
Hiya,
When I try to run some of the queries, say number 7, by clicking on ‘results’, I get only empty elements returned.
-JamesC
March 11th, 2010 at 5:01 pm
Great article! I’m attempting to extract info for a government project, but I’m running into difficulties because I’m learning by example and can’t find an example of what I need to do!
I’m trying to get a list of schools from a given district (in the example below, 00BA) and only return those which are NOT nurseries or preschools (identified by the Type “TypeOfEstablishment_EY_Setting”).
I’ve managed to hack together the query below, but it’s returning duplicate results because most schools are listed under 2 Types, only one of which will be flagged as “TypeOfEstablishment_EY_Setting”.
Am I meant to somehow re-filter what I’ve got so far, or can I limit the results returned initially?
prefix sch-ont:
prefix geo:
prefix sch-type:
SELECT ?school ?name ?date ?lat ?long ?capacity ?type WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:openDate ?date;
geo:lat ?lat;
geo:long ?long;
sch-type:type ?type;
sch-ont:districtAdministrative
.
OPTIONAL {
?school sch-ont:schoolCapacity ?capacity
}
}
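One possible way to approach this kind of exclusion, offered as an untested sketch: match the unwanted type inside an OPTIONAL and then keep only the rows where that match failed, using FILTER (!bound(...)). Two things here are assumptions rather than facts from the dataset: that the district URI for 00BA follows the same pattern as the 00AA and 00HA URIs in the post, and that TypeOfEstablishment_EY_Setting lives in the school ontology namespace (by analogy with EstablishmentStatus_Closed). The variable ?eyLink stands in for whichever property links a school to its type.

```sparql
# Sketch: schools in one district that are NOT flagged as an EY setting
PREFIX sch-ont: <http://education.data.gov.uk/def/school/>
SELECT DISTINCT ?school ?name WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative
            <http://statistics.data.gov.uk/id/local-authority-district/00BA> .
  # Try to find any link from the school to the EY setting type (assumed URI)
  OPTIONAL {
    ?school ?eyLink sch-ont:TypeOfEstablishment_EY_Setting .
  }
  # Keep only schools for which no such link was found
  FILTER (!bound(?eyLink))
}
ORDER BY ?name
```

Because ?type is no longer selected and DISTINCT is used, a school listed under two types should appear only once in the results.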
April 16th, 2010 at 8:49 am
Hi Leigh,
Do you have any other ontology samples? I can’t access the data from Stuart Williams.
Thanks for sharing this! This will definitely be a big help for my research.
July 13th, 2010 at 1:24 am
Question for you: what sort of hardware was/is used to run these environments? Being a bit of a server junkie, I am always curious about what sort of boxes these large DBs reside on. Cheers
July 13th, 2010 at 7:48 am
The Platform runs across a variety of hardware, in multiple datacenters. Some pieces are hosted in our colos, on pretty much commodity hardware. We also use cloud services like Amazon EC2, S3 & SimpleDB for various tasks.