Searching the BBC Data in the Talis Platform
I’ve previously blogged about how easy it is to create a custom search index using the Platform. So, naturally, while loading the BBC programmes and music data into the Platform, we used this feature to build a search engine across that data.
In this post I wanted to show a few example queries and then review how we’ve configured the search indexes so you can not only get the most from the feature, but also see how it can be used against real-world data.
Sample Queries
Here are some sample queries. The Platform is more of a search engine tool-kit than a search engine per se: the results aren’t a human-readable web page, they’re an RSS 1.0 document that contains enough structured metadata about each item in order to build a presentation of the results. And where additional metadata is required, this can be extracted using the describe service, additional searches, augmentation or a SPARQL query.
However, for the purposes of this article it’s enough to view the examples in your browser. Application developers will want to dig into the underlying markup to see what extra data is included.
- A search for “Banksy”
- A search for “The Prodigy” — returning the artist, the dbpedia entry, and episode titles and descriptions in which they are mentioned
- A search for “Terry Pratchett” — again produces a mixture of different types.
- A search for “Prodigy” limited to things that are of type “http://purl.org/stuff/rev#Review”. Results.
- A facetted search for “Prodigy” grouping the results based on their RDF type. Results. This shows us that we have results not only in episodes but in a variety of other types too. We can drill down into these to form the following search:
- A search for “Prodigy” limited to Music Segments. Results.
If you want to try out your own queries, then use this simple form.
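For application developers, here is a minimal sketch of pulling the basics out of an RSS 1.0 result document like the ones these searches return. The sample document is made up for illustration (the example.org URIs are placeholders), and I'm assuming each item carries a title and one or more rdf:type elements; check the real markup to see exactly which properties are present.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for RDF and RSS 1.0.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Made-up sample, NOT real Platform output: the URIs and the properties
# on each item are assumptions for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/search?query=Prodigy">
    <title>Search results (illustrative sample)</title>
    <link>http://example.org/search?query=Prodigy</link>
    <description>Illustrative only</description>
  </channel>
  <item rdf:about="http://example.org/artists/1#artist">
    <title>The Prodigy</title>
    <link>http://example.org/artists/1#artist</link>
    <rdf:type rdf:resource="http://purl.org/ontology/mo/MusicArtist"/>
  </item>
  <item rdf:about="http://example.org/episodes/2#episode">
    <title>An episode mentioning The Prodigy</title>
    <link>http://example.org/episodes/2#episode</link>
    <rdf:type rdf:resource="http://purl.org/ontology/po/Episode"/>
  </item>
</rdf:RDF>"""


def parse_results(xml_text):
    """Return a list of dicts (uri, title, types) for each RSS 1.0 item."""
    root = ET.fromstring(xml_text)
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    results = []
    for item in root.findall("rss:item", NS):
        results.append({
            "uri": item.get(about),
            "title": item.findtext("rss:title", default="", namespaces=NS),
            "types": [t.get(resource) for t in item.findall("rdf:type", NS)],
        })
    return results


if __name__ == "__main__":
    for result in parse_results(SAMPLE):
        print(result["title"], result["types"])
```

Anything beyond these fields would need to be read from the actual item markup, or fetched separately as described above.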
The Configuration
To show how we’ve configured the Field Predicate Map and Query Profile for the BBC Backstage store, I’ve uploaded them to our public SVN: fmap.rdf and queryprofile.rdf
Looking at the Field Predicate Map, you can see we’ve configured the Platform store to index the key predicates in the BBC data, including titles, labels, descriptions and synopses. You can use any of the named fields in the configuration to refine searches to specific predicates in the data, allowing construction of an “advanced search form”. For example, we can search for name:"Stephen Fry" to find a person called Stephen Fry (results).
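As a rough sketch of how an “advanced search form” might turn its inputs into a fielded query like the one above: the endpoint URL and the `query` parameter name below are placeholders I've assumed for illustration, not the documented service address, and the field names must match those defined in the Field Predicate Map.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
SEARCH_ENDPOINT = "http://example.org/stores/my-store/items"


def field_clause(field, value):
    """Build a field:"phrase" clause, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def build_search_url(terms="", fields=None, endpoint=SEARCH_ENDPOINT):
    """Combine free-text terms and field restrictions into one search URL.

    Clauses are joined with spaces; how the search service combines them
    is not covered here, so treat that as an assumption to verify.
    """
    clauses = [terms] if terms else []
    for field, value in (fields or {}).items():
        clauses.append(field_clause(field, value))
    # "query" as the parameter name is an assumption.
    return endpoint + "?" + urlencode({"query": " ".join(clauses)})


if __name__ == "__main__":
    print(build_search_url(fields={"name": "Stephen Fry"}))
```

Each input on the form maps to one named field, so adding a new search option is just another entry in the `fields` dictionary.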
The RDF type property is also included in the Field Predicate Map to allow us to limit searches to particular types of resource. It also enables us to do facetted searches based on type, giving us an alternate view of the data. It’s easy to see how that functionality could be used to build some useful additional options for restricting the search results presented in a user interface.
To configure the relevance ranking we’ve chosen to boost hits in “labels” (names, labels, titles) over “descriptions” (description, synopses, review). We could easily change the boosting to favour one or other type of predicate to further tweak the results. But this configuration provides a reasonable set of search results for the tests we’ve done. Let us know how you get on and whether you think any of this should be changed. We’re happy to alter the configuration to make sure that people can get the most from the BBC data.
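To make the effect of boosting concrete, here is a toy illustration. It is not the Platform's ranking algorithm, and the weights are invented; it only shows how weighting “label” hits above “description” hits changes the order of results.

```python
# Invented weights for illustration; not the values in queryprofile.rdf.
BOOSTS = {"label": 2.0, "description": 1.0}


def score(doc, term, boosts=BOOSTS):
    """Sum, over each field, the field's weight times the term's occurrences."""
    term = term.lower()
    total = 0.0
    for field, weight in boosts.items():
        words = doc.get(field, "").lower().split()
        total += weight * words.count(term)
    return total


def rank(docs, term, boosts=BOOSTS):
    """Return matching docs ordered by descending score."""
    scored = [(score(doc, term, boosts), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for value, doc in scored if value > 0]


if __name__ == "__main__":
    docs = [
        {"id": "episode", "label": "Late show",
         "description": "Featuring tracks by the prodigy and others"},
        {"id": "artist", "label": "The Prodigy",
         "description": "Electronic band"},
    ]
    for doc in rank(docs, "prodigy"):
        print(doc["id"], score(doc, "prodigy"))
```

Swapping the weights so that descriptions count for more than labels would push the episode above the artist, which is the kind of tweak described above.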

