Archive for the 'Announcements' Category

Pho 0.4 — Job Control and Command-Line Tool

I’ve just uploaded the latest release of Pho, the Ruby client for the Talis Platform. The primary change in the 0.4 release is a reworking of the code relating to Snapshots and Jobs to provide access to the detailed job lifecycle data that was added in Release 21 of the Platform.

Because jobs are executed asynchronously, the API now includes code to wait for a job to finish, with an option to monitor the progress updates.

For example, the following code fragment illustrates how to submit a reindex job and then report on its progress:


store = Pho::Store.new("http://api.talis.com/stores/my-store", "user", "pass")
resp = Pho::Jobs.submit_reindex(store)
job = Pho::Jobs.wait_for_submitted(resp, store) do | job, message, time |
  puts "#{time} #{message}"
end

The wait_for_submitted method takes the response from submitting a job to the Platform and then polls the API until the job has completed. When it returns, it gives you a Job object populated with the progress updates, from which it’s also possible to determine whether the job was successful. If you pass a block to the method, the block will be called for each newly encountered progress update, including the start message and completion message.
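To make the polling behaviour concrete, here is a minimal, self-contained sketch of the general pattern: poll until finished, and yield each progress update only once. It is not Pho's implementation. The JobStatus struct, the wait_for_job method and the simulated job are all hypothetical names invented for illustration, and no Platform calls are made.

```ruby
# Hypothetical stand-in for the status a remote job API might report.
JobStatus = Struct.new(:updates, :finished, :success)

# Polls `fetch_status` (any callable returning a JobStatus) until the job
# finishes, yielding each progress update exactly once.
def wait_for_job(fetch_status, interval: 0.01, max_polls: 100)
  seen = 0
  max_polls.times do
    status = fetch_status.call
    # Only report updates that appeared since the previous poll.
    status.updates.drop(seen).each { |update| yield update if block_given? }
    seen = status.updates.length
    return status if status.finished
    sleep interval
  end
  raise "job did not finish after #{max_polls} polls"
end

# Simulated job: each poll reveals one more progress message.
messages = ["Job started", "Reindexing", "Job completed"]
polls = 0
fake_fetch = lambda do
  polls += 1
  visible = messages.first(polls)
  JobStatus.new(visible, visible.length == messages.length, true)
end

reported = []
final = wait_for_job(fake_fetch) { |message| reported << message }
puts reported
puts "success: #{final.success}"
```

Tracking how many updates have already been seen is what stops the block from being called twice for the same message, and the max_polls limit keeps a stuck job from blocking the caller forever.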

This release also includes a fledgling command-line application for working with a Platform Store. Once you’ve installed the gem you’ll have a new script called talis_store which you can use to interact with a Platform store. There’s still some work to be done to make it nicer to use, but it already covers the majority of the core operations. To get help, run:

talis_store help

As a quick example, here’s how to take a snapshot of a store and then download that snapshot to a local directory. In the process of downloading the snapshot the MD5 will be automatically verified:


talis_store backup -u user -p pass -s my-store -d ~/backups
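For readers curious about what an MD5 check on a downloaded file involves, here is a small sketch using only Ruby's standard library. It is not the code talis_store runs. The md5_matches? helper is a hypothetical name, and a temporary file stands in for the downloaded snapshot.

```ruby
require "digest/md5"
require "tempfile"

# Returns true when the file's MD5 hex digest equals the expected value.
def md5_matches?(path, expected_hex)
  Digest::MD5.file(path).hexdigest == expected_hex.strip.downcase
end

# A temporary file stands in for a downloaded snapshot.
snapshot = Tempfile.new("snapshot")
snapshot.write("example snapshot contents")
snapshot.close

# In practice the expected digest would be published alongside the snapshot.
published_md5 = Digest::MD5.hexdigest("example snapshot contents")

ok = md5_matches?(snapshot.path, published_md5)
tampered = md5_matches?(snapshot.path, Digest::MD5.hexdigest("something else"))

puts "snapshot verified: #{ok}"
puts "wrong digest accepted: #{tampered}"
snapshot.unlink
```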

For the next release I’m planning to add support for Changesets, which is a format for describing changes to RDF graphs.

Pho: A Ruby Client for the Talis Platform

This is a short blog post to announce a project I’ve been working on in my spare time. Pho is a Ruby client for the Talis Platform. It’s hosted on Rubyforge so getting started couldn’t be easier:

gem install pho

This will download and install the Pho gem, along with the documentation, which you can also read online.

The distribution comes with a couple of example scripts that show how to add items to the Content Box, perform SPARQL queries, check status of a store, etc.

However, the API currently does a lot more than that, giving you full access to all of the core Platform services, including storing binary data and RDF metadata, SPARQL queries, faceted browsing, job control, and store configuration options.

There’s still plenty of work to be done, but at version 0.3 I think there’s enough functionality available that you can build useful applications using the API. For example, there’s sufficient code there now to script some simple data management activities for publishing or managing data in the Platform, or to build a simple linked data browser. I hope to post examples of doing exactly those things over the next few weeks, as I’m planning some updates to my space data store (briefly described here) that will be handled using Pho.

The next steps are to plug some of the gaps in the API — specifically parsing of search results, access to the job metadata we exposed in version 21, and better support for changeset management. I’m also going to explore some simple Ruby-RDF mapping functionality. The latter should help turn Pho into something that can provide the core functionality required for building linked data backed applications.

I’d love to get feedback on this, so feel free to post bug reports or feature suggestions either on the Rubyforge project or the n2-dev mailing list.

Metamorph Open Source project for Semantic Converter Web Service

I’ve published the code behind the Talis Convert Service (production release at stable URL coming soon) as an open source project on Google Code, called Metamorph.

Metamorph is a service aimed at semantic web developers. It is much like triplr, babel, swignition and any23 (please leave a comment pointing to any other similar services).

You give it a(n http) URI, an (optional) input format, and an output format, and it will fetch the document from the web, and convert it into the output format.

Understood input values include:

  • Semantic HTML (RDFa, eRDF, microformats, POSH)
  • RDF (XML, Turtle, JSON)
  • SPARQL-XML
  • Facet XML (the response format of the facets service available on all platform stores)

Output for all input formats can be:

  • JSON
  • JSONP
  • HTML

If the input is some form of RDF, you can also ask for:

  • RDF (XML, Turtle, JSON; the default HTML output is rendered as RDFa)
  • RSS 1.0
  • TriX
  • Exhibit (web page, JSON, JSONP)

In addition, if the input is an RDF format, you can specify multiple data URIs, and the results will be merged in the output document. For instance, this conversion merges data from two of my homepages, and a Turtle file.

I’m thinking about removing the TriX output, as I’m not sure it would be used by anyone - the reason I didn’t bother to write a parser for it was because I haven’t seen any data published as TriX in the first place.

I welcome any input on what else would be useful from this web service. I suspect that more output options, while fairly easy to add, would not be very useful. More input options may be useful, but perhaps not significantly so.

I suspect what might be more useful, and more likely to distinguish this from similar RDF converter services, are graph transformation services, which might include:

  • Diffs
  • Intersects
  • Smushing
  • Augmenting property and class URIs with labels and comments, perhaps retrieved from SchemaCache
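As a rough illustration of what merging, diffs and intersects mean at the triple level, here is a sketch in Ruby that treats each graph as a set of triples. This is not Metamorph code (Metamorph is written in PHP), and the example data is invented. It also assumes graphs without blank nodes, since comparing blank nodes properly requires graph matching rather than simple set operations.

```ruby
require "set"

# A triple is modelled as a plain [subject, predicate, object] array.
graph_a = Set[
  ["ex:alice", "foaf:name", "Alice"],
  ["ex:alice", "foaf:knows", "ex:bob"]
]
graph_b = Set[
  ["ex:alice", "foaf:name", "Alice"],
  ["ex:bob", "foaf:name", "Bob"]
]

merged    = graph_a | graph_b # merge: every triple from either graph
shared    = graph_a & graph_b # intersect: triples present in both
only_in_a = graph_a - graph_b # diff: triples that only graph_a asserts
only_in_b = graph_b - graph_a # diff: triples that only graph_b asserts

puts "merged: #{merged.size} triples"
puts "shared: #{shared.size} triples"
puts "only in a: #{only_in_a.to_a.inspect}"
puts "only in b: #{only_in_b.to_a.inspect}"
```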

Metamorph is coded in PHP, and uses ARC for parsing RDF and HTML, and serialising RDF/XML and Turtle.

Please use the issue tracker for raising any bugs or feature requests.

Opening for a Senior Platform Developer

We have an opening for a Senior Developer at Talis in the Platform development group. Talis is a mature and solid business based in the UK and provides a unique mix of loyal customers, amazing innovation and a focus on the long term.

Our platform development group is responsible for making sure that the Talis Platform is the premier environment for developing and delivering great Semantic Web applications. We need your help in designing and building our infrastructure to support hundreds of thousands of users and their data. We’re looking for people who:

  • use their code to communicate their ideas clearly
  • are proficient in Java and comfortable in Python, PHP and other scripting languages
  • can break dependencies and decompose hard problems into simpler ones
  • never forget about scalability, performance and security
  • prefer to develop test first
  • have spent time modelling data in RDF
  • can develop solutions to problems, communicate them to the team and get them implemented quickly
  • aren’t afraid to ask questions
  • have implemented HTTP clients and servers
  • like to say “let’s try it” and “we can do that”
  • understand how to balance perfection with reality
  • are as happy to lead as to follow
  • know when to reuse and when to start afresh
  • can tell us about something new they learned this year

How to apply:

Take a look at the problems below and select two to answer. Please send us your CV and an application letter including your answers to careers@talis.com

  • The Web can be modelled as a network of nodes labelled with URLs and connected by directed arcs. Suppose we want to find all the URLs linked to and from any given URL, and all the URLs that are linked from any two given URLs. What kind of data structures might be suitable for representing and querying a network with 10^8 nodes each having between 10 and 50 arcs?
  • Discuss the different types of automated testing that are needed to maintain high quality software. What kinds of programming language are best suited to each type of testing? What techniques could be used for testing asynchronous processes and for processes that operate over large volumes of data? Are there any situations that you wouldn’t test?
  • Large-scale systems composed of many cooperating application servers often need to share and cache configuration. Suppose any server can initiate changes that need to be reflected in real time to the other application servers in the cluster. What strategies could you use for coordinating this kind of behaviour and how are they tolerant to various failure conditions?

voiD: a Vocabulary of Interlinked Datasets

As technological advances allow the production and dissemination of information to scale out, old methods for navigating the information become inadequate, and we need new means to cope with the greater scale of information available.

With the rise of printing in the 16th century, library collections flourished, making more ideas and information available to more scholars than ever before. Yet to know what books a library contained, scholars had to either physically visit the library (and browse the shelves, or consult a manuscript catalogue), or make enquiries by letter.

Frontispiece of the first printed library catalogue

In 1595, Leiden University innovated by becoming the first institution to make its library’s catalogue available in print. Just as printing had made the editions within a library far more widely available, printing a book about the library’s collection brought awareness of the library and its contents to a greater audience. Now, scholars all across Europe could tell if Leiden University’s library had the information they needed. Scholars had more information about what books were available, and Leiden’s international reputation was bolstered. Other libraries followed suit by printing their own catalogues, and those library catalogues could be collected. Scholars could compare the strengths and purposes of multiple libraries from a single location.

When the Linked Open Data movement began gaining ground in 2007, there were relatively few large RDF datasets available on the web. If you followed the right blogs and mailing lists, you knew which datasets were available. As the LOD Cloud grows (and manually drawing it becomes less and less practical), it becomes apparent that the number of datasets is outgrowing our methods for discovering them. Just as it made sense for libraries in the 16th century to use the technology of print to publish descriptions of their collections, it is natural to use RDF to publish descriptions of datasets available on the web. Just as printed catalogues brought library collections to new audiences, and enabled new uses, RDF descriptions will bring datasets to new audiences (machines!), making them more findable, and enabling new uses. All you need is the vocabulary to describe datasets with.

voiD interlinking dataset diagram

voiD is a vocabulary dataset publishers can use to describe their datasets: their subject areas, their access mechanisms (e.g. APIs, SPARQL endpoints, data dumps), their licensing, their provenance, how they link to other datasets, which vocabularies are used within them, and statistics relating to their contents.
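To give a feel for what such a description might look like, here is a small Ruby sketch that prints a Turtle description of a dataset. Every dataset detail (URIs, title, licence) is a made-up placeholder, and the void_description helper is a hypothetical name. The property names used (void:sparqlEndpoint, void:dataDump, void:vocabulary, plus Dublin Core terms for title and licence) reflect my understanding of the vocabulary, so check them against the voiD guide before relying on them.

```ruby
# All dataset details below are hypothetical placeholders.
dataset = {
  uri: "http://example.org/dataset/space",
  title: "Example space dataset",
  sparql_endpoint: "http://example.org/sparql",
  data_dump: "http://example.org/dumps/space.rdf",
  license: "http://example.org/licence",
  vocabularies: ["http://xmlns.com/foaf/0.1/", "http://purl.org/dc/terms/"]
}

# Builds a Turtle string describing the dataset with voiD terms.
# Note: the title is not escaped, so it must not contain double quotes.
def void_description(d)
  vocabs = d[:vocabularies].map { |v| "<#{v}>" }.join(", ")
  [
    "@prefix void: <http://rdfs.org/ns/void#> .",
    "@prefix dcterms: <http://purl.org/dc/terms/> .",
    "",
    "<#{d[:uri]}> a void:Dataset ;",
    "  dcterms:title \"#{d[:title]}\" ;",
    "  dcterms:license <#{d[:license]}> ;",
    "  void:sparqlEndpoint <#{d[:sparql_endpoint]}> ;",
    "  void:dataDump <#{d[:data_dump]}> ;",
    "  void:vocabulary #{vocabs} ."
  ].join("\n")
end

turtle = void_description(dataset)
puts turtle
```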

As well as the vocabulary, there is the voiD guide, where the authors of voiD (Jun Zhao, Michael Hausenblas, Richard Cyganiak, and myself [Keith Alexander] ) explain how to create voiD descriptions combining terms from voiD with other useful vocabularies, publish voiD, and query voiD.

Feedback on both the vocabulary, and the Guide, will be gratefully received at void-rdfs-internals@googlegroups.com.

Moriarty Development List

I noticed that I was the only one getting notifications of commits to Moriarty’s Subversion repository. I thought the best way to fix that was to create a Google group for Moriarty and ensure the commit reports get sent there. So if you’re interested in keeping track of changes to Moriarty please sign up: moriarty-dev

Openings For Senior Developers

Update 18 Feb 2009: we have an additional requirement for a platform developer, see the announcement

We have some openings for Senior Developers at Talis in the Platform development group. Talis is a mature and solid business based in the UK and provides a unique mix of loyal customers, amazing innovation and a focus on the long term.

Our platform development group is responsible for making sure that the Talis Platform is the premier environment for developing and delivering great Semantic Web applications. We need your help in designing and building our infrastructure to support hundreds of thousands of users and their data. We’re looking for people who:

  • use their code to communicate their ideas clearly
  • are proficient in Java and comfortable in Python, PHP and other scripting languages
  • can break dependencies and decompose hard problems into simpler ones
  • never forget about scalability, performance and security
  • prefer to develop test first
  • have spent time modelling data in RDF
  • can develop solutions to problems, communicate them to the team and get them implemented quickly
  • aren’t afraid to ask questions
  • have implemented HTTP clients and servers
  • like to say “let’s try it” and “we can do that”
  • understand how to balance perfection with reality
  • are as happy to lead as to follow
  • know when to reuse and when to start afresh
  • can tell us about something new they learned this year

How to apply:

Take a look at the problems below and select two to answer. Please send us your CV and an application letter including your answers to careers@talis.com

  • The Web can be modelled as a network of nodes labelled with URLs and connected by directed arcs.
    Suppose we want to find all the URLs linked to and from any given URL, and all the URLs that are
    linked from any two given URLs. What kind of data structures might be suitable for representing and
    querying a network with 10^8 nodes each having between 10 and 50 arcs?
  • Discuss the different types of automated testing that are needed to maintain high quality software.
    What kinds of programming language are best suited to each type of testing? What techniques could be
    used for testing asynchronous processes and for processes that operate over large volumes of data? Are
    there any situations that you wouldn’t test?
  • Large-scale systems composed of many cooperating application servers often need to share and cache
    configuration. Suppose any server can initiate changes that need to be reflected in real time to the other
    application servers in the cluster. What strategies could you use for coordinating this kind of behaviour
    and how are they tolerant to various failure conditions?

Talis Store Plugin for ARC

The PHP coders amongst you may be interested in a Talis Store Plugin. To install it:

cd arc/plugins # your ARC plugins directory

svn co http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/ talis
svn co http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/ARC2_SPARQLSerializerPlugin/ARC2_SPARQLSerializerPlugin.php ARC2_SPARQLSerializerPlugin.php

Then to use it:

require_once '../ARC2.php';

/* configuration */
$talis_config = array(
  // 'db_user' => 'your_username',
  // 'db_pwd' => 'your_password',
  'store_name' => 'kwijibo-dev3', // your store name
  'fetch_graphs' => false, // if set to true, using FROM will fetch the graph as a datasource over the web, and store it in /meta
);
$store = ARC2::getComponent('Talis_StorePlugin', $talis_config);
$store->query("LOAD ");

What this does is let you use a Talis store instead of the ARC MySQL store. It supports a subset of ARC’s SPARQL+ functionality. Specifically, it supports INSERT and DELETE (which I could translate to Changesets thanks to Benji’s SPARQL parser), but not the aggregate functions (which I don’t see a way to support in a client layer at this point).

Some differences:

Named Graphs are currently a bit different in Talis stores - you can’t (yet) create your own on the fly as you can with ARC, so LOAD will put the data into the public graph by default.

The Talis Platform transforms bnodes into URIs.

I also added a few methods to the API:

$store->import($arc_store);
$store->export($arc_store);

(The idea is that you can move data between an ARC store and a Talis store).

I also added a $store->change($before_rdf, $after_rdf) method for submitting changes to an RDF graph.

It’s quite interesting comparing the two different ways of making changes (changesets and SPARQL+). I think that changesets (especially with the coming Batch Changeset support) are maybe a bit more amenable to programmatic resource updates from forms and the like. However, changesets are a bit verbose to hand-write for making quick edits and testing stuff, or pattern-based changes, and I’m finding SPARQL+ really handy for stuff like this.

What I’ve been thinking would be pretty neat would be if the SPARQL parser could be a bit more user extensible, and pre-query hooks could be set up (like ARC’s triggers, which happen post-query), so that plugin/hook writers could extend the SPARQL functionality, or just do stuff pre-query. Use cases might include:

  • rewriting SPARQL for performance improvement, or access control
  • pre-fetching data from FROM graphs over the web and adding it to the store (you can set a ‘fetch_graphs’=> true parameter in the config array you set up the talis store with, and it will do this)
  • adding versioned changesets to the ARC store
  • inventing new keywords - eg: ABOUT <http://example.org/foo> could be rewritten to DESCRIBE ?s WHERE {{ ?s rdf:subject <http://example.org/foo> } UNION {?s cs:subjectOfChange <http://example.org/foo> } } - Similarly you could add syntactic support for rollbacks, transactions, updates
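The keyword idea in the last bullet can be sketched as a chain of pre-query hooks, each taking a query string and returning a possibly rewritten one. The sketch below is in Ruby purely for illustration. It is not ARC or plugin code, and ABOUT_PATTERN, rewrite_about, PRE_QUERY_HOOKS and run_hooks are all hypothetical names. It only recognises a query consisting of nothing but ABOUT followed by a URI.

```ruby
# Matches a query of the form: ABOUT <uri>
ABOUT_PATTERN = /\A\s*ABOUT\s+<([^<>\s]+)>\s*\z/i

# Rewrites the invented ABOUT keyword into the equivalent DESCRIBE query;
# any other query is returned unchanged.
def rewrite_about(query)
  match = ABOUT_PATTERN.match(query)
  return query unless match

  uri = match[1]
  "DESCRIBE ?s WHERE { { ?s rdf:subject <#{uri}> } UNION " \
    "{ ?s cs:subjectOfChange <#{uri}> } }"
end

# Hooks run in order before the query is sent to the store.
PRE_QUERY_HOOKS = [method(:rewrite_about)]

def run_hooks(query)
  PRE_QUERY_HOOKS.reduce(query) { |q, hook| hook.call(q) }
end

rewritten = run_hooks("ABOUT <http://example.org/foo>")
untouched = run_hooks("SELECT * WHERE { ?s ?p ?o }")

puts rewritten
puts untouched
```

Keeping each hook as a plain string-to-string function means rewrites for access control or performance could be added to the same list without changing the parser itself.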

You can see more usage examples at: http://n2.talis.com/svn/playground/kwijibo/PHP/arc/plugins/trunk/talis/Talis_StorePlugin.demo.php

Platform release 9 is Live

We have successfully released Platform Version 9 into the live environment this evening. The release went smoothly with no problems and was completed between 18:00GMT and 18:22GMT. The outage also included a complete restart of the metadata store to include some performance tuning modifications.

Release notes for version 9 can be found here.

If you have any issues or problems please do not hesitate to contact us through one of the usual channels.

Platform Release 9

The next monthly release of the Platform is scheduled for Monday, 25th February 2008. We are planning to perform the release between 18:00GMT and 19:00GMT.

Release notes can be found on the wiki along with the full release schedule for future releases. This release sees the addition of customisable language analysis for indexed metadata.

In addition to the regular release, we plan to take the opportunity to carry out some tuning on some of the underlying Platform services. Unfortunately, this will require some downtime, so all platform services will be offline for a short period of time between 18:00GMT and 19:00GMT.