
Author Archive

Surge 2010

Having now finally gotten over my jetlag, I’ve had a few minutes to write up my notes from Surge 2010, which was a really great couple of days, perfectly filling its niche. It also had probably the best lineup of speakers at any conference I’ve attended. Aside from the content, the whole thing was brilliantly organised and run by OmniTI, who deserve a massive amount of credit for initiating such an awesome event. Mostly for my own benefit, I’ve collected a few writeups from other folk who attended, and videos & slides from pretty much all of the sessions are due to be published any day now.

The main message coming through was read more, learn more, share more. This theme ran through a number of talks, from John Allspaw & Bryan Cantrill’s opening keynotes to Theo’s closing plenary, where he delivered the 11 Commandments of Scaling. There’s a huge body of literature out there, constantly being produced by the academic and research communities. In general, we in industry are not particularly good at putting it to use and building on top of it – all too often we’re found re-inventing the wheel, making the same mistakes over and over, and then perpetuating this vicious circle by not sharing our experiences with our peers.

Standout sessions for me included Allspaw’s keynote, delivered with customary insight and aplomb, where he talked of the absolute immaturity of Web Operations as a discipline, and of the huge amount that we can learn from more established fields like civil & mechanical engineering and the aerospace and utilities industries, which have been tackling similar-shaped problems for decades, if not centuries.

Another highlight for me was Basho CTO Justin Sheehy’s session on concurrency in distributed systems. Here, we got right to the nub of the issue – in any complex system, both in the real universe and in computer systems, it’s usually not correct to think of time as a single linear flow of events occurring in lockstep. Any software system, particularly any distributed system, that attempts to hide the underlying asynchronicity that this entails is fundamentally flawed. There are no strong guarantees of consistency in the physical world and certain domains, like banking for example, have long recognised this and built compensating mechanisms into their systems. A great soundbite is that we shouldn’t aim to build reliable systems (i.e. ones that do not fail), but that we should aim to make our systems resilient to the failures that they will inevitably encounter.

There were also some great case studies and war stories, including Artur Bergman’s deep dive into operations at Wikia, Ruslan Belkin’s ‘Going 0 to 60: Scaling LinkedIn’ and Geir Magnusson’s detailed walkthrough of how gilt.com scaled up from a typical n-tier application by building out a loosely coupled, service-oriented back end.

I definitely learned a lot, had a bunch of things reaffirmed, and also found a lot of great validation for the stuff we’re doing on our Platform. Can’t wait for next year.

Heading out to Surge 2010

A couple of us will be flying out to Baltimore tomorrow for the inaugural Surge Conference. Billed as “more than an event, it’s a chance to identify emerging trends and meet the architects behind established technologies”, the speaker list includes some real heavyweights and it’s hard (really hard) to pick which sessions to miss.

If you’re going to be there and fancy meeting up, feel free to ping either of us @beobal & @daveiw

Configuring Guice Dependencies Post-Deployment

In a number of our projects, the Platform engineering team use Guice as a dependency injection framework. The benefits of DI with regard to increasing modularity, lowering coupling and facilitating reuse are well documented, and a killer feature for us is the vast improvement in testability. One of the reasons we like Guice is that all of your dependency wiring is done in code and so is checked by the compiler. Guice also seems to strike just the right balance between features and bloat: the core library makes it easy to do the things you really need, without including lots of stuff you don’t want. There’s also an active community developing extensions and additions to integrate or adapt Guice for specific uses.

Sometimes, we do want the ability to control the composition of an app at deploy time, which for us means specifying which combination of Guice Modules to configure our Injector with. Ordinarily, the main method (or something called early on in the application lifecycle) would contain some code to initialise the Injector with a list of Modules. Like so:

Injector injector = Guice.createInjector(new NetworkModule(),
                                         new SequencingModule(),
                                         new MySQLModule(),
                                         new JMSModule());
SomeThing thing = injector.getInstance(SomeThing.class);

Our use case was this: we wanted to deploy the same distribution of an application to multiple places and configure which implementations of various internal services were used in each environment. So in the example above, we wanted to be able to choose between the bindings specified in MySQLModule and PostgresModule after deployment. Initially, it didn’t seem that there was an existing solution, until we ran into java.util.ServiceLoader. This enables multiple concrete implementations of abstract services (i.e. interfaces/abstract classes) to be specified at runtime using a simple descriptor file on the classpath (the javadocs have a much fuller explanation). So, in this case the abstract service that we want to load is defined by com.google.inject.Module and the concrete implementations are the specific combination of modules we want to use to configure our app. The hardcoded Injector bootstrapping is replaced with this one-liner:

Injector injector = Guice.createInjector(ServiceLoader.load(Module.class));

The spec of which modules to load is contained in a classpath resource named META-INF/services/com.google.inject.Module and is just a simple list of fully qualified class names:

com.talis.network.NetworkModule
com.talis.sequence.SequencingModule
com.talis.db.mysql.MySQLModule
com.talis.jms.JMSModule

It’s possible to provide the service configuration file over HTTP by specifying remote URLs on the classpath, but at the moment we’re controlling which config gets deployed where using our regular deployment tool, Puppet.
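
For anyone who wants to see the lookup mechanism without pulling in Guice, here is a minimal sketch using only the JDK. Everything in it is hypothetical and purely illustrative: the Backend interface stands in for com.google.inject.Module, and MySqlBackend and PostgresBackend stand in for our real modules. To keep it self-contained, the sketch writes the descriptor into a temporary directory and puts that directory on a URLClassLoader; in a real deployment the descriptor would simply be a file on the application classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class ModuleDiscoverySketch {

    /** Hypothetical service interface, standing in for com.google.inject.Module. */
    public interface Backend {
        String name();
    }

    /** Hypothetical provider; ServiceLoader needs a public no-arg constructor. */
    public static class MySqlBackend implements Backend {
        @Override public String name() { return "mysql"; }
    }

    /** A second hypothetical provider that a different environment might select. */
    public static class PostgresBackend implements Backend {
        @Override public String name() { return "postgres"; }
    }

    /**
     * Writes a provider-configuration file listing the given class names,
     * then asks ServiceLoader to instantiate whatever that file names.
     */
    static List<String> loadNames(String... providerClassNames) {
        try {
            // The descriptor lives at META-INF/services/<binary name of the service type>.
            Path root = Files.createTempDirectory("services-sketch");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(services.resolve(Backend.class.getName()),
                        String.join("\n", providerClassNames).getBytes(StandardCharsets.UTF_8));

            List<String> names = new ArrayList<>();
            // The extra class loader only contributes the descriptor; the provider
            // classes themselves are resolved through the parent loader.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() }, Backend.class.getClassLoader())) {
                for (Backend backend : ServiceLoader.load(Backend.class, loader)) {
                    names.add(backend.name());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same code, different descriptor contents, different implementation loaded.
        System.out.println(loadNames(MySqlBackend.class.getName()));
        System.out.println(loadNames(PostgresBackend.class.getName()));
    }
}
```

The point of the sketch is that the calling code never names a concrete class: swapping one line in the descriptor is enough to change which implementation is instantiated, which is the property we rely on when choosing between modules per environment.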

Talis Hackday 1.0

A couple of weeks ago we held our first hackday. Basically, this involved taking over one of the larger rooms at Talis HQ for the day, filling it with hackers and pizza then baking for several hours. Hackdays tend to be aimed squarely at developers, but taking a leaf from events like Hacks & Hackers we wanted to be a bit more inclusive, so we tried to make it interesting and accessible to non-techies. For a week or so before the day, everyone who had an idea, pet project, or itch to scratch was encouraged to post it up on a whiteboard and ‘pitch’ it to other people, who might be interested in finding out more or even pitching in to help out. There were only 2 rules – that no idea was dismissed out of hand and that no-one was allowed to hack on stuff from their day job (because that’s what we do, like, every other day).
Talis is an organisation full of hackers, so there was no shortage of ideas or participants. In fact, the number of hacks posted on the board far exceeded our hacking capacity for a single day.

The day was a great success and we’re already planning future events with lots of ideas on how to tweak the format. We’d love to open these up for wider participation, and hope to be doing this in the next few months, so watch this space. There were some really cool projects being worked on, so see if anything tickles your fancy and let us know what you think.

Recording Environmental Data as RDF
An über-cool mashup of Arduino and RDF: Rob built a device to take temperature readings at regular intervals, represent the data in RDF and post it to a Platform Store. It’s now sat on his windowsill, keeping us informed of the ambient temperature in Rob’s general vicinity.

TweeVR
A twitter-enabled plugin for PVRs (primarily MythTV, but hopefully with support for other distros in the pipeline). Triggered when you record a TV show, this queries various datasources, integrates the data and publishes it for the world to see. Perfect for advertising your love of Carry On films or afternoon soap opera.

Store Activity Visualisation
Julian built a cool visualisation of activity on a Platform Store using the built-in OAI-PMH service which graphs updates made to both the Metabox and Contentbox over a specified period. The IRC logs for #talis are persisted in a Store, so we’re going to use this tool to graph activity on the channel.

Using PIG and Amazon Elastic MapReduce to Analyze Webserver Logs
We have a lot of logs, and as you can imagine, they contain lots of truly invaluable data. Some members of our Platform engineering team wanted to explore this a bit more deeply, and so spent the day hacking up Pig Latin scripts to do this. Since they managed to chomp so many logfiles, we let them get away with breaking hackday rule #2.

Android Life Tracker
Talis CSO Justin hacked up an Android app to record events as RDF direct from mobile devices. Surprisingly, he’s chosen to store the trail of these events in a Platform Store for post-hoc analysis & data mining :)

SPARQL 1.1 HTTP Update Protocol Implementation
Paolo spent the day working on a reference implementation of the current draft of the SPARQL Working Group’s RESTful update protocol using Jena and Jersey/JAX-RS. Paolo plans to open source and contribute this back to the Jena project once he’s done.

Data Integration for Business Intelligence
John spent the day working on modelling data extracted from library loans services. Using RDF to integrate data from disparate sources like this is just the sort of job we built our Platform for.

Sackboy
Like a school sports day, there were no prizes awarded on the day. But if there had been, the gold medal would have undoubtedly gone to Ian Corns for his LittleBigPlanet hack – Sackboy Explains the Semantic Web.

There’s just no way you can top a platform-based romp through the bowels of CERN where the eponymous hero meets TimBL to explore the origins of the document and data web.