Nodalities

From Semantic Web to Web of Data

Linked Data and Libraries 2011 – July 14th

After the great success of Linked Data and Libraries 2010 we are doing it again!

Linked Data and Libraries 2011 will be held at The British Library in London on Thursday July 14th.  Again it will be a free event, with limited spaces allocated, so register early.

The agenda is yet to be finalised, but as per 2010 it will be a mixture of general Linked Data overviews & experience, and library Linked Data speakers. We hope to hear from the British Library, W3C Library Linked Data Incubator Group, LOD-LAM Summit, and others. We are also hoping to find time for the 10-minute lightning talks slot that worked so well last time.

Register early and/or if you would like to propose a topic or speaker, email me – richard.wallis@talis.com.

Image from a photo on Flickr by Fuzzyyol

Talis Group completes the sale of its Library division to Capita Group plc

3 March 2011, Birmingham, UK

Talis Information Limited, the library division of Talis Group Ltd, has been acquired by the UK’s leading outsourcing firm, Capita Group plc. The transaction is valued at £18.5m with an additional £2.5m due, based on performance over the next 12 months. Talis Information Ltd has a range of around 100 academic and public library clients based in the UK and employs 42 staff, all of whom are based in Birmingham, UK.

Talis Group’s other portfolio companies including Talis Education Ltd, Talis Systems Ltd and Talis Inc are unaffected by the acquisition of Talis Information Ltd.  Talis Group’s other divisions provide a SaaS-based semantic web platform and related applications including Talis Aspire, a resource list management solution for higher education customers.

Thanksgiving for Open Government

On the eve of the American Thanksgiving holiday, millions of people travel to spend time with friends and family.  Before I share a meal with relatives, I contemplate the connection between the first thanksgiving and the emerging Open Government movement.

The “First Thanksgiving” celebration in the US was a feast shared by 53 starving pilgrims who survived a brutal winter in New England, and 90 Native Americans.  The Native Americans knew how to manage their land and waters to provide sufficient fish, meat, vegetables and fruit.

The connection between the first American Thanksgiving and Open Government has to do with adapting to a new world by sharing information.  Four hundred years ago,  the Native Americans shared information on seeds, crops and planting conditions, helping the pilgrims survive.  Today, sharing information via the Web is helping us to better understand climate conditions, our health care options and issues impacting our local community.

Last week I joined about 250 people at the first International Open Government Conference, hosted by the US Department of Commerce in Washington DC.  Approximately half the conference delegates were from government, the balance from academia and the private sector.  The speakers discussed Open Government projects underway in the US, UK, Australia, New Zealand and Brazil. Speakers shared success stories and areas for future development.  The common theme: democratizing public sector data and driving innovation.  Jonas Rabinovitch from the United Nations Department of Economic and Social Affairs highlighted several eGov strategies in developing nations.  Mr. Rabinovitch noted that all but three UN member nations have a basic Web presence, many offer online forms and some provide the ability to perform transactions via the Web.

Given the conference was hosted in the US Department of Commerce, data.gov featured prominently.  “The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.”  Seven countries have stood up Open Government sites in the last 18 months, including UK, US, Australia, New Zealand, Canada and Finland.  Government administrators are seeking to restore public trust and establish an environment of transparency, participation and collaboration with the public.

The US Administration launched its Open Government Initiative in April 2009. In the last two years, I’ve watched the US Executive Branch begin to move from a “need to know” to a “need to share” culture. This cultural transition, and thus this Open Government Conference, was truly historic. The conference underscored to me that we all, regardless of our political views and affiliation, live in a highly interconnected global economy, underpinned by the World Wide Web.

Respected advisors on Open Government initiatives including Professor Jim Hendler of Rensselaer Polytechnic Institute and Sir Tim Berners-Lee, Director of the World Wide Web Consortium, agreed that public participation and collaboration will be key to the success of Open Government initiatives.  I believe that more conferences like this one and the Open Government Data Camp 2010 held in London last week, drawing delegates from a variety of disciplines, from several countries, will do a great deal to reinvigorate civic engagement and economic growth from the ground up.

Government employees are responding to mandates to publish content to Open Government websites. Data.gov was launched in April 2009 with 47 data sets. Vivek Kundra, U.S. Chief Information Officer, stated that data.gov has in excess of 300,000 data sets as of November 2010. A large portion of the data.gov data sets are geospatial information, which is an opportunity for scientists and entrepreneurs to build tools for analysis and visualization of this valuable data. The UK Government has published over 4,600 data sets, including many from Great Britain’s national mapping agency, Ordnance Survey, providing the most accurate and up-to-date geographic data for the UK.

“The stakes are high for our interlinked global economy.” Dr. Robert Schaefer, Deputy Project Scientist from Johns Hopkins University Applied Physics Lab, gave a compelling presentation on the need for mechanisms to make sense of data published as Linked Open Data. Publishing the content in RDF is not sufficient; rather, providing context on what the data implies is necessary. Better tools for analysts and scientists to extract meaning from Linked Open Data will allow critical information on climate change and space weather, for example, to be more readily understood by policy makers. Professor Schaefer stated the implications for climate change are serious, wide ranging & urgent. Current CO2 emissions are higher than the Intergovernmental Panel on Climate Change “worst case” scenario. Billions of people may experience serious consequences from climate change. Professor Schaefer reiterated the need to get started as soon as possible. “When the water from the sea rises, millions of people will have to move.” This international conference will hopefully stimulate cooperation between the public and private sectors. It is a critical step in making data accessible and providing decision support tools for space weather and climate change.

Mr. Kundra acknowledged we have much more to do to improve the quality of published data sets. He said, “when I’m able to perform analytics on the fly, grounded on quality data, we will have achieved success.” Delegates were encouraged by Mr. Kundra and other speakers to build out communities of interest, led by individuals rather than government agencies. The US Government is regularly launching challenges (see http://www.challenge.gov) with modest cash prizes, targeting citizens to gain insights on how we, the people, not government, can solve problems ranging from education on childhood obesity to sustainable urban housing that respects the environment.

Beth Simone Noveck, United States Deputy Chief Technology Officer for Open Government, leads President Obama’s Open Government Initiative.  Based at the White House Office of Science and Technology Policy, she is an expert on technology and institutional innovation. Ms. Noveck stated that “the Open Government Initiative is not transparency for transparency’s sake.  It is through participation and collaboration with academia and the public sector that there is value.”  Creating partnerships to use Open Government Data for important and unforeseen uses is empowering individuals with the ability to make better decisions and affect our quality of life.

We are in the very early stages of making Open Government data available as Linked Data. However, there are many good reasons to support Open Government initiatives, including accountability in spending, improved health care provision, and addressing climate change and space weather, which affect the world’s population. The international data exchange standards are now in place. While experts will continue to refine the technical underpinnings and best practices will evolve, the citizen-led movement, assisted by government, is truly underway.

Bright young geeks are increasingly involved in American civic life through non-profit organizations like Code for America. Passionate entrepreneurs like Dan Melton show that being super bright and engaged at a grassroots level in government is both hip and necessary. Code for America recruited twenty “fellows” from 362 applicants to get involved in city projects in 2011. One example discussed was the Boston Project, whose idea is to bring info on students together & create interesting applications leveraging federal census content, student data, transit info, city and state data.

Each month new mobile applications and social networking solutions are made available.  These are not expensive, government top down initiatives, rather, they are coming from the ground up by military personnel, students, local government officials, publishers, scientists and citizens who value transparent government.  An interesting mobile app for Android, iPhone and the iPad was unveiled for the New York Senate.  It is a real-time constituent mobile dashboard to the legislative process allowing citizens to connect with Senators, find and comment on bills, review votes and transcripts.

Academics are doing innovative research. Grad students and post-docs are rapidly prototyping what the new world of open data will look like. An increasing number of software companies, including my employer Talis, are producing lightweight platforms and cloud computing solutions. Thousands of smart people have been creating the foundation of the Linked Data “ecosystem” in the form of International Data Standards and best practices over the last fifteen years, largely through the important work of the World Wide Web Consortium (W3C).

The availability of improved development tools is seen as a requirement for widespread proliferation of Semantically enabled applications; however, people are already leveraging international standards such as RDF for Linked Data, content sharing models, well-documented licensing models, and existing best practices. Fully 25% of the applications shipped on a new Apple iPhone use government produced content.

I believe there are significant opportunities for commercial software firms to produce services and products to visualize data sets, find related data sets and, most importantly, provide mechanisms as easy to use as the early Web to publish machine and human readable data as Linked Data. There is a burgeoning information economy rapidly forming around provision of public and private data mixed together in novel ways. I believe that in 2011, truly useful tools for Web developers to create compelling Linked Data applications will be available for use with Open Government data.

We should all acknowledge that data will never be 100% perfect. Real data is dirty, face it. Yes, concerns will linger about misinterpretation and inappropriate mashups until people gain experience in making informed decisions based on the data presented. Be patient and don’t expect it to be perfect on day one or even year one. Allow best practices to emerge from the ground up, by communities of interest. Issues of data quality, provenance, context and important elements such as units of measure will all be addressed as Linked Data becomes more mainstream. Harvard Business School published a blueprint for use of open government data. The W3C provides lots of useful guidance on eGovernment and Linked Data activities.

The early American pilgrims experienced miscalculations in weather and agriculture, but they eventually figured out how to plant seeds correctly and increase their potential for a bountiful harvest. Through information sharing and discussion by informed citizens, the US evolved a free and democratic form of government that is admired by millions of people around the world.

I’m optimistic that the citizens of the world will leverage Open Government initiatives for positive outcomes.  The more our governments support openness and transparency through Open Government initiatives, the more we, the people, can solve issues that matter at the community-level or on a global level.  The stakes are high and we should be grateful and cooperate to harness the power of Open Government data and the Web.  We are defining our history, as well as our future, today.

Introducing David Wood

Talis’ new subsidiary, Talis Inc., was announced just one busy month ago.  As Talis’ new chief geek in North America it is high time for me to introduce myself to the Nodalities community.

I am a software engineer with a long history working on the Web, developing Semantic Web and Linked Data infrastructure and building Web standards.  At Talis, I will be helping to define our market entry into the North American market and contributing to the technical direction of the Talis offerings.

Talis is a wonderful company to work for.  The environment is extremely collegial and pleasant, although also very productive.  I look forward to contributing to both the company and its communities, and to bringing some of the Talis culture into North America.

Talis Inc’s CEO, Bernadette Hyland, and I have started several companies together including Semantic Web startup Tucana Inc (sold to Northrop Grumman Corporation in 2005) and, more recently, the SemWeb consultancy Zepheira.  Much of our previous work has been released as Open Source Software, including the Mulgara Semantic Store, a popular Semantic Web database, the Persistent URL software and, most recently, a project aimed at making Semantic Web and Linked Data applications easier to create, called Callimachus.

The success of the Web is based on standards of the World Wide Web Consortium (W3C), so I have tried to help them whenever possible.  I co-chaired the Semantic Web Best Practices and Deployment Working Group and, more recently, the RDF Next Steps workshop.  In 2011, I’ll be co-chairing a new working group aimed at updating the Resource Description Framework (RDF), the technical standard underlying the Semantic Web and Linked Data.

The growth of Linked Data has led to some truly interesting applications. I’ve been working with many others to collect some of those use cases into book form. The goal is to help others replicate those early successes. The first book, Linking Enterprise Data, has just been published and is available freely on the Web. It may also be purchased in ebook and printed form. A second book is to be entitled Linking Government Data. We are currently seeking contributions, so please contact me if you have a good story to tell about the use of Linked Data in government settings.

I occasionally teach computer science and mathematics courses at the University of Mary Washington in Fredericksburg, Virginia.  Most recently I’ve taught Computer Networking and Introduction to Discrete Mathematics.  It looks like I’ll be teaching an upper division elective on Linked Data during the summer of 2011.  I thoroughly enjoy working with university students.  Theirs is a fascinating time of life, when they choose how they see themselves as individuals and what they will (at least initially) do for a living.  They also help me keep a fresh perspective on our rapidly changing world.  Seeing the Web through their eyes is really very different than seeing it through mine.

We, as a community, built the Web. We continue to build our community as we build the Web. I look forward to being on the journey with you.

Linked Data – Coming Together

To quote John ‘Hannibal’ Smith, from that wonderful bit of 1980s TV, “I love it when a plan comes together!” Of course aficionados of the A-Team will probably remember ‘the plan’ was often only apparent in retrospect, although its general intention was clear from the start.

The adoption of  Linked Data and the realisation of all that potential benefit, is looking a bit like an A-Team episode – the eventual outcome being clear from the start, but with many setbacks, skirmishes to fight, partners to woo, nerves to calm, and teams to lead on the way.

To break the metaphor at this point, I see Linked Data as more of a shared vision than a plan laid out before us. Nevertheless, I think we are starting to see elements of it ‘starting to come together’.

One very obvious example is what Ordnance Survey is doing by continuing to open up their location data. Now that OS have defined a URI for every UK postcode unit [e.g. ‘SO16 4GU’ = http://data.ordnancesurvey.co.uk/id/postcodeunit/SO164GU], why would anyone [re-]publishing data in the future not use these identifiers to reference their postcode information? By that simple step they will be linked in with a wealth of ancillary information about the location: easting/northing, ward, district, county, country, etc.
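As a rough illustration of how little work it takes to adopt these identifiers, the sketch below derives a postcode unit URI from a postcode string. The URI pattern is inferred from the single ‘SO16 4GU’ example above (spaces removed, upper case); the function name and the validation step are assumptions made for illustration, not part of any Ordnance Survey API.

```python
# Base URI taken from the 'SO16 4GU' example in the post.
OS_POSTCODE_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"


def postcode_uri(postcode: str) -> str:
    """Build a postcode unit URI (hypothetical helper).

    Assumes the pattern shown in the post: the postcode with all
    whitespace removed and letters upper-cased, appended to the base URI.
    """
    compact = "".join(postcode.split()).upper()
    if not compact or not compact.isalnum():
        raise ValueError(f"not a plausible postcode: {postcode!r}")
    return OS_POSTCODE_BASE + compact


if __name__ == "__main__":
    print(postcode_uri("SO16 4GU"))
```

A publisher could store the resulting URI alongside each address record instead of a bare postcode string, which is the "simple step" that links the record to the OS data.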

Great, I hear you say, but show me an example of what that could lead to! Being lazy, I’ll let the inimitable John Goodwin of the OS do it for me. In his recent, appropriately named post “So what can I do with the new Ordnance Survey Linked Data?”, he shows how, by merging in data from a previous Talis project produced for the Department for Business, Innovation and Skills (BIS), he can deliver a very different way of accessing the same data.

The BIS Research Funding Explorer project took data about UK Government research funding from several research councils and the Intellectual Property Office, and brought it together in a Linked Data driven application to display UK centres of research excellence.

John explains how, by mixing Linked Data published for that project with OS Linked Data, he has been able to develop a different way of accessing the data. In his prototype application you are presented with a map of the UK showing the regions as defined by the European Union. By clicking on one of the EU regions you are presented with a list of the projects from within that area. He has also added the ability to access by county or District/Unitary Authority. It is a simple but effective way of demonstrating that data in Linked Data form from one source can be easily combined with data from another source to deliver benefit.

Of course even with this example we are seeing the effect of joining just a couple of jigsaw pieces together.  With Linked Data, such as this from OS, being published at an ever increasing rate, it will not be long before a bigger picture starts to form as more and more data pieces are linked together.

I love it when you can see a plan coming together!

Talis Inc

Having moved over to the UK from the States quite a few years ago now, one of the things I noticed about company names was that they tend to use “LTD,” and for reasons unknown, I somehow always thought Talis Inc sounded better than Talis LTD.

Well, I’m very happy to be able to say that Talis Group LTD, will now have a new subsidiary with the excellent name: Talis Inc. The Inc means, of course, that we’ll have a new member of the Talis Group bringing our Platform, managed services and expertise to the United States.

Based in Virginia, Talis Inc will be ably led by Bernadette Hyland, the new CEO of Talis Inc. She will be joined and supported by David Wood as VP Engineering. Together, Bernadette and David bring to Talis a huge amount of Semantic Web experience and a remarkable reputation: both entrepreneurs were founders of Tucana (one of the first commercial triple store vendors) and were most recently at the Semantic Web consultancy Zepheira.

Alongside a new subsidiary comes Talis’ first US customer: the US Government Printing Office (GPO). Talis will be running the GPO’s PURL infrastructure, which provides persistent Web addresses for critical government documents and is primarily used by the more than 1,200 Federal Depository Libraries. The PURL server uses the PURLz open source software, the development of which was led by David while at Zepheira, and complements the data hosting and search capabilities of the Talis Platform with identifier management functionality.

So, please join me in welcoming a stellar entrepreneurial team, our first US customer, and the addition of an Inc to the Talis family!

Streams, Pools and Reservoirs

by Leigh Dodds
This article features in Nodalities Magazine, issue 6

As we start to move past the current boot-strapping phase of the semantic web, in which we are constructing the web of linked data, it’s useful to begin discussing what other features and infrastructure we need in order to support sustainable usage of this huge and growing data set: what services can be offered over linked data? Do we need to consider how to provide quality of service, stability and longevity to the data, or does the sheer scale of the web make these moot points?

In order to answer this question it’s useful to compare the ongoing development of the linked data web with that of the web itself.

A Brief History Lesson

There have been several phases of activity in the development of the web. While in truth these phases were of different duration, overlapped with one another, and have happened at different rates within different communities, essentially we have gone through the following basic steps.

Firstly we concentrated on just getting stuff online. The early web was a new medium for document and data exchange and so was at its core a simple publishing device used as a collaborative space between small communities. But as the amount of content and the size and breadth of those communities grew, the emphasis shifted towards linking: tying content together to create (initially hand-crafted) indexes of the web and knit the available content into a greater whole.

The second, manual linking phase was quickly supplemented by a third phase of automated linking between content: search engines. A search engine is simply a way to quickly create a link-base based on some search criteria. The crawling and indexing of the document web by web crawlers allows users to quickly construct links to content of potential interest.

The fourth phase of the web’s development has been triggered by the commoditisation of search and the need for search engines to differentiate themselves and offer additional value-added services. Search engine features are now tailored towards particular uses or types of content (Google Image Search; Google Scholar); offer value-added features that capitalise on the ability of search engines to analyse the structure and traffic flows across the web (PageRank and similar indexing improvements; Google Trends); expand the audience for content (Google Translate); and enable community-driven customisation of the search experience (Google Custom Search; Yahoo Search Monkey, etc).

No doubt there will be subsequent phases of development, and the perspective of history will let us tease out common strands of development some of which will already be happening. But if we look at the recent, rapid development of the linked data cloud, we can already see that the same pattern is being repeated.

History Recapitulated

There has been RDF data available on the web for many years, used by a limited community of researchers. This slow accumulation of content – echoing the first phase of content publishing on the document web – has been replaced by a rapid increase in data publishing encouraged through the Linking Open Data (LOD) project. By providing clear pragmatic guidance and instructions on how to publish data for the semantic web, that project has enabled us to accelerate our transition through that first content publishing phase. But it has also, crucially, encouraged the linking together of data sets (Phase 2).

This linking has to a great extent been manual. Not in the sense that members of the LOD community are manually entering data to link datasets together, but rather at the level of looking for opportunities to link together datasets, encouraging data publishers to co-ordinate and inter-relate their data, and attempting to organically grow the linked data web by targeting datasets that would usefully annotate or extend the current Linked Data Cloud.

The rapid growth of the Linked Data Cloud means that this “manual” phase will soon be over: there will be sufficient momentum behind the semantic web that increasing amounts of data will become available and no single community will be able (or need) to shepherd its development. The focus will shift towards the subject specific communities who will instead co-ordinate at a more local level. Semantic web search engines will also become a reality.

Semantic Web search engines need to be distinguished from semantically enabled search engines. The latter use techniques like natural language parsing and improved understanding of document semantics in order to provide an improved search experience for humans. A Semantic Web search engine should offer infrastructure for machines. This Third Phase is also beginning to take place. Simple semantic web search engines like Swoogle and Sindice provide a way for machines to construct link bases, based on some simple expressions of what data is of relevance, in order to find data that is of interest to a particular user, community, or within the context of a particular application. And crucially this can be done without having to always crawl or navigate over the entire linked data web. This process can be commoditised just as it has been with the web of documents.

Co-Evolution of the Web Infrastructure

Given the strong concordance between the phases of development of the document and linked data web, it is reasonable to make some predictions on how semantic web search engines, and additional supporting infrastructure, are likely to evolve by comparing them with the development of human search engines. For each of the specialisations and value-added features listed earlier it’s possible to see an equivalent for the machine-readable web:

Document Web | Semantic Web Infrastructure | Description
Google Image Search | Type Searching | Ability to discover resources of a particular type: e.g. Person, Review, Book
Google Translate | Vocabulary Normalisation | Application of simple inferencing to expose data in more vocabularies than were made available by the publisher
Google Custom Search | Community Constructed Data Sets and Indexes | Ability to create and manipulate custom subsets of the linked data cloud
Google Trends | Linked Data Analysis & Publishing Trends | Identifying new data sources; new vocabularies; clusters of data; data analysis
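To make the first two rows more concrete, here is a minimal sketch of type searching and vocabulary normalisation over a handful of triples held as plain Python tuples. The data, the `ex:` resources, and the property mapping are invented for illustration, and the "inferencing" is deliberately the simplest possible kind (copying values to an equivalent property), not a description of how any particular search engine works.

```python
RDF_TYPE = "rdf:type"

# A tiny, made-up graph: each triple is (subject, predicate, object).
triples = {
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:book1", RDF_TYPE, "ex:Book"),
    ("ex:book1", "dc:title", "Linked Data Basics"),
}


def find_by_type(graph, type_name):
    """Type searching: list every resource declared to be of a given type."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == type_name)


# Assumed mapping from publisher vocabularies to one the consumer understands.
PROPERTY_MAP = {"dc:title": "rdfs:label", "foaf:name": "rdfs:label"}


def normalise(graph, mapping):
    """Vocabulary normalisation: add a copy of each mapped statement
    using the target property, leaving the original statements intact."""
    extra = {(s, mapping[p], o) for s, p, o in graph if p in mapping}
    return graph | extra


if __name__ == "__main__":
    print(find_by_type(triples, "foaf:Person"))
    for triple in sorted(normalise(triples, PROPERTY_MAP)):
        print(triple)
```

After normalisation a consumer that only understands `rdfs:label` can read names and titles alike, even though neither publisher used that property.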

These last two are particularly interesting as they suggest the need to be able to easily aggregate, combine and analyse aspects of the linked data cloud. This infrastructure will need to be able to support the community in working with data in a variety of ways, allowing data to flow and be collected where it is needed. Introducing a metaphor for this process might help highlight some of the processes and its consequences.

Flowing Data

Data is like water and flows of data are like streams. These streams of data can arise from any number of different sources: from a person entering data into a system; from a click stream generated as a side-effect of web browsing; application events; or generated from real-world sensor measurements. There are already many ways that we can tap into these data streams, using web-based query APIs, messaging systems like XMPP, or syndication protocols like Atom and RSS.

While these streams of data are already supporting a huge range of different applications and use cases, they are inherently limited: a stream has no memory. If historical context is required, e.g. to support more complex querying and reporting, then each consuming application must collect and store the data. We can think of these collections of data as pools; each stream of data on the web may feed any number of different application-specific pools.
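The difference between a stream and a pool can be sketched in a few lines. In this hedged example a Python generator stands in for a data stream (once consumed, its items are gone) and a small class stands in for an application-specific pool that keeps what it receives. The sensor readings and all names are invented for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def sensor_stream() -> Iterator[Dict[str, float]]:
    """A stand-in for a real-world data stream: yields each reading once."""
    for hour, celsius in [(9, 18.0), (10, 19.0), (11, 20.0)]:
        yield {"hour": hour, "celsius": celsius}


class Pool:
    """An application-specific pool: stores items so they can be queried later."""

    def __init__(self) -> None:
        self.items: List[Dict[str, float]] = []

    def fill_from(self, stream: Iterable[Dict[str, float]]) -> None:
        for item in stream:
            self.items.append(item)

    def average(self, key: str) -> float:
        return sum(item[key] for item in self.items) / len(self.items)


if __name__ == "__main__":
    stream = sensor_stream()
    pool = Pool()
    pool.fill_from(stream)

    # The stream has no memory: nothing is left to read a second time.
    print(list(stream))
    # The pool does: historical questions can still be answered.
    print(pool.average("celsius"))
```

Every consuming application that needs history has to maintain something like `Pool` for itself, which is the overhead the following paragraphs discuss.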

A pool of data provides extra flexibility, but comes at the cost of requiring each consuming application to maintain its own infrastructure to hold copies of that data. Even if each source of data provides direct access to its own pool, e.g. by exposing a web-based query interface onto its database, or by exposing linked data, there are still unnecessary overheads. Each data provider must provide their own scalable infrastructure and support a rich set of data access options.

If we start building large pools of data, within a community supported infrastructure, then we have a reservoir. A reservoir is a pool of data that is maintained by and services a specific community. Reservoirs allow issues such as quality of service (reliable supply of water) and infrastructure costs (building of pipelines) to be solved at a community level.

It’s possible to argue that the web already consists of streams, pools, and reservoirs, but there is a distinct difference between a web based on semantic web technology and a Web constructed of a mixture of XML documents or similar formats: like water, at the molecular level, all RDF is the same; it’s all triples. Unlike alternatives, RDF data is more easily pooled and collected and so is much more amenable to explorations of shared infrastructure. Like a relational database, an RDF triple-store can contain a huge variety of different kinds of data. But unlike a relational database, an RDF triple-store has the potential for the aggregate to be much more than the sum of its parts. The seeds of convergence are built in, through reliance at the most fundamental level on a global naming system (URIs) and standardised ways to state equivalence and relationships between resources.
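A small sketch may help show why "it's all triples" matters. Below, two independently published sets of triples are pooled with nothing more than a set union, and a statement of equivalence (`owl:sameAs`, the standard OWL property) lets a query about one URI pick up facts published under another. The example.org URIs, the `ex:` properties and the data are all made up, and the alias-following logic is a simplification rather than full OWL reasoning.

```python
SAME_AS = "owl:sameAs"

# Two hypothetical publishers describing the same person under different URIs.
source_a = {
    ("http://example.org/a/shakespeare", "ex:name", "William Shakespeare"),
}
source_b = {
    ("http://example.org/b/author42", "ex:wrote", "Hamlet"),
    ("http://example.org/b/author42", SAME_AS, "http://example.org/a/shakespeare"),
}


def merge(*graphs):
    """Pooling RDF is just a union: every source has the same triple shape."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def aliases_of(graph, uri):
    """Collect every URI linked to `uri` by sameAs, in either direction."""
    aliases = {uri}
    changed = True
    while changed:
        changed = False
        for s, p, o in graph:
            if p != SAME_AS:
                continue
            if s in aliases and o not in aliases:
                aliases.add(o)
                changed = True
            if o in aliases and s not in aliases:
                aliases.add(s)
                changed = True
    return aliases


def describe(graph, uri):
    """Return (property, value) pairs known about a resource under any alias."""
    names = aliases_of(graph, uri)
    return sorted((p, o) for s, p, o in graph if s in names and p != SAME_AS)


if __name__ == "__main__":
    pooled = merge(source_a, source_b)
    for prop, value in describe(pooled, "http://example.org/a/shakespeare"):
        print(prop, value)
```

Neither source alone can say both who the person is and what they wrote; the pooled data can, which is the sense in which the aggregate is more than the sum of its parts.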

In the real world, reservoirs do more than supply a community with water. The aggregate has its own uses: water skiing or hydro-electric power generation for example. And the same will be true of semantic web data reservoirs: large collections of data can be analysed and re-purposed in ways that are not possible – or at least not achievable without a great deal of repeated, redundant integration effort – using other techniques. The reservoir itself can be the source of new facts and new streams of data derived from analysis of its contents.

Flowing Data through the Talis Platform

The goal of the Talis Platform is to support the growth of the Linked Data ecosystem by providing the infrastructure to support the creation of pools of data. For additional background, see my article “Enabling the Linked Data Ecosystem” from Nodalities issue 5.

At present the Platform provides a range of services that allow data to be easily streamed into and out of Platform stores, allowing data to be easily pooled in order to benefit from greater context. Data can be pushed directly into the Platform and we are exploring methods of supporting other forms of data ingestion to make it easier and more natural to begin to accumulate data sets within the Platform.

The core search service, which produces its results in RSS, allows the creation of simple data streams, while the SPARQL interface supports more complex data extraction methods. The Augmentation service provides an interesting twist on these conventional approaches, providing a means for any RSS 1.0 feed to be automatically enriched with extra metadata by feeding it through a Platform data store. This means of interaction is like fishing for data: it is possible to serendipitously find and extract data, capturing it as extra context to items in an RSS feed, without having to deal with writing SPARQL queries or constructing a keyword search. There are many more methods and modes of data extraction that will be added to the Platform to add to these existing services; this is just the beginning.
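For readers who have not written a SPARQL request before, the following sketch shows roughly what "more complex data extraction" looks like on the wire. It follows the general SPARQL Protocol convention of sending the query text in a `query` parameter of a GET request. The endpoint URL is a placeholder and not a real Platform address, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; a real store would publish its own SPARQL service URL.
ENDPOINT = "http://example.org/stores/demo/services/sparql"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE { ?person foaf:name ?name }
LIMIT 10
""".strip()


def sparql_get_url(endpoint, query):
    """Build a GET URL that carries the query text in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text unchanged.
round_trip = parse_qs(urlsplit(url).query)["query"][0]
```

The contrast with the Augmentation service is that here the caller has to know the vocabulary (foaf:name) and the query language up front.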

But the Talis Platform is intended to provide much more than just the ability to work with pools of data. The bigger vision is to support the creation of true data reservoirs, and enable many different ways of manipulating and analysing their contents in order to discover new facts and bring new context to that data. Creation of these larger pools of content will need to be made sustainable for the communities that are creating them, and deriving value from them. Sustainability covers a wide range of issues that go beyond just commercial issues: quality and range of services are additional factors, as are forms of governance, trust and quality that relate to the data sets themselves. The Platform is intended to address all of these issues.

To take a small example, the experimental “store groups” feature, which was released at the end of last year, provides a simple method for combining datasets without requiring that data to be completely loaded or copied into a single database. The store groups feature will ultimately support a range of services over the constituent data sets, allowing each pool of data to remain intact whilst still contributing to the whole; this will be important to support the new forms of governance that are beginning to emerge around datasets on the Linked Data web.
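The idea of combining datasets without copying them can be sketched in a few lines of Python. This is not how the Platform implements store groups; the `Store` and `StoreGroup` classes below are hypothetical stand-ins that only illustrate the principle: the group holds references to its member stores, forwards each pattern to them, and labels every answer with the store it came from.

```python
from typing import Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]


class Store:
    """A single pool of triples (a stand-in for one data store)."""

    def __init__(self, name: str, triples: Iterable[Triple]):
        self.name = name
        self.triples: Set[Triple] = set(triples)

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self.triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


class StoreGroup:
    """Queries member stores in place; nothing is copied into the group."""

    def __init__(self, *stores: Store):
        self.stores = stores

    def match(self, s=None, p=None, o=None) -> Iterator[Tuple[str, Triple]]:
        for store in self.stores:
            for triple in store.match(s, p, o):
                yield store.name, triple  # keep the provenance of each answer


# Invented sample data: two independently managed pools about the same book.
books = Store("books", {("ex:book1", "dc:title", "Dune")})
reviews = Store("reviews", {("ex:book1", "ex:rating", "5")})

group = StoreGroup(books, reviews)
answers = sorted(group.match(s="ex:book1"))
```

Because each answer carries the name of its source store, each pool stays intact and attributable, which is the property the governance argument above relies on.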

Linked Data In(ter)action

By Benjamin Nowack

This article will feature in Nodalities Magazine, Issue 6

In recent months, the Semantic Web community has accelerated its progress around web-enhanced information and knowledge management. Specifications such as RDF and SPARQL are increasingly applied by developers and organizations, and RDF software is maturing. Even the initial chicken and egg problem around data and applications has now been solved by the Linking Open Data (LOD) project, which is bringing dataset after dataset online, each following recommended practices for simplified information access and repurposing. The time has finally come to move on and create the distributed data applications we have been dreaming of for so long.

Just as the Web’s true innovation was not hypertext as such, but freeing it from isolated CD-ROMs, the Semantic Web’s value proposition is not information integration per se, but doing it on a global scale. Network effects will play an important role and have to be considered by application developers. Mashups on a semantic web are not one-off combinations of existing sources and APIs. They will feed their added value back into a self-reinforcing Linked Data Ecosystem, thus enabling chains of applications, with each reaping the benefits of the previous one. RDF developers these days often use terms like “Meshup” or “Hyperdata” to describe the direction they are headed.

Linked Data is all about portability and off-site use: the more an application attracts users, the more it will let them take their data with them and also integrate external sources. With a bit of luck, we will see not one, but a wealth of killer applications, where the “unique selling proposition” is personal and defined by each user individually.

Despite the ongoing advances, some pieces of the puzzle are still missing. This becomes clearer when we correlate the current state of the Linked Data market to a typical information life cycle classification. While we can name solutions for each value-increasing process (Creation, Organization, Utilization, Distribution, Discovery), the Utilization and Application stage represents a bottleneck. Products start to benefit from Linked Data, but few are also re-distributing their internally enriched information. Additionally, the Creation phase today is mostly driven by dedicated efforts such as the LOD project, although data manipulation and enhancement should also be possible while people are interacting with semantic web content.

Linked Data Value Spiral

A few months ago, Talis researcher Tom Heath wrote an inspiring IEEE Internet Computing essay titled “How Will We Interact with the Web of Data?” where he described the upcoming challenges and opportunities in the context of human-computer interaction. He suggested that on a web where the granularity is increased from documents to arbitrary things, user interfaces should treat individual objects as first-class citizens, ideally providing context-specific functionality, direct manipulation, and coherence across personal usage scenarios. Application models that go beyond browsing and which are both universal and user-friendly are an ongoing challenge.

A system that aims at finding a sweet spot between simplicity and standardized interaction is Paggr (paggr.com). The basic idea is to combine successful Web 2.0 solutions and trends with Tim Berners-Lee’s concept of an “RDF Clipboard” for polymorphic data exchange between desktop applications. The required technical trick for copy-by-reference across desktop and web applications was introduced by Ray Ozzie three years ago through his “Live Clipboard”. Around the same time, AJAX and converging browser capabilities mass-enabled interactive HTML elements, and personal portal builders such as Netvibes brought widgets and drag and drop to end-users. The number of open datasets and technical possibilities finally led to a first prototype for building Linked Data Dashboards a few months ago.

The system used Netvibes-like pages with three resizable columns that could be populated with so-called Sparqlets. A Sparqlet is a SPARQL-powered widget, defined by a set of queries and result templates. The output consists of machine-readable HTML which addresses three essential requirements:

  • Widgets can easily be copied to other dashboards, as their complete definition is retrievable via HTTP (by de-referencing the widget identifier).
  • Individual items in a widget can be interactively linked to other items, as each element is associated with a URI. This makes semantic drag and drop possible, such as dragging a person representation onto a map or an address book widget.
  • Augmented data can be instantly fed back into the personal or public data cloud.
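As a rough illustration of a widget "defined by a set of queries and result templates", the sketch below pairs one query with one item template and renders result rows into HTML in which every item carries its URI. It is not Paggr's actual definition format: the `Sparqlet` class, its field names, the `about` attribute (borrowed from RDFa), and the sample rows are all assumptions made for the example, and the query is stored but never executed.

```python
from dataclasses import dataclass
from html import escape
from string import Template


@dataclass
class Sparqlet:
    uri: str            # widget identifier; de-referencing it would return this definition
    title: str
    query: str          # SPARQL text, kept here only as part of the definition
    item_template: str  # string.Template syntax, one placeholder per result variable

    def render(self, rows):
        """Turn result rows (dicts of variable -> value) into URI-annotated HTML."""
        items = []
        for row in rows:
            # Escape every value so result data cannot break the markup.
            safe = {key: escape(str(value), quote=True) for key, value in row.items()}
            items.append(Template(self.item_template).substitute(safe))
        return '<div class="sparqlet" about="%s"><h2>%s</h2><ul>%s</ul></div>' % (
            escape(self.uri, quote=True),
            escape(self.title),
            "".join(items),
        )


people = Sparqlet(
    uri="http://example.org/widgets/people",
    title="People",
    query="SELECT ?person ?name WHERE { ?person foaf:name ?name }",
    item_template='<li about="$person">$name</li>',
)

# Hand-written rows standing in for the results of running the query.
html_out = people.render([
    {"person": "http://example.org/people/alice", "name": "Alice & Bob"},
])
```

Because each list item keeps the URI of the thing it represents, a drag-and-drop handler could read that URI from the markup instead of guessing from the visible label.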

Architecture

The prototype received encouraging and very helpful feedback at the International Semantic Web Conference (and even won a prize). We are clearly not ready for the mainstream user yet, but building on established interaction models seems to be a promising acceptance strategy. The next iteration of Paggr is now almost finished and we are looking forward to putting it online. The first public applications will be limited to focused use cases (such as an organizer for conference attendees) as we are still working on certain interface behaviors, but a private alpha phase with fewer restrictions is planned, too.

Linked Data Dashboards face a number of usability challenges. The big question is how to tie the wealth of possibilities to a generic user interface without sacrificing work efficiency. Application convenience often boils down to feature reduction and contextual options, possibly combined with shortcuts for common tasks. To reduce complexity, Paggr lets the user (or app creator) break the theoretically infinite possibilities down into separate dashboards, where options and relations can be further spread across widgets.

The more complicated part starts at the widget level. Semantic drag and drop is often multi-modal. Dragging an event onto a calendar does not necessarily mean “Add”; there are many ways to link two persons to each other, and so on. Also, working with Linked Data is sometimes like having a backstage pass for a concert: very exciting, but also a bit rough, easily overwhelming, and if you open the wrong door, you can quickly find yourself getting kicked out. Raw data (or equally ugly RDF/HTML dumps) are always just a link away, so application designers will try to carefully shield non-developers from being exposed to things like DBpedia pages. For developers, on the other hand, this equivalent to the early Web’s “view source” feature can be very valuable.

Now, what exactly are the requirements and nice-to-haves, and (how) can they be implemented through widgets without cluttering screen real estate? As mentioned above, in order to support drag and drop as well as copy and paste between different browser tabs or even at the operating system level, we can use a technical trick introduced by Live Clipboard: transparent form fields that natively provide “right-click / paste” and similar functionality. For a consistent user experience, this means that we need distinguishable (but unobtrusive) fields for each interactive element. In Paggr, small Semantic Web icons next to widget items and title bars signal the availability of advanced options. They enable:

  • widget filtering
  • copying widget or item identifiers
  • removing items from and adding items to widgets
  • interlinking individual items
  • custom contextual menus

Paggr Widget

The approach of using dedicated interaction zones has desirable side-effects. Non-expert users are less likely to get confused, as the general markup keeps its expected behavior. It also becomes possible to disable the semantic extensions simply by deactivating and hiding the icons. A public dashboard or shared meshup may look and feel just like a normal website.

There are still several unresolved issues left and future iterations could well require a complete re-design, but Paggr is just one of a growing number of consumer-oriented Linked Data systems. After years of hard infrastructure work, the Semantic Web community is finally starting to benefit from the investments. Data-wise, we have probably reached the tipping point already. Even former critics are starting to make their information available in RDF, efforts like microformats, once regarded as competitors, have become accessible from SPARQL, and services like OpenCalais, Yahoo!’s SearchMonkey, or the Zemanta API are constantly reinforcing the network effects of structured open data. It should only be a matter of months until we see the first fully-fledged Linked Data applications for end-users.

Benjamin Nowack is the developer of Paggr. He runs semsol, a tiny Semantic Web agency in Düsseldorf, Germany.

Nigel Shadbolt talks about Web Science, the Semantic Web, Linked Data, and Garlik

In my latest podcast I talk with Nigel Shadbolt, Professor of Artificial Intelligence at the University of Southampton. We discuss Nigel’s background in Artificial Intelligence, and the appeal of the Semantic Web, before turning to explore the introduction of Linked Data to an enterprise audience and the multidisciplinary focus required to carry the Web forward.

This conversation was recorded on Tuesday 10 March, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.

Michael Crandell talks about RightScale

In our latest podcast I talk with Michael Crandell, CEO and Founder of RightScale. We discuss the Santa Barbara (California) based company, its place in the Cloud Computing stack above infrastructure providers such as Amazon, GoGrid and Rackspace, and consider the ways in which it is evolving as these companies upon which it is dependent develop ever-richer sets of features.

This conversation was recorded on Thursday 5 March, 2009.

For other Talis podcasts in this Nodalities series, see here. To subscribe to updates from all of Talis’ podcast series, see here.