US Environmental Protection Agency Prepares Facilities Data for the Linked Data Cloud

Publishing high-quality government data is part of the US Administration's Open Government Initiative. Linked Open Data can reduce IT costs because data published using international data standards for the Web is easier to integrate across agency programs and government organizations. As more government agencies publish high-quality data sets, the Initiative's objectives of strengthening democracy and promoting efficiency and effectiveness in government come closer to being realized.

Talis recently assisted the Environmental Protection Agency's Office of Environmental Information in modeling and publishing an important data set, the Facility Registry System (FRS), as high-quality Linked Open Data. FRS is a source of comprehensive environmental information (e.g., air, water, and waste) about facilities, sites, and places. The effort helped the Agency better understand the process of publishing the high-quality data sets that are increasingly being made available through data.gov.

The EPA brought together Linked Data experts from Talis with FRS subject matter experts from the EPA and its contractors. We held face-to-face sessions during February and March 2011 to walk through the data modeling process. Data modeling began with a two-day deep dive into the existing relational database model, followed by an interim model review. The team discussed strategies for URI selection, vocabulary re-use and creation, and query support via SPARQL. Talis delivered a modeling guide along with RDF describing the 2.6 million facilities of interest to the EPA, approximately 103 million triples in all. Together with the EPA, we identified next steps and reviewed future maintenance activities. The US EPA now has a 'reference implementation', a high-quality data set modeled and converted to RDF, that can be used for future Linked Open Data initiatives.
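The modeling steps described above (minting stable URIs, re-using existing vocabularies, and creating new terms only where none fit) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EPA's actual FRS model: the base URI `http://example.org/frs/...`, the `Facility` class, the column names `registry_id`, `name` and `postal_code`, and the helper `row_to_ntriples` are all hypothetical. Only `rdf:type`, `rdfs:label` and the vCard ontology's `postal-code` are real W3C terms, used here to show vocabulary re-use.

```python
# Minimal sketch: turn one relational, FRS-style row into N-Triples.
# The base URI, class, column names and helper are HYPOTHETICAL illustrations,
# not the EPA's published model.
from urllib.parse import quote

BASE = "http://example.org/frs/facility/"                      # hypothetical URI pattern
FRS_CLASS = "http://example.org/frs/def/Facility"               # hypothetical new class
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"    # re-used W3C term
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"       # re-used W3C term
VCARD_POSTAL = "http://www.w3.org/2006/vcard/ns#postal-code"    # re-used vCard term


def literal(value: str) -> str:
    """Escape a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def row_to_ntriples(row: dict) -> list[str]:
    """Mint a URI from the row's registry ID and emit one triple per mapped column."""
    subject = f"<{BASE}{quote(row['registry_id'])}>"
    return [
        f"{subject} <{RDF_TYPE}> <{FRS_CLASS}> .",
        f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .",
        f"{subject} <{VCARD_POSTAL}> {literal(row['postal_code'])} .",
    ]


if __name__ == "__main__":
    example_row = {"registry_id": "110000000001", "name": "Example Plant", "postal_code": "94103"}
    for line in row_to_ntriples(example_row):
        print(line)
```

Deriving the URI from an existing stable identifier, rather than a database row number, keeps links valid across reloads. At roughly 103 million triples for 2.6 million facilities, the real data set averages about 40 triples per facility; this sketch emits only three per row.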

One of the major insights gained on completing the project was how easy it was to put the data into any standards-compliant RDF database. No complex conversion scripts were needed to load the data into different databases. Because the data follows international standards, it was straightforward to load the RDF into a store, query it, and visualize it in a single day. We estimate that the same effort using a traditional three-tier approach built on relational technologies would have taken six months or longer. This represents a major saving in both cost and time for federal systems integrators and government agencies.
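To illustrate why standard RDF needs no bespoke conversion step, here is a stdlib-only Python stand-in for an RDF store: it reads a small N-Triples sample and answers a "which facilities are in this postal code?" question. It is a sketch under heavy assumptions: it supports only a tiny subset of N-Triples (IRIs and plain literals, no blank nodes, language tags or datatypes), and a hand-written two-pattern join stands in for SPARQL. A real deployment would use a standards-compliant triple store and its SPARQL endpoint. The sample data, URIs and function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches simple N-Triples lines: IRI subject, IRI predicate, IRI or plain-literal object.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$')
ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}

POSTAL = "http://www.w3.org/2006/vcard/ns#postal-code"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def unescape(lit):
    """Undo N-Triples backslash escapes such as \\" and \\n."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), lit)


def load_ntriples(text):
    """Parse N-Triples text into (subject, object) pairs indexed by predicate."""
    by_predicate = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line}")
        s, p, iri, lit = m.groups()
        obj = iri if iri is not None else unescape(lit)
        by_predicate[p].append((s, obj))
    return by_predicate


def facilities_in(store, postal_code):
    """Roughly: SELECT ?name WHERE { ?f vcard:postal-code "<code>" ; rdfs:label ?name }"""
    matching = {s for s, o in store[POSTAL] if o == postal_code}
    return sorted(o for s, o in store[LABEL] if s in matching)


DATA = """
<http://example.org/frs/facility/1> <http://www.w3.org/2000/01/rdf-schema#label> "Example Plant" .
<http://example.org/frs/facility/1> <http://www.w3.org/2006/vcard/ns#postal-code> "94103" .
<http://example.org/frs/facility/2> <http://www.w3.org/2000/01/rdf-schema#label> "Other Site" .
<http://example.org/frs/facility/2> <http://www.w3.org/2006/vcard/ns#postal-code> "10001" .
"""

if __name__ == "__main__":
    store = load_ntriples(DATA)
    print(facilities_in(store, "94103"))  # ['Example Plant']
```

Because every triple has the same three-part shape, the loader never needs to know the source schema; the same file could be handed unchanged to any conforming RDF store.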

Another major insight for the participants was being able to view their data through a variety of freely available Linked Data visualization tools. Several of the tools demonstrated were developed by or with support from Talis, including LinkSailor, the Talis Platform, Callimachus and Morph. Two other useful visualization tools demonstrated, Spark and Exhibit, are both available as open source.

The first thing I looked for was: "What facilities of interest are in my backyard?" We all care about our communities. The more government agencies publish data that we can readily understand and visualize via the Web, the better able we are as citizens to make informed decisions. The easier it is for government authorities and public industry to access and visualize high-quality data published by authoritative sources such as the US EPA, the more efficiently and responsibly they can manage their limited resources. With projects such as the EPA's Facility Registry System published as high-quality Linked Open Data, everyone wins. The EPA, other government agencies, citizens, journalists and bloggers will all be able to view valuable data and create mashups from well-modeled Linked Open Data sets. We look forward to the EPA Facilities data set being made available via data.gov in the near future.