Getting government datasets to talk the same language

Brand Niemann is senior data scientist at and former senior enterprise architect and data scientist at the Environmental Protection Agency.

Since my previous column "How to move from datasets to data services," I have been reflecting on the proliferation of data catalogs, the two most prominent of which are and, and their lack of standardization in vocabulary, format and functionality. An excellent inventory of open-government initiatives and their data catalogs at shows that diversity. initially opened community forums for discussion of open data and the Semantic Web, which have since expanded to include groups on health, law and restoring the Gulf. The Semantic Web forum has the following discussion areas:

  • Cross-domain linking, for sharing views on how best to interlink Linked Open Government Data.
  • Cross-domain vocabularies, for sharing views about vocabularies that all publishers of Linked Open Government Data are using or should be using.
  • Domain-specific vocabularies, for sharing views about vocabularies that are specific to agency mission areas.
  • Uniform Resource Identifier (URI) schemes, for sharing views about conventions for publishing and consuming Linked Open Government Data.

The last of those is of particular interest because URIs are a critical part of Tim Berners-Lee’s five stars of linked open data.

  • Star 1: Make your stuff available on the Web, in whatever format.
  • Star 2: Make it available as structured data (e.g., an Excel spreadsheet instead of an image scan of a table).
  • Star 3: Use a nonproprietary format (e.g., comma-separated values instead of Excel).
  • Star 4: Use URLs to identify things, so people can link to your stuff.
  • Star 5: Link your data to other people’s data to provide context.

Stars 4 and 5 are more challenging to implement because even a small dataset can link to many other resources.
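
To make the jump from star 3 to stars 4 and 5 concrete, here is a minimal sketch in Python. It assumes a small CSV table keyed by a facility ID; the base address, the column names and the external link are all hypothetical and stand in for whatever a real publisher would use.

```python
import csv
import io

# Hypothetical base address for identifiers; not a real government URL.
BASE = "http://data.example.gov/id/facility/"


def link_rows(csv_text, key, links):
    """Give each CSV row its own URL (star 4) and, where one is known,
    a link to a related record in someone else's data (star 5)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = dict(row)
        record["url"] = BASE + row[key]
        if row[key] in links:
            record["see_also"] = links[row[key]]
        rows.append(record)
    return rows


# Illustrative data only.
table = "facility_id,name\n101,Plant A\n102,Plant B\n"
external = {"101": "http://other.example.org/site/9"}
linked = link_rows(table, "facility_id", external)
```

Even in this two-row table, only one row has a known external link, which hints at why star 5 takes more effort than star 4.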

The seminal government work on that topic appears to be the United Kingdom Cabinet Office's “Designing URI Sets for the UK Public Sector.” It defines the design considerations and guidance by which public-sector URI sets should be developed and maintained. The guidance is meant to encourage those who own reference data to make it available for reuse and to give those who have data that could be linked the confidence to reuse a URI/URL set that is not under their direct control.
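
As a rough illustration of what a URI-minting convention can look like in practice, the Python sketch below builds addresses of the form http://{domain}/{type}/{concept}/{reference}. The path types (id, doc, def, set), the domain and the slug rules are assumptions made for illustration, only loosely modeled on the kind of pattern such guidance describes; they should be checked against the U.K. document itself before being adopted.

```python
import re

# Hypothetical path types; a real policy would define its own list.
URI_TYPES = {"id", "doc", "def", "set"}


def slugify(text):
    """Lowercase the text and collapse runs of other characters into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    if not slug:
        raise ValueError(f"cannot build a URI segment from {text!r}")
    return slug


def mint_uri(domain, uri_type, concept, reference=""):
    """Build http://{domain}/{type}/{concept}[/{reference}] from plain labels."""
    if uri_type not in URI_TYPES:
        raise ValueError(f"unknown URI type: {uri_type!r}")
    parts = [uri_type, slugify(concept)]
    if reference:
        parts.append(slugify(reference))
    return f"http://{domain}/" + "/".join(parts)


# Illustrative call with a made-up domain and labels.
station_uri = mint_uri("data.example.gov", "id", "Monitoring Station", "Station 42")
```

Generating every address through one function is the design point: anyone reusing the set can predict the shape of a URI without asking the owner.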

In addition, I recommend following Principle 14 of the Open Group’s Data Principles: Common Vocabulary and Data Definitions. It states that data should be defined consistently throughout the enterprise, and the definitions should be understandable and available to all users.

Recently, the Open Government Working Group concluded that all vocabularies, terms and concepts should have a URI/URL, and each URI/URL should point to both data and a Web/wiki page that shows the metadata about that vocabulary, term or concept. Doing that one simple thing makes vocabularies universally accessible on the Semantic Web. Furthermore, the group recommends use of the U.K. document as a basis for a U.S. policy on minting URIs/URLs and that all open-government vocabularies use permanent URIs/URLs.
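
One way to picture that recommendation is a small registry in which every term carries a permanent URI plus two addresses, one for the data and one for the page that describes it. The Python sketch below is an assumption about how such a registry might be organized, not the working group's design; the field names and example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    uri: str       # permanent identifier for the term
    data_url: str  # machine-readable definition
    page_url: str  # Web/wiki page showing the term's metadata


def build_registry(terms):
    """Index terms by lowercase label for case-insensitive lookup."""
    return {term.label.lower(): term for term in terms}


def unlinked_columns(columns, registry):
    """Return the column names that have no registered vocabulary term."""
    return [c for c in columns if c.strip().lower() not in registry]


# Illustrative entries only; none of these addresses are real.
registry = build_registry([
    Term(
        label="Facility",
        uri="http://data.example.gov/def/facility",
        data_url="http://data.example.gov/def/facility.csv",
        page_url="http://wiki.example.gov/Facility",
    ),
])
missing = unlinked_columns(["Facility", "Latitude"], registry)
```

A check like `unlinked_columns` gives a publisher a quick list of table headers that still lack a URI before a dataset is released.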

I have been following that recommendation using a wiki, spreadsheets and a visualization/analytics tool, together with a simple metadata format (e.g., Facebook's Open Graph Protocol). Each data table contains links to vocabulary definitions and metadata at well-defined URLs on the wiki; the wiki in turn provides well-defined links to the spreadsheets; and the tables in the visualization/analytics tool contain those links as well.
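
For readers unfamiliar with the Open Graph Protocol, it expresses metadata as `<meta>` tags with properties such as og:title, og:type and og:url. The Python sketch below shows how a wiki page for a data table might emit those tags. The function name, page title and URL are hypothetical, and treating og:image as optional is a simplification, since the protocol lists it among its basic properties.

```python
from html import escape


def open_graph_tags(title, url, og_type="website", image="", description=""):
    """Render Open Graph <meta> tags for one page as a block of HTML."""
    props = {"og:title": title, "og:type": og_type, "og:url": url}
    if image:
        props["og:image"] = image
    if description:
        props["og:description"] = description
    return "\n".join(
        f'<meta property="{name}" content="{escape(value)}" />'
        for name, value in props.items()
    )


# Illustrative page; the wiki address is made up.
tags = open_graph_tags(
    'Air "Quality" Table',
    "http://wiki.example.gov/air-quality",
)
```

Escaping the values matters because table titles often contain quotation marks or ampersands that would otherwise break the tag.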

I challenge data and data catalog providers to deliver their data tables and catalogs with well-defined URIs/URLs to facilitate the integration of all that data.

