How to move from datasets to data services

Since my column called "A Gov 2.0 spin on archiving 2.0 data" was published in January, I have been asked how I re-architected what I call IT-centric systems to be knowledge-centric systems. Here's what I learned from my work at the Environmental Protection Agency and for interagency programs: Get the data out of legacy systems and use it in free cloud tools, such as Spotfire.

Federal CIO Vivek Kundra's requirements for agencies to put their high-value datasets into and reduce the number of data centers can save money and improve results. That depends on more people like me taking advantage of those requirements by doing their own IT with cloud computing tools and by becoming data scientists.

I explored those ideas in another column, "Empower feds to take on the cloud." I cited the IT Dashboard as an example of why government employees should become information architects and redesign existing systems. It took six months and cost the General Services Administration $8 million to develop the IT Dashboard. I re-architected and implemented it in about three days for free — except for the cost of my time at EPA — using Spotfire. Of course, that didn't include the cost of building the databases.

Now every time Evangelist Jeanne Holm or the White House's former Deputy CTO Beth Noveck tweets about new data at and on agency websites, I am eager to see what I can learn from the data. But before I can do that, I have to get to it and import it into a tool. Soon after I started doing that, I realized I should document what I learned so I could preserve it and share it with others. And I learned that a more formal name for all that — more than just metadata, data curation, etc. — was data science. Mike Loukides defined the term in an O’Reilly Radar report last year in which he wrote, "Data science enables the creation of data products."

I have embraced Tim Berners-Lee's five stars of open linked data and Richard Cyganiak's linked open-data cloud — with the condition that they follow the data science approach and provide an easy way to do SPARQL queries to get from Resource Description Framework (RDF) to comma-separated values (CSV). I agree with Peter Gassner's recent assessment of the five-star system in "Introduction to Linked Open Data for Visualization Creators," published on He said data quality considerations are essential to acceptance.
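To make the RDF-to-CSV step concrete, here is a minimal sketch. It assumes a SPARQL endpoint has already returned results in the standard SPARQL JSON results format, and it flattens them into CSV using only the Python standard library. The function name, the sample variables and the example.org URIs are hypothetical and do not describe any particular agency service.

```python
import csv
import io
import json


def sparql_json_to_csv(results_json: str) -> str:
    """Flatten a SPARQL JSON results document into CSV text."""
    doc = json.loads(results_json)
    columns = doc["head"]["vars"]  # variable names from the SELECT clause
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row; emit an empty cell.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return out.getvalue()


# Hypothetical sample results, for illustration only.
SAMPLE = json.dumps({
    "head": {"vars": ["dataset", "title"]},
    "results": {"bindings": [
        {"dataset": {"type": "uri", "value": "http://example.org/d1"},
         "title": {"type": "literal", "value": "Air quality"}},
        {"dataset": {"type": "uri", "value": "http://example.org/d2"}},
    ]},
})

if __name__ == "__main__":
    print(sparql_json_to_csv(SAMPLE))
```

The resulting CSV text can then be imported into a spreadsheet or visualization tool like any other table.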

I have about 30 examples of data services that I have created following data science principles. In addition, I have about 30 examples of data services based on EPA databases that I have organized in information products for various agencies. In a data service, with click 1, you see the data (table, statistics, graphics); with click 2, you search the data (browse, sort, filter); and with click 3, you download the data (CSV, Excel, RDF).
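The three-click pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation behind any of the data services mentioned above: the DataService class, its method names and the sample monitoring rows are all hypothetical.

```python
import csv
import io
from statistics import mean


class DataService:
    """Hypothetical three-click data service over a list of row dicts."""

    def __init__(self, rows):
        self.rows = list(rows)

    def view(self, column):
        """Click 1: see the data, here as summary statistics for one column."""
        values = [float(r[column]) for r in self.rows]
        return {"count": len(values), "min": min(values),
                "max": max(values), "mean": mean(values)}

    def search(self, predicate=None, sort_by=None):
        """Click 2: search the data by filtering and sorting."""
        rows = [r for r in self.rows if predicate is None or predicate(r)]
        if sort_by is not None:
            rows = sorted(rows, key=lambda r: r[sort_by])
        return DataService(rows)

    def download_csv(self):
        """Click 3: download the data as CSV text."""
        if not self.rows:
            return ""
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=list(self.rows[0]),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(self.rows)
        return out.getvalue()


# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    {"site": "C", "pm25": "15.5"},
    {"site": "A", "pm25": "12.0"},
    {"site": "B", "pm25": "8.5"},
]

if __name__ == "__main__":
    service = DataService(SAMPLE_ROWS)
    print(service.view("pm25"))
    high = service.search(lambda r: float(r["pm25"]) > 10, sort_by="site")
    print(high.download_csv())
```

Each method maps to one click, so a user never needs more than three steps to go from a first look at the data to a downloadable file.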

I challenge the developers of, agencies and others to deliver their information as data services so it is available with three clicks. Then we as a community can begin to do the serious data integration that Kundra has challenged us to do and produce the benefits from all this activity. As Kundra said, “True value lies at the intersection of multiple datasets.”

About the Author

Brand Niemann is senior data scientist at and former senior enterprise architect and data scientist at the Environmental Protection Agency.
