How to move from datasets to data services

Since my column "A Gov 2.0 spin on archiving 2.0 data" was published in January, I have been asked how I re-architected what I call IT-centric systems into knowledge-centric systems. Here's what I learned from my work at the Environmental Protection Agency and for interagency programs: Get the data out of legacy systems and use it in free cloud tools, such as Spotfire.

Federal CIO Vivek Kundra's requirements for agencies to put their high-value datasets on Data.gov and to reduce the number of data centers can save money and improve results if there are more people like me who take advantage of them by doing their own IT with cloud computing tools — and by becoming data scientists.

I explored those ideas in another column, "Empower feds to take on the cloud." I cited the IT Dashboard as an example of why government employees should become information architects and redesign existing systems. It took six months and cost the General Services Administration $8 million to develop the IT Dashboard. I re-architected and implemented it in about three days for free — except for the cost of my time at EPA — using Spotfire. Of course, that didn't include the cost of building the databases.

Now every time Data.gov evangelist Jeanne Holm or former White House Deputy CTO Beth Noveck tweets about new data on Data.gov and agency websites, I am eager to see what I can learn from the data. But before I can do that, I have to get to it and import it into a tool. Soon after I started doing that, I realized I should document what I learned so I could preserve it and share it with others. And I learned that a more formal name for all that — more than just metadata, data curation and so on — was data science. Mike Loukides defined the term in an O'Reilly Radar report last year in which he wrote, "Data science enables the creation of data products."

I have embraced Tim Berners-Lee's five stars of open linked data and Richard Cyganiak's linked open-data cloud — with the condition that they follow the data science approach and provide an easy way to run SPARQL queries to get from Resource Description Framework (RDF) data to comma-separated values (CSV). I agree with Peter Gassner's recent assessment of the five-star system in "Introduction to Linked Open Data for Visualization Creators," in which he said data quality considerations are essential to acceptance.

I have about 30 examples of data services that I have created following data science principles. In addition, I have about 30 examples of data services based on EPA databases that I have organized into information products for various agencies. In a data service, with click 1, you see the data (table, statistics, graphics); with click 2, you search the data (browse, sort, filter); and with click 3, you download the data (CSV, Excel, RDF).
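The three-click pattern can be sketched in code. The following is a minimal illustration, not an implementation of any actual agency system; the sample rows, field names and filter parameters are hypothetical:

```python
import csv
import io

# Hypothetical sample rows standing in for an agency dataset.
ROWS = [
    {"facility": "Plant A", "state": "VA", "emissions": 120},
    {"facility": "Plant B", "state": "MD", "emissions": 95},
    {"facility": "Plant C", "state": "VA", "emissions": 210},
]

def view(rows):
    """Click 1: see the data -- a tabular view plus summary statistics."""
    total = sum(r["emissions"] for r in rows)
    return {"rows": rows, "count": len(rows), "total_emissions": total}

def search(rows, state=None, sort_key="emissions"):
    """Click 2: search the data -- filter by state, then sort descending."""
    hits = [r for r in rows if state is None or r["state"] == state]
    return sorted(hits, key=lambda r: r[sort_key], reverse=True)

def download_csv(rows):
    """Click 3: download the data -- export the current view as CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["facility", "state", "emissions"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Each click builds on the last: the search results from click 2 are exactly what click 3 exports, so the downloaded CSV always matches what the user saw on screen.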

I challenge the developers of Data.gov, agencies and others to deliver their information as data services so it is available within three clicks. Then we as a community can begin to do the serious data integration that Kundra has challenged us to do and produce the benefits from all this activity. As Kundra said, "True value lies at the intersection of multiple datasets."

About the Author

Brand Niemann is a senior data scientist and former senior enterprise architect and data scientist at the Environmental Protection Agency.

