How to move from datasets to data services

Since my column "A Gov 2.0 spin on archiving 2.0 data" was published in January, readers have asked me how I re-architected what I call IT-centric systems into knowledge-centric systems. Here is what I learned from my work at the Environmental Protection Agency and for interagency programs: Get the data out of legacy systems and into free cloud tools, such as Spotfire.

Federal CIO Vivek Kundra's requirements that agencies put their high-value datasets on Data.gov and reduce the number of data centers can save money and improve results, but only if more people like me take advantage of them by doing their own IT with cloud computing tools and by becoming data scientists.

I explored those ideas in another column, "Empower feds to take on the cloud." I cited the IT Dashboard as an example of why government employees should become information architects and redesign existing systems. It took six months and cost the General Services Administration $8 million to develop the IT Dashboard. Using Spotfire, I re-architected and implemented it in about three days at no cost beyond my time at EPA. Of course, that figure didn't include the cost of building the databases.

Now, every time Data.gov Evangelist Jeanne Holm or the White House's former Deputy CTO Beth Noveck tweets about new data on Data.gov and agency websites, I am eager to see what I can learn from it. But first I have to get to the data and import it into a tool. Soon after I started doing that, I realized I should document what I learned so I could preserve it and share it with others. I also learned that the more formal name for all that work, which goes beyond metadata and data curation, is data science. Mike Loukides defined the term in an O’Reilly Radar report last year in which he wrote, "Data science enables the creation of data products."

I have embraced Tim Berners-Lee's five-star scheme for linked open data and Richard Cyganiak's linked open-data cloud, on the condition that they follow the data science approach and provide an easy way to run SPARQL queries that turn Resource Description Framework (RDF) data into comma-separated values (CSV). I agree with Peter Gassner's recent assessment of the five-star system in "Introduction to Linked Open Data for Visualization Creators," published on Datavisualization.ch. He said data quality considerations are essential to acceptance.
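The RDF-to-CSV step can be sketched in Python with only the standard library. This is a minimal sketch, assuming the SPARQL endpoint returns SELECT results in the W3C SPARQL 1.1 Query Results JSON Format; the function names and the sample facility data are hypothetical. It writes one header row of variable names and one row per query solution, leaving unbound variables empty, in the spirit of the W3C SPARQL 1.1 CSV results format.

```python
import csv
import io
import json


def _cell(term):
    """Render one SPARQL JSON result term as a CSV cell."""
    if term is None:                 # unbound variable -> empty field
        return ""
    if term["type"] == "bnode":
        return "_:" + term["value"]  # blank nodes keep the _: prefix in CSV
    return term["value"]             # IRIs and literals: plain value only


def sparql_json_to_csv(results_json: str) -> str:
    """Convert SPARQL 1.1 SELECT results (JSON format) to CSV text."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\r\n")  # CRLF line endings
    writer.writerow(variables)                        # header: variable names
    for binding in data["results"]["bindings"]:
        writer.writerow([_cell(binding.get(var)) for var in variables])
    return out.getvalue()


# Hypothetical result set; the example.org IRIs are placeholders.
sample = json.dumps({
    "head": {"vars": ["facility", "state"]},
    "results": {"bindings": [
        {"facility": {"type": "uri", "value": "http://example.org/facility/1"},
         "state": {"type": "literal", "value": "VA"}},
        {"facility": {"type": "uri", "value": "http://example.org/facility/2"}},
    ]},
})
print(sparql_json_to_csv(sample))
```

Many SPARQL endpoints can also return CSV directly through content negotiation (the `text/csv` media type), in which case no conversion is needed; a sketch like this one only matters for endpoints that offer JSON alone.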

I have created about 30 data services following data science principles, plus about 30 more, based on EPA databases, that I have organized into information products for various agencies. A data service takes three clicks: With click 1, you see the data (table, statistics, graphics); with click 2, you search it (browse, sort, filter); and with click 3, you download it (CSV, Excel, RDF).
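The three-click pattern can be sketched as a small Python class. This is a minimal sketch, not the author's implementation: the `DataService` class, its method names and the toy site readings are hypothetical, the data lives in memory rather than in an agency database, and only CSV is shown for click 3 (Excel and RDF exports would need additional code).

```python
import csv
import io
import statistics
from typing import Callable, Optional

Row = dict  # one record: column name -> string value


class DataService:
    """Hypothetical three-click data service over an in-memory table."""

    def __init__(self, rows: list):
        self.rows = rows
        self.columns = list(rows[0].keys()) if rows else []

    def view(self, numeric_column: str) -> dict:
        """Click 1: see the data as a table summary plus basic statistics."""
        values = [float(r[numeric_column]) for r in self.rows]
        if not values:
            return {"columns": self.columns, "row_count": 0}
        return {
            "columns": self.columns,
            "row_count": len(self.rows),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
        }

    def search(self, keep: Callable[[Row], bool],
               sort_by: Callable[[Row], object],
               descending: bool = False) -> list:
        """Click 2: search the data by filtering, then sorting, the rows."""
        hits = [r for r in self.rows if keep(r)]
        return sorted(hits, key=sort_by, reverse=descending)

    def download_csv(self, rows: Optional[list] = None) -> str:
        """Click 3: download all rows, or a search result, as CSV text."""
        rows = self.rows if rows is None else rows
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=self.columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()


# Toy, made-up readings for illustration only.
service = DataService([
    {"site": "S1", "state": "VA", "reading": "3.5"},
    {"site": "S2", "state": "MD", "reading": "1.0"},
    {"site": "S3", "state": "VA", "reading": "2.5"},
])
print(service.view("reading"))
va = service.search(lambda r: r["state"] == "VA", sort_by=lambda r: float(r["reading"]))
print(service.download_csv(va))
```

Keeping search results in the same row format as the full table means click 3 can export either the whole dataset or whatever the user filtered in click 2.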

I challenge the developers of Data.gov, agencies and others to deliver their information as data services so it is available in three clicks. Then we as a community can begin the serious data integration that Kundra has challenged us to do and realize the benefits of all this activity. As Kundra said, “True value lies at the intersection of multiple datasets.”

About the Author

Brand Niemann is senior data scientist at Semanticommunity.net and former senior enterprise architect and data scientist at the Environmental Protection Agency.
