How to move from datasets to data services

Since my column "A Gov 2.0 spin on archiving 2.0 data" was published in January, I have been asked how I re-architected what I call IT-centric systems into knowledge-centric systems. Here's what I learned from my work at the Environmental Protection Agency and for interagency programs: Get the data out of legacy systems and use it in free cloud tools, such as Spotfire.

Federal CIO Vivek Kundra's requirements that agencies put their high-value datasets on Data.gov and reduce the number of data centers can save money and improve results, but only if more people like me take advantage of them by doing their own IT with cloud computing tools and by becoming data scientists.

I explored those ideas in another column, "Empower feds to take on the cloud," in which I cited the IT Dashboard as an example of why government employees should become information architects and redesign existing systems. The IT Dashboard took the General Services Administration six months and $8 million to develop. Using Spotfire, I re-architected and implemented it in about three days for free, aside from the cost of my time at EPA. Of course, that didn't include the cost of building the underlying databases.

Now every time Data.gov Evangelist Jeanne Holm or former White House Deputy CTO Beth Noveck tweets about new data on Data.gov and agency websites, I am eager to see what I can learn from it. But before I can do that, I have to get to the data and import it into a tool. Soon after I started doing that, I realized I should document what I learned so I could preserve it and share it with others. I also learned that the more formal name for all of that work, beyond metadata, data curation and the like, is data science. Mike Loukides defined the term in an O’Reilly Radar report last year, in which he wrote, "Data science enables the creation of data products."

I have embraced Tim Berners-Lee's five stars of linked open data and Richard Cyganiak's linked open data cloud, on the condition that they follow the data science approach and provide an easy way to run SPARQL queries that take you from Resource Description Framework (RDF) data to comma-separated values (CSV). I agree with Peter Gassner's recent assessment of the five-star system in "Introduction to Linked Open Data for Visualization Creators," published on Datavisualization.ch, in which he said data quality considerations are essential to acceptance.
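
That RDF-to-CSV step can be scripted. The sketch below uses Python and the SPARQLWrapper library to run a SELECT query and save the results as a CSV file; the endpoint URL, vocabulary and query are illustrative assumptions, not references to any real agency service.

```python
# Minimal sketch: run a SPARQL SELECT query and save the results as CSV.
# The endpoint and vocabulary below are placeholders for illustration only.
from SPARQLWrapper import SPARQLWrapper, CSV

ENDPOINT = "https://example.gov/sparql"  # hypothetical SPARQL endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?facility ?pollutant ?amount
    WHERE {
        ?facility <http://example.gov/ns#emits>     ?emission .
        ?emission <http://example.gov/ns#pollutant> ?pollutant ;
                  <http://example.gov/ns#amount>    ?amount .
    }
    LIMIT 1000
""")
sparql.setReturnFormat(CSV)  # ask the endpoint for comma-separated values

csv_bytes = sparql.query().convert()  # raw CSV bytes from the endpoint
with open("emissions.csv", "wb") as f:
    f.write(csv_bytes)
```

Any tool that reads CSV, including Spotfire or Excel, can then pick up the resulting file.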

I have about 30 examples of data services that I have created following data science principles. In addition, I have about 30 examples of data services based on EPA databases that I have organized into information products for various agencies. In a data service, with click 1, you see the data (table, statistics, graphics); with click 2, you search the data (browse, sort, filter); and with click 3, you download the data (CSV, Excel, RDF).
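
To make the three-click pattern concrete, here is a minimal sketch of such a data service as a small Flask application backed by pandas. The dataset file name, the query parameter and the routes are assumptions for illustration; they are not a description of my Spotfire-based services.

```python
# A minimal three-click data service: view, search, download.
# "dataset.csv" is a placeholder for any published dataset.
from flask import Flask, Response, request
import pandas as pd

app = Flask(__name__)
df = pd.read_csv("dataset.csv")  # hypothetical source dataset


@app.route("/")  # click 1: see the data (table and summary statistics)
def view():
    return df.head(50).to_html() + df.describe(include="all").to_html()


@app.route("/search")  # click 2: search the data, e.g. /search?q=ozone
def search():
    q = request.args.get("q", "")
    # Keep rows where any column contains the query string.
    mask = df.apply(
        lambda row: row.astype(str).str.contains(q, case=False, regex=False).any(),
        axis=1,
    )
    return df[mask].to_html()


@app.route("/download.csv")  # click 3: download the data
def download():
    return Response(df.to_csv(index=False), mimetype="text/csv")


if __name__ == "__main__":
    app.run(debug=True)
```

Visiting /, /search?q=... and /download.csv covers the three clicks; RDF or Excel downloads could be added as extra routes in the same way.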

I challenge the developers of Data.gov, agencies and others to deliver their information as data services so it is available with three clicks. Then we as a community can begin the serious data integration that Kundra has challenged us to do and realize the benefits of all this activity. As Kundra said, “True value lies at the intersection of multiple datasets.”
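
As a small illustration of what that intersection looks like in practice, the sketch below joins two state-level extracts with pandas; the file names and the shared "state" key are assumptions for illustration, not real published datasets.

```python
# Sketch of cross-dataset integration: join two independently published
# CSV extracts on a shared key. File names and the key are placeholders.
import pandas as pd

air_quality = pd.read_csv("epa_air_quality_by_state.csv")    # hypothetical EPA extract
health_stats = pd.read_csv("cdc_asthma_rates_by_state.csv")  # hypothetical CDC extract

# The intersection of the two datasets: one row per state found in both files.
combined = air_quality.merge(health_stats, on="state", how="inner")
combined.to_csv("air_quality_vs_asthma.csv", index=False)
print(combined.head())
```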

About the Author

Brand Niemann is senior data scientist at Semanticommunity.net and former senior enterprise architect and data scientist at the Environmental Protection Agency.
