How to move from datasets to data services

Since my column called "A Gov 2.0 spin on archiving 2.0 data" was published in January, I have been asked how I re-architected what I call IT-centric systems to be knowledge-centric systems. Here's what I learned from my work at the Environmental Protection Agency and for interagency programs: Get the data out of legacy systems and use it in free cloud tools, such as Spotfire.

Federal CIO Vivek Kundra's requirements for agencies to put their high-value datasets into Data.gov and reduce the number of data centers can save money and improve results if there are more people like me who will take advantage of that by doing their own IT with cloud computing tools — and by becoming data scientists.

I explored those ideas in another column, "Empower feds to take on the cloud." I cited the IT Dashboard as an example of why government employees should become information architects and redesign existing systems. It took six months and cost the General Services Administration $8 million to develop the IT Dashboard. I re-architected and implemented it in about three days for free — except for the cost of my time at EPA — using Spotfire. Of course, that didn't include the cost of building the databases.

Now every time Data.gov Evangelist Jeanne Holm or the White House's former Deputy CTO Beth Noveck tweets about new data at Data.gov and on agency websites, I am eager to see what I can learn from the data. But before I can do that, I have to get to it and import it into a tool. Soon after I started doing that, I realized I should document what I learned so I could preserve it and share it with others. And I learned that a more formal name for all that — more than just metadata, data curation, etc. — was data science. Mike Loukides defined the term in an O’Reilly Radar report last year in which he wrote, "Data science enables the creation of data products."

I have embraced Tim Berners-Lee's five stars of open linked data and Richard Cyganiak's linked open-data cloud — with the condition that they follow the data science approach and provide an easy way to do SPARQL queries to get from Resource Description Framework (RDF) to comma-separated values (CSV). I agree with Peter Gassner's recent assessment of the five-star system in "Introduction to Linked Open Data for Visualization Creators," published on Datavisualization.ch. He said data quality considerations are essential to acceptance.

I have about 30 examples of data services that I have created following data science principles. In addition, I have about 30 examples of data services based on EPA databases that I have organized in information products for various agencies. In a data service, with click 1, you see the data (table, statistics, graphics); with click 2, you search the data (browse, sort, filter); and with click 3, you download the data (CSV, Excel, RDF).

I challenge the developers of Data.gov, agencies and others to deliver their information as data services so it is available with three clicks. Then we as a community can begin to do the serious data integration that Kundra has challenged us to do and produce the benefits from all this activity. As Kundra said, “True value lies at the intersection of multiple datasets.”

About the Author

Brand Niemann is senior data scientist at Semanticommunity.net and former senior enterprise architect and data scientist at the Environmental Protection Agency.

Featured

  • Cybersecurity

    DHS floats 'collective defense' model for cybersecurity

    Homeland Security Secretary Kirstjen Nielsen wants her department to have a more direct role in defending the private sector and critical infrastructure entities from cyberthreats.

  • Defense
    Defense Secretary James Mattis testifies at an April 12 hearing of the House Armed Services Committee.

    Mattis: Cloud deal not tailored for Amazon

    On Capitol Hill, Defense Secretary Jim Mattis sought to quell "rumors" that the Pentagon's planned single-award cloud acquisition was designed with Amazon Web Services in mind.

  • Census
    shutterstock image

    2020 Census to include citizenship question

    The Department of Commerce is breaking with recent practice and restoring a question about respondent citizenship last used in 1950, despite being urged not to by former Census directors and outside experts.

Stay Connected

FCW Update

Sign up for our newsletter.

I agree to this site's Privacy Policy.