Big Data

Making big data work

Behind the "big data" cliché is an explosion in the volume of information collected by sensors, cameras, social media, e-commerce, science experiments, weather satellites, logistics and a host of other sources. But to extract valuable insights from the terabytes and petabytes of information, analysts have to know how to use datasets in their systems, and compare data from different sources.

A standards-based approach is one way to facilitate this process, and the National Institute of Standards and Technology is leading an effort to bring the user community some consensus on the logistics, structure and security of data. A draft of the NIST Big Data Interoperability Framework, released April 6, looks to establish a common set of definitions for data science, and common ground, or "reference architecture," for what constitutes usability, portability, analytics, governance and other concepts.

"One of NIST's big data goals was to develop a reference architecture that is vendor-neutral and technology- and infrastructure-agnostic, to enable data scientists to perform analytics processing for their given data sources without worrying about the underlying computing environment," said NIST's Digital Data Advisor Wo Chang.

The framework is less a policy document than an agreed-upon set of questions that need to be answered and challenges that need to be addressed to produce a consensus-based set of global standards for the production, storage, analysis and safeguarding of large, diverse datasets. NIST isn't looking to write specs for operational systems, or rules for information exchange or security. NIST's Big Data Public Working Group, which includes scientists in government, academia and the private sector, has released a seven-volume document designed to "clarify the underlying concepts of big data and data science to enhance communication among big data producers and consumers," per the report.

A set of use cases collected from contributors gets at the challenges facing government, researchers and industry in maintaining the viability and usability of current data, while preparing for the future.

For example, the National Archives and Records Administration faces the problem of processing and managing a huge amount of varied data, structured and unstructured, from different government agencies, that may have to be gathered from different clouds, and tagged to respond to queries, while preserving security and privacy where required by law.

The Census Bureau is exploring the possibility of using non-traditional sources from e-commerce transactions, wireless communications and public-facing social media data to augment or mash up with its survey data to improve statistical estimates, and produce data that is closer to real-time. But that data has to be reliable and maintain confidentiality.

On the security side, the NIST report calls attention to the future – the problem of protecting data that might need to outlast the lifespan and usefulness of the systems that house it, and the security measures that protect it.

Some types of data, including medical imaging data, security video and geospatial imaging, were until relatively recently considered too large to be conveniently analyzed and shared over computer networks, and therefore weren't created with security and privacy in mind. That could be a problem down the road. The Internet of Things and the new troves of sensor data created by connected devices could create vulnerabilities for devices and data that were not previously considered.

NIST is accepting comments on the framework through May 21.

About the Author

Adam Mazmanian is executive editor of FCW.

Before joining the editing team, Mazmanian was an FCW staff writer covering Congress, government-wide technology policy and the Department of Veterans Affairs. Prior to joining FCW, Mazmanian was technology correspondent for National Journal and served in a variety of editorial roles at B2B news service SmartBrief. Mazmanian has contributed reviews and articles to the Washington Post, the Washington City Paper, Newsday, New York Press, Architect Magazine and other publications.

Connect with him on Twitter at @thisismaz.

