Big Data

The intelligence community's big-data problem

The intelligence community is perhaps the most innovative data collector on the planet, with each of its 17 agencies able to siphon off various pools of information from nearly any source.

Yet the IC collects voluminous amounts of mostly fragmented data, and therein lies a challenge every other body in government struggling to make use of big data can relate to.

“In our world, we’re very good at collecting data, we’re also pretty good at analyzing it – we have to quickly parse out what is valuable,” Roger Hockenberry, a former chief technology officer for the Central Intelligence Agency, said during a panel session March 11 at the Symantec Government Symposium in Washington.

“Our data is always fragmented, and we’re trying to make sense of fragmented data options, which is extremely difficult,” said Hockenberry, who is now a consultant. “How we analyze every piece of data, how we reprocess it to continue to make better sense of what is going on – that is the biggest [challenge] we have, especially when we can’t get complete databases.”

Former National Security Agency contractor Edward Snowden’s public disclosures of classified information have highlighted how the NSA and other agencies collect various sorts of signals intelligence. A significant amount of this data doesn’t come packaged neatly for ingestion and analysis in any open-source or proprietary platform. Social media feeds and emails, for example, represent large but highly unstructured datasets. To “normalize” that kind of unstructured data in a way that it becomes useful continues to be a major challenge, Hockenberry said.
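As a rough illustration of what "normalizing" can mean in practice, the sketch below maps an email and a social media post onto one common record shape so that both can be queried together. It is a minimal example under stated assumptions, not a description of any agency's tooling: the `Record` fields and the JSON keys `user`, `created` and `body` are hypothetical, and real feeds are far messier.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


@dataclass
class Record:
    """Hypothetical common schema shared by every source."""
    source: str
    author: str
    timestamp: datetime  # always stored in UTC
    text: str


def normalize_email(raw: str) -> Record:
    """Parse a plain-text (non-multipart) email into a Record."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"]).astimezone(timezone.utc)
    return Record("email", msg["From"], sent, msg.get_payload().strip())


def normalize_post(raw_json: str) -> Record:
    """Parse a social media post; the JSON keys here are assumed, not standard."""
    post = json.loads(raw_json)
    created = datetime.fromtimestamp(post["created"], tz=timezone.utc)
    return Record("social", post["user"], created, post["body"].strip())


if __name__ == "__main__":
    email_raw = (
        "From: a@example.com\n"
        "Date: Tue, 11 Mar 2014 10:00:00 -0400\n"
        "\n"
        "Hello there\n"
    )
    post_raw = '{"user": "b", "created": 1394546400, "body": " hi "}'
    for record in (normalize_email(email_raw), normalize_post(post_raw)):
        print(record)
```

Once both sources share a schema, records can be sorted, filtered or joined on `timestamp` and `author` regardless of where they came from. The hard part Hockenberry describes is everything this sketch leaves out: missing fields, partial records and formats nobody documented.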

To conduct its large-scale analytics effectively, the CIA uses a mixture of open-source and commercial products built on a data-science-oriented reference architecture that sprang up from one of its small labs in the past decade. The CIA started with OpenStack and added commercial products in various places to note differences and build an effective and scalable solution.

Hockenberry said platforms and tools differ in usefulness depending on the environment in which they’re operating, and that logic also carries over to the post-analytic visualizations a dataset produces.

“You have to decide the right mix,” said Hockenberry, adding that big data forces analysts or data scientists to be creative in how they ask questions.

The intelligence community is at the forefront of big data as a technology, but even at its most effective levels, analyzing piles of unstructured, fragmented data is challenging. Algorithms will improve and data holders will inevitably learn to ask better questions of data, yet as the deluge of unstructured information continues to pour forth, finding meaningful signal in the noise is likely to remain problematic for some time.

“It’d be nice if al-Qaeda would ship us all their records in a nice, standard format, but they don’t,” Hockenberry said.  

About the Author

Frank Konkel is a former staff writer for FCW.
