Big Data

The intelligence community's big-data problem

The intelligence community is perhaps the most innovative data collector on the planet, with each of its 17 agencies able to siphon off various pools of information from nearly any source.

Yet the IC collects voluminous amounts of mostly fragmented data, and therein lies a challenge every other body in government struggling to make use of big data can relate to.

“In our world, we’re very good at collecting data, we’re also pretty good at analyzing it – we have to quickly parse out what is valuable,” Roger Hockenberry, a former chief technology officer for the Central Intelligence Agency, said during a panel session March 11 at the Symantec Government Symposium in Washington.

“Our data is always fragmented, and we’re trying to make sense of fragmented data options, which is extremely difficult,” said Hockenberry, who is now a consultant. “How we analyze every piece of data, how we reprocess it to continue to make better sense of what is going on – that is the biggest [challenge] we have, especially when we can’t get complete databases.”

Former National Security Agency contractor Edward Snowden’s public disclosures of classified information have highlighted how the NSA and other agencies collect various sorts of signals intelligence. A significant amount of this data doesn’t come packaged neatly for ingestion and analysis in any open-source or proprietary platform. Social media feeds and emails, for example, represent large but highly unstructured datasets. “Normalizing” that kind of unstructured data so that it becomes useful continues to be a major challenge, Hockenberry said.
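As a rough illustration of what "normalizing" can mean in practice, the Python sketch below maps an email and a social media post onto one shared record shape. It is not drawn from any agency's tooling: the `Record` fields and the post keys (`user`, `created_at`, `body`) are hypothetical names chosen for the example, and real feeds would be far messier.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class Record:
    """Hypothetical common schema shared by every source."""
    source: str
    author: Optional[str]
    timestamp: Optional[datetime]
    text: str


def from_email(raw: str) -> Record:
    """Map a simple, non-multipart RFC 5322 message onto Record."""
    msg = message_from_string(raw)
    timestamp = None
    date_header = msg.get("Date")
    if date_header:
        try:
            timestamp = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            timestamp = None  # keep the record even if the date is unusable
    body = msg.get_payload()
    text = body.strip() if isinstance(body, str) else ""
    return Record("email", msg.get("From"), timestamp, text)


def from_post(post: dict) -> Record:
    """Map a social media post (assumed keys: user, created_at, body)."""
    created = post.get("created_at")  # assumed to be epoch seconds
    timestamp = (
        datetime.fromtimestamp(created, tz=timezone.utc)
        if created is not None
        else None
    )
    return Record("social", post.get("user"), timestamp,
                  str(post.get("body", "")).strip())
```

Missing or malformed fields become `None` instead of causing the record to be dropped, which reflects the point about fragmented data: a partial record can still be analyzed later.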

To conduct its large-scale analytics effectively, the CIA uses a mixture of open-source and commercial products built on a data-science-oriented reference architecture that sprang up from one of its small labs in the past decade. The CIA started with OpenStack and added commercial products in various places to note the differences and build an effective, scalable solution.

Hockenberry said platforms and tools differ in usefulness depending on the environment in which they’re operating, and that logic also carries over to the post-analytic visualizations a dataset produces.

“You have to decide the right mix,” said Hockenberry, adding that big data forces analysts or data scientists to be creative in how they ask questions.

The intelligence community is at the forefront of big data as a technology, but even at its most effective levels, analyzing piles of unstructured, fragmented data is challenging. Algorithms will improve and data holders will inevitably learn to ask better questions of data, yet as the deluge of unstructured information continues to pour forth, finding meaningful signal in the noise is likely to remain problematic for some time.

“It’d be nice if al-Qaeda would ship us all their records in a nice, standard format, but they don’t,” Hockenberry said.  

About the Author

Frank Konkel is a former staff writer for FCW.
