NSA shows how big 'big data' can be

Experts say the massive scale of metadata that the NSA is collecting represents a daunting challenge in finding useful information within it.

If big data were cheap and easy and always yielded an abundance of relevant insights, every agency and organization would do it.

The fact that so few federal agencies are engaging this new technology – zero out of 17 in a recent Meritalk survey – only highlights the challenges inherent in what recent intelligence leaks show the National Security Agency is trying to do.

NSA reportedly collects the daily phone records of hundreds of millions of customers from the largest providers in the nation, as well as a wealth of online information about individuals from Internet companies like Facebook, Microsoft, Google and others.

To put the NSA's big data problems into perspective, Facebook's 1 billion worldwide users alone generate 500 terabytes of information per day – about as much data as a digital library containing all books ever written in any language. Worldwide, humans generate 6.1 trillion text messages annually, and Americans alone make billions of phone calls each year.

Even if the NSA takes in only a small percentage of the metadata generated daily by those major companies and carriers in its efforts to produce foreign signals intelligence and thwart terrorists, the information contained therein would be a vast sea of data.

In response to the recent reports by The Guardian and The Washington Post, based on information they received from former NSA contractor Edward Snowden, Director of National Intelligence James Clapper confirmed the Prism program's existence but provided scant information about the system itself, except to clarify that it was not a data mining or collection tool. However, a story published June 11 on CNN.com detailed several terror plots that were apparently foiled by intercepted electronic communications.

While the public does not know what the NSA does with its available data sets, it is clear that the NSA faces challenges in its efforts that industry and private-sector companies don't, according to Paul Kocher, president and chief scientist at San Francisco-based Cryptography Research.

Kocher said firms like Google typically analyze large data sets to draw general inferences that help them optimize business, achieve greater efficiencies or perhaps chart a phenomenon. Retail companies have perfected these models, which is why you might notice certain ads popping up on certain applications after you've attended a specific venue or made a unique keyword search online.

But the NSA's challenges aren't in generalizing from large data sets; they're in finding tiny nuggets of data that might turn out to be a terrorist communication or signal, and that is a huge undertaking.

"The NSA is focused on very specific information about a fairly well-defined threat – they are warehousing data and drilling it down at the narrowest levels and having people look at it," Kocher said. "The metadata the NSA is known to collect can give them a good picture of the very specific things they are interested in."

As reported by Information Week, the NSA relies heavily on Accumulo, "a highly distributed, massively parallel processing key/value store capable of analyzing structured and unstructured data" to process much of its data. NSA's modified version of Accumulo, based on Google's BigTable data model, reportedly makes it possible for the agency to analyze data for patterns while protecting personally identifiable information – names, Social Security numbers and the like.
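To make the idea of a sorted key/value store with per-cell protection more concrete, here is a minimal Python sketch. It is only an illustration, not Accumulo's API and not a description of how the NSA uses it: the `ToyStore` class, the `pii` label and the sample records are all hypothetical, and the visibility check is reduced to a single label per cell, whereas a real system would likely support richer expressions.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class Cell:
    # Only the key fields take part in ordering, BigTable-style.
    row: str
    family: str
    qualifier: str
    visibility: str = field(compare=False)  # hypothetical single label, "" = public
    value: str = field(compare=False)


class ToyStore:
    """Hypothetical in-memory store; cells stay sorted by (row, family, qualifier)."""

    def __init__(self):
        self._cells = []

    def put(self, row, family, qualifier, visibility, value):
        insort(self._cells, Cell(row, family, qualifier, visibility, value))

    def scan(self, row_prefix, authorizations):
        # Return only cells the caller is authorized to see.
        return [
            (c.row, c.family, c.qualifier, c.value)
            for c in self._cells
            if c.row.startswith(row_prefix)
            and (c.visibility == "" or c.visibility in authorizations)
        ]


store = ToyStore()
store.put("record-002", "meta", "duration_s", "", "42")
store.put("record-001", "meta", "duration_s", "", "310")
store.put("record-001", "subscriber", "name", "pii", "Example Person")

analyst_view = store.scan("record-", authorizations=set())
full_view = store.scan("record-", authorizations={"pii"})
```

In this sketch, a scan without the `pii` authorization still sees call durations for pattern analysis, but the cell holding the subscriber name is filtered out.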

Before news of Prism broke, NSA officials revealed at a Carnegie Mellon tech conference a graph search the agency operates on top of Accumulo. The graph is based on 4.4 trillion data points, which could represent phone numbers, IP addresses, locations, or calls made and to whom; connecting those points creates a graph with more than 70 trillion edges. For a human being, that kind of visualization is impossible, but for a vast, high-end computer system with the right big data tools and mathematical algorithms, some signals can be pulled out.
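A toy version of such a graph can show what "data points" and "edges" mean in practice. The Python sketch below is an assumption-laden illustration, not the NSA's method: the function names `build_graph` and `within_hops` and the sample call records are hypothetical. It treats each identifier as a node, each call as an edge, and uses a breadth-first search to find everything within a fixed number of hops of a starting point.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Build an undirected graph: each (caller, callee) pair becomes an edge."""
    graph = defaultdict(set)
    for caller, callee in records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search; returns {node: hop distance} up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Hypothetical call records: (caller, callee)
records = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
graph = build_graph(records)

reach = within_hops(graph, "A", 2)
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
```

Here six data points produce four edges, and a two-hop search from "A" reaches "B" and "C" but not "D", "E" or "F". At trillions of edges, the same kind of query requires distributed storage and processing rather than a single in-memory dictionary.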

Rep. Mike Rogers (R-Mich.), chairman of the House Intelligence Committee, publicly stated that the government's collection of phone records thwarted a terrorist plot inside the United States "within the last few years," and other media reports have cited anonymous intelligence insiders claiming several plots have been foiled.

Needles in endless haystacks of data are not easy to find, and the NSA's current big data analytics methodology is far from a flawless system, as evidenced by the April 15 Boston Marathon bombings that killed three people and injured more than 200. The bombings were carried out by Chechen brothers Dzhokhar and Tamerlan Tsarnaev, the latter of whom was previously interviewed by the Federal Bureau of Investigation after the Russian Federal Security Service notified the agency in 2011 that he was a follower of radical Islam. The brothers had made threats on Twitter prior to their attack as well, meaning several data points of suspicious behavior existed, yet no one detected a pattern in time to prevent them from setting off bombs in a public place filled with people.

"We're still in the genesis of big data, we haven't even scratched the surface yet," said big data expert Ari Zoldan, CEO of New-York-based Quantum Networks. "In many ways, the technology hasn't evolved yet, it's still a new industry."

In all likelihood, the NSA is one of the few organizations at the forefront of big data, and it's already gotten past some of the initial barriers to harnessing the technology: cost and manpower.

Double the size of the FBI and the Central Intelligence Agency, NSA has the analysts necessary to delve into these massive data sets, Zoldan said, and whatever the costs have been – NSA does not publish its budget – the agency has covered them.

"Up front, the costs are exorbitant, but as we better understand what big data is, there will be an industry that will evolve and focus on interpreting and understanding and translating that data, and it's going to get cheaper," Zoldan said. "But in my opinion, you can't put a price tag on human life, and the underlying goal of the NSA is to thwart terrorism and threats to the US."

The cost to store information, already cheap by comparison, will continue to decrease as well, Zoldan said. While it might take years before sufficient tools exist to analyze all stored information, Zoldan said that at some point technology will catch up, and that may allow agencies that have large data sets siloed off to really put that data to use.

"It's like if someone is hoarding up land on Mars, what are they going to do with it right now?" Zoldan said. "But eventually, as space exploration continues to evolve, that land will be worth something."
