Boston probe's big data use hints at the future

A day after the explosions at the Boston Marathon, authorities had amassed 10 terabytes of data in their search for clues.


The FBI turns the Boston Marathon bombing crime scene back over to the city in an informal ceremony held April 22. (FBI photo)

Less than 24 hours after two explosions killed three people and injured hundreds more at the April 15 Boston Marathon, the Federal Bureau of Investigation had compiled 10 terabytes of data in hopes of finding the needles in the haystack that might lead to the suspects.

The tensest part of the ongoing investigation – the death of one suspect and the capture of the second – concluded four days later, in part because FBI-led investigators analyzed mountains of cell phone tower call logs, text messages, social media data, photographs and video surveillance footage to quickly pinpoint the suspects.

A big assist in this investigation goes to the public, which provided perhaps the best illustration of a crowd-sourced investigation in recent memory. Not only did the public respond to the FBI's request for information – the agency received several thousand tips and loads of additional photographs and video footage – but a citizen's tip ultimately led to the capture of the surviving suspect.

Still, the investigation offered a glimpse of what big data and data analytics can do, and highlighted how far the technology still has to go.

Knowledge is power

Big data is a relatively new term in technology whose definition varies among early practitioners, but the main goal of any big data project is to pull insights from large amounts of data.

Prominent statistician Nate Silver describes it as "pulling signal from the noise" – noise that can be a veritable smorgasbord of different kinds of information. The noise can be big, too – some datasets within the federal government are measured in petabytes, each of which is one million gigabytes or 1,000 terabytes.

So the 10 terabytes gathered by investigators is not a large data collection, even by the standards of today's relatively early big data technology. But the investigation still presented officials with a data crunch because of the volume, variety and complexity of the data, according to Bradley Schreiber, vice president of Washington operations for the Applied Science Foundation for Homeland Security.

To get a sense of the initial complexities of combining various data sets in the early moments of the investigation, consider this: In the aftermath of the bombing, cellular networks in the area were taxed beyond their capacity. AT&T put out a tweet urging those in the area to "please use text & we ask that you keep non-emergency calls to a minimum."

There was speculation that the bombs could have been triggered remotely by mobile phones, prompting interest in traffic logs from area cell towers to try to get a fix on the culprits. That geo-location information could then be cross-checked against surveillance video and eyewitness photography – just another layer of data available to law enforcement when trying to stitch together a detailed and textured version of events.
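Cross-checking tower logs against video and photos amounts to a join on time and place: a phone's tower record and a camera sighting corroborate each other when they are close in both. The sketch below is a minimal illustration under stated assumptions, not a description of the FBI's tooling. The `TowerRecord` and `CameraSighting` types, the 0.5 km and five-minute thresholds and all coordinates are hypothetical, and it treats a tower's location as the phone's location, which real cell data only approximates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class TowerRecord:
    phone_id: str   # hypothetical handset identifier from a tower log
    lat: float      # tower latitude, used as a stand-in for the phone's position
    lon: float
    time: datetime


@dataclass(frozen=True)
class CameraSighting:
    camera_id: str  # hypothetical surveillance camera or photo source
    lat: float
    lon: float
    time: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def corroborate(records, sightings, max_km=0.5, max_gap=timedelta(minutes=5)):
    """Sorted (phone_id, camera_id) pairs seen close together in space and time."""
    matches = set()
    for r in records:          # brute-force scan of every pair; fine for a sketch
        for s in sightings:
            close_in_time = abs(r.time - s.time) <= max_gap
            close_in_space = haversine_km(r.lat, r.lon, s.lat, s.lon) <= max_km
            if close_in_time and close_in_space:
                matches.add((r.phone_id, s.camera_id))
    return sorted(matches)


# Made-up example data.
t0 = datetime(2013, 1, 1, 12, 0)
records = [
    TowerRecord("phone-A", 42.351, -71.079, t0 + timedelta(minutes=1)),
    TowerRecord("phone-B", 42.400, -71.100, t0),  # several kilometres away
]
sightings = [CameraSighting("cam-1", 42.350, -71.080, t0)]
print(corroborate(records, sightings))  # [('phone-A', 'cam-1')]
```

A real system would index records by time and location rather than compare every pair, but the matching rule is the same.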

To handle all this data at once, the FBI drew on local law enforcement as well as manpower and technology from a collection of federal agencies, including the Department of Homeland Security and the 16 other agencies that make up the intelligence community (IC).

Because of the secrecy involved in the IC, many of its most innovative technologies are not well understood by the public, and their uses are often not confirmed by official sources.

Yet some are known anyway. Counterterrorism sources told several publications that facial recognition software was being used to compare faces in photographs and video against visa, passport, driver's license and other databases.

An individual not authorized to speak about the investigation told FCW that DHS used situational awareness tools that allow its personnel to act as field sensors through their mobile devices, interacting in real time with a central network. The central network can triangulate the position of a user's smartphone, and the user can push alerts to the network via text, image or video, for instance, or see on a map where other mobile users running the application are located. The central hub then stands up a virtualized map of all its field users and logs the data they send back and forth.
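A minimal sketch of the hub side of such a tool, assuming a simple in-memory design: field users report positions, push text, image or video alerts, and ask the hub who else is nearby, while the hub logs every event. The `SituationalHub` class, its methods and the flat x/y kilometre grid (used instead of real latitude and longitude) are hypothetical; nothing here reflects DHS's actual software, and triangulation is assumed to have already produced each position.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from math import hypot


@dataclass
class Alert:
    user: str
    kind: str       # "text", "image" or "video"
    payload: str    # message body, or a reference to the uploaded media
    time: datetime


@dataclass
class SituationalHub:
    # user -> last known (x, y) position in km on a local flat grid
    positions: dict = field(default_factory=dict)
    # every position report and alert, in arrival order
    log: list = field(default_factory=list)

    def report_position(self, user, x_km, y_km):
        self.positions[user] = (x_km, y_km)
        self.log.append(("position", user, (x_km, y_km)))

    def push_alert(self, user, kind, payload):
        if kind not in {"text", "image", "video"}:
            raise ValueError(f"unsupported alert type: {kind}")
        alert = Alert(user, kind, payload, datetime.now(timezone.utc))
        self.log.append(("alert", user, alert))
        return alert

    def map_view(self, viewer, radius_km):
        """Other field users within radius_km of the viewer's last position."""
        vx, vy = self.positions[viewer]
        return sorted(
            user for user, (x, y) in self.positions.items()
            if user != viewer and hypot(x - vx, y - vy) <= radius_km
        )


hub = SituationalHub()
hub.report_position("agent-1", 0.0, 0.0)
hub.report_position("agent-2", 0.3, 0.4)   # 0.5 km from agent-1
hub.report_position("agent-3", 3.0, 4.0)   # 5 km from agent-1
hub.push_alert("agent-2", "image", "photo-0001.jpg")
print(hub.map_view("agent-1", radius_km=1.0))  # ['agent-2']
```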

New challenges with social media

Terabytes or more of video, images, text messages and cell phone records are complex to compare in their own right, but social media data adds a new wrinkle for investigators.

A tool from Topsy Labs, a company that bills itself as having the only full-scale index of the public social web, was used by local and federal officials during the Boston bombing investigation to sift through torrents of tweets.

Topsy has stored every tweet since July 2010, and in the Boston investigation, its tool allowed investigators to run big-data analytics of Boston-related tweets against hundreds of billions of past and present messages.

Using Topsy, investigators could search every reference to the word "bomb" ever made on Twitter in a specific region, such as the city of Boston and its suburbs.

Such a search would have turned up since-deleted bomb references from both suspects' Twitter accounts, said Rishab Ghosh, Topsy's chief scientist and co-founder.

This kind of search through public information may also have revealed other important clues for investigators, such as which users retweeted the bomb mentions or engaged in dialogue with the suspects.
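The keyword-and-region search described above can be sketched over a small in-memory archive. This is an illustrative assumption, not Topsy's API: the `Tweet` record, the `search` function and the rough Boston bounding box are hypothetical, and the location on each tweet stands in for either a geo-tag or an inferred location. Matching is case-insensitive and whole-word, so "bombastic" is skipped (as, in this simple version, are "bombs" and "bombing").

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    text: str
    lat: Optional[float]   # geo-tagged or inferred location; None if unknown
    lon: Optional[float]


# Rough, illustrative box around Boston and nearby suburbs: (south, west, north, east).
BOSTON_BOX = (42.20, -71.30, 42.50, -70.90)


def in_box(lat, lon, box):
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def search(archive, word, box):
    """Tweets located inside `box` that contain `word` as a whole word, any case."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [
        t for t in archive
        if t.lat is not None and t.lon is not None
        and in_box(t.lat, t.lon, box)
        and pattern.search(t.text)
    ]


archive = [
    Tweet(1, "Was that a BOMB downtown?", 42.35, -71.08),
    Tweet(2, "That movie was the bomb", 40.71, -74.00),         # outside the box
    Tweet(3, "Bombastic speech at city hall", 42.36, -71.06),  # not the word "bomb"
    Tweet(4, "bomb threat reported", None, None),               # no location
]
print([t.tweet_id for t in search(archive, "bomb", BOSTON_BOX)])  # [1]
```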

Furthermore, Topsy's "geo-inferencing" capabilities allow users to map accurately where specified tweets originate, even though only about one percent of Twitter users geo-tag their tweets. Those capabilities make it "20 times more accurate" than standard Twitter location data, Ghosh said.

"Technology plays the role of identifying signal and extracting it from this noise, and allows you (as a user) to access information without someone, like a journalist or official, editing it, yet you're still hearing relevant voices," Ghosh said. "What has happened now is the way people communicate through public conversation, until the past few years, that has been inaccessible. The Internet has changed that, and now public conversations are publicly accessible."

DHS has taken advantage of this fact and was almost certainly keeping tabs on the investigation through social media, too.

Since 2011, the agency has monitored public-facing social media networks, blogs and content aggregators. The monitoring has stirred up controversy among privacy advocates, who worry that DHS would collect personally identifiable information (PII) on social media users. The agency has stated that it is not mining data for PII, but that it would use such information in exigent circumstances to rescue, say, an earthquake victim tweeting from under a pile of rubble or the victim of a terrorist attack trapped in a hotel. The National Operations Center at DHS "identifies and monitors only information needed to provide situational awareness and establish a common operating picture," according to an April privacy impact assessment from DHS about the program.

These monitoring efforts require DHS to establish accounts with usernames and passwords on public social media sites such as Twitter, Facebook, YouTube and Flickr, as well as on a host of Twitter search and trend sites, but these DHS accounts are not supposed to interact with other users, make friends or share content across networks.

They are designed to lurk and watch for terms indicating that a social media post or news item concerns a terrorist attack, cybersecurity breach, natural disaster, public health emergency or other threatening situation in progress.

In its privacy memo, DHS spells out the terms on its radar screen. Some that would have come up on social media in the aftermath of the Boston Marathon bombing include "explosion," "bomb," "shelter-in-place," and "lockdown."
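Watching for a list of terms like this can be sketched as a single case-insensitive regular expression applied to each incoming post. The sketch assumes only the four terms quoted above (the DHS list is longer), and `flag_posts` and `term_counts` are hypothetical helpers, not part of any DHS system. Whole-word matching keeps a hyphenated phrase such as "shelter-in-place" intact while skipping longer words that merely begin with a term, such as "bombastic".

```python
import re
from collections import Counter

# The four terms quoted in the article; the DHS list is considerably longer.
WATCH_TERMS = ["explosion", "bomb", "shelter-in-place", "lockdown"]


def compile_watchlist(terms):
    """One case-insensitive, whole-word pattern matching any watch term."""
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


def flag_posts(posts, terms=WATCH_TERMS):
    """Yield (post, matched_terms) for each post that mentions a watch term."""
    pattern = compile_watchlist(terms)
    for post in posts:
        found = {m.lower() for m in pattern.findall(post)}
        if found:
            yield post, found


def term_counts(posts, terms=WATCH_TERMS):
    """How many posts mention each watch term (a post counts once per term)."""
    counts = Counter()
    for _, found in flag_posts(posts, terms):
        counts.update(found)
    return counts


posts = [
    "Residents told to SHELTER-IN-PLACE tonight",
    "Reports of an explosion downtown, possible bomb",
    "Great run today!",
    "Campus lockdown lifted",
]
for post, found in flag_posts(posts):
    print(sorted(found), "<-", post)
```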

What's to come

John Crupi, chief technology officer of Washington, D.C.-based JackBe, which designs real-time operational intelligence software and has contracts within the IC, said the Boston bombing investigation highlights where the technology is now and where it might soon go.

Although investigators pinpointed the suspects, Crupi said the fact that they were asking people to send in digital photos during the early hours of the investigation, and engaging in lengthy exchanges with tipsters via phone and e-mail, may show too much reliance on the public.

If the right people had not come forward, the investigation might have stalled. However, improving technology is likely to change that, Crupi said. One day, predictive analytics might actually allow savvy investigators to prevent crimes and tragedies like the Boston bombings before they happen.

Think Minority Report, but replace the psychics with a blazing-fast, massive and scalable cloud-based infrastructure that seamlessly wields disparate, complex data sets produced by people, drones, satellites and other smart machines. In fact, the IC is working on installing this kind of technology right now, so such powerful predictive analytics may not be far off.

"The goal is going to be to take all this real-time data and ultimately connect all the dots to authoritative systems," Crupi said. "If you can connect them, then you can make a more probabilistic decision on how real or possible something is."
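Crupi does not describe a specific method, but one common way to turn several connected signals into "a more probabilistic decision" is to update a prior probability with a likelihood ratio for each signal, working in log-odds. The sketch below is an illustration of that swapped-in technique only. It assumes the signals are conditionally independent (a naive Bayes simplification that real intelligence data rarely satisfies), `combine_evidence` is a hypothetical helper, and the prior and ratios in the example are made up.

```python
from math import exp, log


def combine_evidence(prior, likelihood_ratios):
    """Posterior probability that an event is real after independent signals.

    Each likelihood ratio is P(signal | real) / P(signal | not real).
    Assumes conditional independence between signals (naive Bayes).
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = log(prior / (1 - prior))
    for ratio in likelihood_ratios:
        if ratio <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += log(ratio)       # each signal shifts the log-odds additively
    return 1 / (1 + exp(-log_odds))  # convert log-odds back to a probability


# Made-up numbers: 1% prior, two signals 4x and 10x likelier if the event is real.
print(round(combine_evidence(0.01, [4, 10]), 3))  # 0.288
```

With a 1 percent prior and those two signals, the odds go from 1/99 to 40/99, so the posterior is 40/139, or roughly 29 percent: each connected dot raises confidence without ever making it certain.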
