Data mining: The new weapon in the war on terrorism?

Use of the technology to probe vast amounts of phone data could be costly and invade privacy

If the government is analyzing Americans’ phone records to discover and track terrorist networks — or ever plans to do so — the requisite technology would cost a lot of money, demand considerable computing power and raise privacy issues, observers say.

The possibility that the government is sifting through tens of millions of phone records came to the public’s attention earlier this month after USA Today reported that the National Security Agency had collected records from AT&T, Verizon and BellSouth.

Although it is unknown whether the government is probing phone records for national security purposes, the possibility puts a spotlight on the potential benefits and drawbacks of a sophisticated technology that few people fully understand.

That technology is data mining, the extraction of knowledge from vast amounts of data. The technique requires superfast computers and software capable of running complex algorithms, experts say.

Nathan Hoskin, chief architect of Planning Systems, a data analysis and engineering systems developer, said the government would need supercomputers “on the scale of Blue Gene or Columbia — or you could also create what amounts to a supercomputer out of hundreds or thousands of regular PCs.”

The development of a data-mining system that could analyze U.S. phone data would cost somewhere in the range of $20 million to $50 million, added Hoskin, whose company has worked with federal agencies.

If telecommunications companies hand over their records, three kinds of algorithms might be helpful in investigating potential terrorist cells: clustering algorithms, link analysis and association rule mining.

The first, clustering, groups pieces of data that are similar to one another. The second, link analysis, attempts to connect the dots among disparate datasets, such as records tied to terrorist conspirators scattered worldwide.
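As a rough illustration of the clustering idea, the sketch below groups phone numbers whose calling behavior looks alike. Everything in it is assumed for the example: the labels, the two features (calls per day and average call length in minutes), the distance threshold and the function name `cluster_by_distance`. It is one simple way to cluster, not a description of any system the government or the phone companies actually use.

```python
from math import dist

def cluster_by_distance(points, threshold):
    """Group labelled points that lie within `threshold` of one another.

    Single-linkage grouping: two points share a cluster if a chain of
    close-enough neighbors connects them.
    """
    labels = list(points)
    parent = {label: label for label in labels}

    def find(label):
        # Follow parent links to the cluster representative.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if dist(points[a], points[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for label in labels:
        clusters.setdefault(find(label), []).append(label)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical features: (calls per day, average call length in minutes).
phones = {"A": (2, 1.0), "B": (3, 1.5), "C": (40, 12.0), "D": (42, 11.0)}
print(cluster_by_distance(phones, threshold=5))
```

With these made-up numbers, the two light users and the two heavy users fall into separate groups. Real phone data would involve far more features and records, which is where the computing demands come from.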

“Terrorists are smart enough to know that if ‘Al’ and ‘Joe’ are both known criminals, they can’t talk directly without attracting law enforcement’s attention,” said Hoskin, who has worked on data analysis and data-mining projects for corporations such as Equifax and Enron during his 25-year career. “With link analysis algorithms, you can start looking for common sets of paths [or] routes.”

For example, intelligence officials might be able to identify a terrorist cell leader by tracing call routes. The algorithm might show that a Texas-based terrorist who attacked a facility in Austin had previously communicated with a conspirator in Oklahoma City, who had spoken with a co-conspirator in Boston, who in turn had been in touch with someone in Spain, and on and on, until the call route stopped in Pakistan. Then the officials may target the Pakistani caller as a possible cell leader.

But this approach can produce meaningless results, because it becomes harder for link analysis to connect the dots once a route extends five or six hops, Hoskin said.
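The call-route example can be sketched as a search over a graph of who called whom. The code below is a minimal illustration, assuming call records arrive as simple (caller, callee) pairs and using the article's city and country names as stand-ins for phone numbers. The function name `trace_routes` and the `max_hops` parameter are invented for this example; the hop limit reflects the point that long routes become unreliable.

```python
from collections import deque

def trace_routes(calls, start, max_hops):
    """Breadth-first search over call records.

    Returns the shortest chain of contacts from `start` to every party
    reachable within `max_hops` calls.
    """
    neighbors = {}
    for caller, callee in calls:
        # Treat a call as a link in both directions.
        neighbors.setdefault(caller, set()).add(callee)
        neighbors.setdefault(callee, set()).add(caller)

    routes = {start: [start]}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        hops_so_far = len(routes[current]) - 1
        if hops_so_far >= max_hops:
            continue  # do not extend routes past the hop limit
        for contact in sorted(neighbors.get(current, ())):
            if contact not in routes:
                routes[contact] = routes[current] + [contact]
                queue.append(contact)
    return routes

# Stand-in records based on the example in the text.
calls = [
    ("Austin", "Oklahoma City"),
    ("Oklahoma City", "Boston"),
    ("Boston", "Spain"),
    ("Spain", "Pakistan"),
]
print(trace_routes(calls, "Austin", max_hops=5)["Pakistan"])
```

Lowering `max_hops` to 3 in this example stops the search at Spain, so the Pakistani caller never appears. Raising it widens the net, but in a real network the number of people swept in grows quickly with each extra hop.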

The third method, association rule mining, looks for patterns within data. If Al immediately calls Pakistan every time he gets a call from Oklahoma City, the algorithm associates calls originating in Oklahoma City with calls to Pakistan. The association may raise red flags for intelligence officials.
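A simple way to picture such a rule is to count how often one event is immediately followed by another. The sketch below assumes a single person's call log in time order, with each entry a made-up (direction, place) pair. The function name `rule_confidence` and the sample log are illustrative only, and full association rule mining would search for many candidate rules rather than score one chosen in advance.

```python
def rule_confidence(events, antecedent, consequent):
    """Score the rule 'antecedent is immediately followed by consequent'.

    Returns (support, confidence): how many times the pattern occurred,
    and the share of antecedent events that were followed by the consequent.
    An antecedent that is the last entry in the log has no follow-up
    and is not counted.
    """
    seen = 0
    followed = 0
    for current, following in zip(events, events[1:]):
        if current == antecedent:
            seen += 1
            if following == consequent:
                followed += 1
    confidence = followed / seen if seen else 0.0
    return followed, confidence

# Hypothetical time-ordered log for one phone: (direction, other party's location).
log = [
    ("in", "Oklahoma City"), ("out", "Pakistan"),
    ("in", "Boston"), ("out", "Austin"),
    ("in", "Oklahoma City"), ("out", "Pakistan"),
    ("in", "Oklahoma City"), ("out", "Boston"),
]
support, confidence = rule_confidence(
    log, ("in", "Oklahoma City"), ("out", "Pakistan")
)
print(support, round(confidence, 2))
```

In this invented log, two of the three calls from Oklahoma City are followed at once by a call to Pakistan. An analyst would still have to decide how much support and confidence a pattern needs before it counts as a red flag.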

Computer programs can combine all of those algorithms, too. If the composite picture points to the same person, the government could decide to probe every contact that person has called in the past few years.

Hoskin said he thinks the government would be reluctant to delve into this sort of personal information until the data mining produces convincing evidence.

“If I was an agent of the government, it wouldn’t be until the point that something had really piqued my interest that I’d say… ‘Do a lookup on this number and find all the people associated with it,’” he said.

But privacy advocates say mining phone records could produce a mountain of civil rights violations without ever generating one lead.

Jay Stanley, public education director of the American Civil Liberties Union’s technology and liberty program, said intelligence work could easily creep from mining to wiretapping and other modes of surveillance. “We have to expect that anybody that gets flagged by one tool, like this telephone records database, would find themselves subject to the National Security Agency’s other spying tools, whatever those might be.”

Critics say the possible data-mining initiative resembles the Defense Department’s scrapped Total Information Awareness program, which was envisioned as a way to anticipate potential terrorist attacks by analyzing patterns from a massive and wide-ranging database of electronic information.

“There’s a lot of evidence that the National Security Agency is engaging in data-mining activities that do bear some resemblance to the TIA program,” Stanley said. “I think one of the primary questions that Congress needs to investigate is to what extent they are engaging in TIA-like activities by sharing private phone records.”

Even if phone companies are not giving out personal identifiers — customers’ names, street addresses and other personal information — the government can obtain personal information from a phone number via other databases and services, according to data-mining experts.

“It would take a large bank, much less the National Security Agency, about 10 minutes to assign names to all those phone numbers,” Stanley added.

Earlier this month, a federal auditor testified before the House Judiciary Committee’s Commercial and Administrative Law Subcommittee that agencies had failed to comply with data-mining protocols as recently as August 2005.

“Increased use by federal agencies of data mining — the analysis of large amounts of data to uncover hidden patterns and relationships — has been accompanied by uncertainty regarding privacy requirements and oversight of such systems,” said Linda Koontz, information management issues director at the Government Accountability Office, testifying before the subcommittee.

“As we reported in previous work, the result was that although agencies employing data mining took many steps needed to protect privacy, such as issuing public notices, none followed all key procedures, such as including in these notices the intended uses of personal information,” she said.

Observers who compare wiretapping with examining phone records say both pose threats to Americans’ privacy.

“Listening to the content of calls is more intrusive, but nobody should underestimate the privacy invasion that’s involved in tracing who’s talking to whom,” Stanley said. He added that the effort could expose innocent citizens’ calls to therapists, lovers and hot lines.

“People have the implicit expectation that the list of people they call will not be shared with their neighbors or the government,” Stanley said.

Mining phone records to find terrorists could be a waste of time, akin to treating the entire U.S. population as possible suspects, he said.

“Most of the successes we’ve seen in the national security area seem to be old-fashioned, stick-to-the-basics investigative work…start from known leads and work outward,” Stanley said.

Mining phone data

Data-mining expert Nathan Hoskin, who has worked on data analysis projects for corporations such as Equifax and Enron during his 25-year career, said the government is probably interested in two kinds of data that telecommunications companies collect: billing information with call logs and fees, and proprietary analyses of their networks’ quality performance.

To provide better service and maximize revenue, telecom companies monitor the types of phone technologies in use — such as voice over IP, cellular and landline — the frequency of each system’s use and the costs of operating each system, Hoskin said. Those measurements can pinpoint overloaded switches and inform phone companies of sites that need increased capacity.

Because terrorists are not likely to use phones registered to them or even consistently use the same phone, the second set of data can reveal information that billing data cannot, Hoskin said. For example, network analyses can show where a call originated, the length of the call, the technology that supported the connection and the quality of the connection.

The government can blend both sets of data for even more clues.

“The terrorists who are looking to do harm, as a safe bet, can assume they are being watched at this point,” Hoskin said. “So they are going to try to find ways, like a raccoon, to cover [their] scent.”
