Personal data: Up close and impersonal

Encrypted indexing techniques let security agencies exchange information more freely without sacrificing personal privacy.

The United States and the European Union continue to spar about how much information to release when they compare trans-Atlantic flight manifests and terrorist watch lists. Domestic agencies focused on anti-terrorism programs are fighting similar battles. The issue centers on preserving the privacy of innocent people while sharing information deemed essential to fighting terrorism.

Many people regard new surveillance and data-sharing guidelines, including sections of the U.S. Patriot Act, as violations of the Fourth Amendment’s protections against illegal searches and seizures. “In the first year after 9/11, a lot of ships sailed because we didn’t know the nature of the threat we were looking at,” said Jim Harper, director of information policy studies at the Cato Institute and a member of the Homeland Security Department’s Advisory Council. But, he added, “given the time we’ve had since then to assess, a lot of those ships should be called back to port and decommissioned.”

A better balance between national security and privacy might be feasible now with technology that anonymizes data, some computer science researchers say. The anonymization technique relies on software to scour scrambled data, such as passenger manifests and watch lists, which is encrypted and intelligible only to computers. The software flags any matches or suspicious patterns among datasets so government authorities can request additional information.

“The government could get a subpoena or a National Security Letter to ask for specific records, and you’re back to reasonable and particular” controls spelled out in the Fourth Amendment, said Jeff Jonas, a chief scientist at IBM and developer of the company’s Anonymous Resolution Technology.

Despite its promise, however, anonymization isn’t used as extensively in the federal government as some security experts would like. “I’ve been disappointed that the U.S. hasn’t adopted anonymizing technologies,” said Peter Swire, law professor at Ohio State University and a senior fellow at the Center for American Progress, a think tank. Swire also was the Clinton administration’s chief counselor for privacy, a now-defunct post.

Swire said controls such as anonymization could have a ripple effect beyond privacy and could increase security by giving agencies access to more information than they have now. “Agencies might get more information if the public trusts them more,” he said.

But others warn that anonymization isn’t a complete answer to privacy assurance. “It’s not a magic bullet,” said Paul Wormeli, executive director at the Integrated Justice Information Systems Institute, which provides technology training and consulting on information sharing for public safety, justice and homeland security organizations. “Anonymization is just that first step,” Wormeli said. “You also need to have a security architecture in mind.”

Anonymization goes beyond figuratively blacking out a person’s name to hide an association with a record containing personally identifiable information. “Many people would naively believe [blacking out a person’s name] is good enough” to protect an individual’s privacy, said Latanya Sweeney, associate computer science professor and director of the Laboratory for International Data Privacy at Carnegie Mellon University. But it’s hardly that, she said. “With date of birth, gender and ZIP code, I can identify 87 percent of the people in the United States.”

Some anonymization techniques store encrypted indexes of personal information in a central repository, an indexing approach Jonas compares to a library’s card catalog. It allows each law enforcement agency to retain detailed information internally and send only a small amount of it to an encrypted index. That information could include passport and phone numbers, addresses and e-mail accounts.
“When the system finds a match, it can’t tell you what the name or the passport number is” because it is encrypted, Jonas said. “It can tell you what record to ask about. Each party who holds data gets to control when they release it and to whom.”

Jonas said the United States and other governments are using the IBM technology, but he declined to elaborate or say which agencies had implemented it. In some cases, agencies use it to protect information shared among internal departments that compartmentalize data related to specific missions. The software and index arrangement could also be used for interagency data exchanges, he said.

The search capabilities within anonymization systems also speed the matching of lists, which allows people who don’t want to share their data to learn what they have in common. “A solution without any technology would be like the game Go Fish,” Jonas said. “One of the [groups] can pick up their phone and start reading off names, but you’d basically have to read your whole list.”

A key characteristic of the IBM index is what Jonas calls one-way encryption, a technique for scrambling data without decrypting it for human eyes. The software is designed to find matches by comparing encrypted data, but there is no map available to unscramble it so a person could read it, he said. “If you want to reveal what is known behind the pointer, you have to go ask the person that has the record.”

IBM isn’t the only company using encrypted indexes for anonymization. A company called InferX uses the approach not to match personally identifiable information but to safeguard the privacy of data being analyzed for behavior patterns that might flag suspicious activities. “What if you don’t know a name, address, phone number or bank account that you can use to find associations?” asked Jesus Mena, chief strategy officer at the company.

Behavior analysis tools may increase in importance as terrorists become harder to profile.
Many extremists aren’t associated with established cells run by al Qaeda, for example, making what Mena calls guilt-by-association matching less effective. “You need to look at suspicious patterns or anomalies for things that don’t behave in the normal way,” Mena said. Examples include shipping containers unexpectedly delayed at a foreign port or suspicious border-crossing patterns red-flagged by an uninsured vehicle driven by a foreign national at a time of day known to be favored by smugglers. Thus warned, authorities might then order a more thorough inspection of the container or vehicle, Mena said.

Like InferX, Initiate Systems uses its encrypted index to look for matches in data pulled from participating systems. The difference is that Initiate Systems’ proprietary algorithms use statistical analysis and probabilities for matching, as opposed to algorithms that look for more direct one-to-one associations, said Scott Schumacher, the company’s chief scientist. “In our world, when you are matching on the anonymized data, you can say, ‘I want to see just the for-sure matches’ or ‘I want to see the fuzzy matches,’ ” Schumacher said. The company’s approach can assign weights to influence the likelihood of a match based on how frequently a certain value appears in the database.

Encrypted information indexes can also help agencies meet regulatory requirements. An agency can’t legally share databases of personal information obtained for one purpose (a Social Security account, for example) with another agency conducting a different mission, such as anti-terrorism, said Steve Cooper, former CIO at the Homeland Security Department and former American Red Cross CIO. “Agencies can’t reach into another database without notifying the public and the Congress,” he said. But agencies can make inquiries about subsets of database information, such as names and birthdates.
“Once we know that there’s a match, then we can follow the proper procedures,” Cooper said.

The American Red Cross used an early version of the IBM technology in the aftermath of Hurricane Katrina to help state attorneys general track down parolees who were scattered by the storm and out of touch with parole officers. When requests by authorities for names of shelter occupants posed a privacy dilemma for the organization, Cooper’s staff used the anonymization technology to compare lists provided by law enforcement authorities with its internal files. “This allowed us to tell a state, ‘We do appear to have the following individual or individuals in this shelter,’ and then follow up so the shelter manager and the local authorities were notified,” Cooper said. “They then could talk to the individual to see if, in fact, it was the right person. We were able to do that without disclosing [personal] information to authorities.”

IBM, Initiate and InferX officials say one-way encryption is a sufficient safeguard for the use of a central index manager. Even if the index is breached, the information remains unintelligible because there is no key at the index site to unscramble the data.

Sweeney, however, said the technique has weaknesses. She and her staff at Carnegie Mellon develop anonymization algorithms and techniques for privacy protection. “Most of the time in Washington, people feel like they have to choose either to be safe or to have privacy,” she said. “Our solutions give us this sweet spot where you can have both.”

The lab’s software, known as PrivaMix, has been used for compliance with the Health Insurance Portability and Accountability Act in health care organizations and to protect the identities of clients in domestic-violence and homeless shelters funded by the Housing and Urban Development Department.
Sweeney said she has talked with some government officials about homeland security applications for the technology.

The Privacy Lab’s approach is to eliminate the central index and the associated need for a trusted party to maintain it, which in Sweeney’s view is the main vulnerability of the indexing technique. Instead, each member of a data-sharing network would use PrivaMix to assign numerical codes to their data about clients and facility visits and then share the codes via secure Internet connections. The software could later match codes held in separate databases. Throughout the matching process, individuals’ identities wouldn’t be divulged, and agencies wouldn’t share any actual data. They would simply share the codes.

A similar approach could be used if an agency wanted to compare information from government watch lists, airlines, travel companies, hotels or car rental agencies, for example, Sweeney said. PrivaMix uses a proprietary form of data scrambling that Sweeney said is as effective as standard strong-encryption technology, and it doesn’t include a decryption key.

Anonymization techniques require some data customization to increase the accuracy of matches. Proponents say the technology is mature enough for use in homeland security programs to help balance information sharing and privacy controls. But industry officials say government agencies don’t seem to be in a hurry to adopt the technology.

“We’ve had three or four meetings with [Customs and Border Protection officials], and they are excited about the capability, but they are very slow in moving,” Mena said. One problem has been turnover among the people his company has briefed, he added.
Citing reluctance to discuss data sharing related to homeland security programs, intelligence and law enforcement agencies, including the Office of the Director of National Intelligence and the Justice Department, declined interviews for this story.

The frustration of industry officials and others in getting their views heard is evident. Sweeney said she is disturbed that discussions about anti-terrorism activities and privacy are often polarized. “It seems that everybody just wants to keep the conversation stuck in this all-or-nothing, win-everything or lose-everything mode,” she said. “It just doesn’t have to be that way.”
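
For readers who want a concrete picture of the one-way matching Jonas describes, the Python sketch below shows one way such an index could work. It is an illustration under stated assumptions, not IBM's or any other vendor's implementation: the keyed hash (HMAC-SHA-256), the shared key, the normalization rule, the function names and the sample records are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

# Hypothetical secret agreed on by the data holders. In this sketch it is
# never given to the party that stores the index.
SHARED_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits so that trivially
    different spellings of the same identifier scramble identically."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def one_way_token(value: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed SHA-256 digest of the value. There is no function
    in this sketch that turns a token back into the original value."""
    digest = hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def build_index(records: Dict[str, str], key: bytes = SHARED_KEY) -> Dict[str, str]:
    """Map token -> local record ID. Only this mapping leaves the data
    holder; the identifiers themselves stay in its own files."""
    return {one_way_token(value, key): record_id
            for record_id, value in records.items()}


def find_matches(index_a: Dict[str, str], index_b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of record IDs whose tokens are identical."""
    shared_tokens = index_a.keys() & index_b.keys()
    return sorted((index_a[t], index_b[t]) for t in shared_tokens)


# Made-up sample data: record ID -> passport number.
watch_list = {"W-1": "P1234567", "W-2": "X9988776"}
manifest = {"M-17": "p123-4567", "M-18": "Z0000001"}

matches = find_matches(build_index(watch_list), build_index(manifest))
print(matches)  # [('W-1', 'M-17')]
```

In this sketch the index holds only tokens and record IDs, so a match tells each party which record to ask about, not what the record says. Revealing the underlying data would still require a request to whoever holds that record.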