What tomorrow may bring
As demonstrated by recent warnings of possible attempts to hijack airplanes in the United States, the stakes have never been higher in the war on terrorism. It's no wonder that numerous government agencies and the CIA-affiliated venture-capital investor In-Q-Tel will spend millions of dollars this year on security-related information-analysis technologies.
Authorities are increasingly focused on programs that gather and organize unstructured intelligence data, such as e-mail and voice transcripts. Underlying these programs are pattern-recognition algorithms (some developed at universities, others at commercial software firms) that seek to automate the process of finding clues. Experts hope that analysts might eventually use these kinds of tools to help predict and thwart future terrorist attacks.
The challenge is daunting. Unstructured data includes not only e-mail messages but also field intelligence reports, surveillance photos, intercepted communications, various official records such as immigration documents and shipping forms, and even video feeds from international news agencies.
Security agencies also need technology that can mine information about international political parties, factions and leaders from sources in dozens of languages, said Laura Ramos, director of Forrester Research Inc. Beyond the sheer volume of information, intelligence may arrive in both digital and nondigital formats, which makes it harder to assemble a single, easy-to-view picture.
Since the Sept. 11, 2001, attacks, pattern recognition has become a tool for identifying subjects of interest, essentially a way to monitor known targets. There's growing interest in taking automation to the next level by adding predictive capabilities.
"The essence of intelligence analysis is you don't know what you'll need to be looking for tomorrow morning," said Claude Vogel, chief technology officer at Convera Corp., which provides pattern-recognition software to the FBI and other agencies.
To do this, software applications must go beyond keyword searches and high-level document classifications that create Yahoo-like directories, experts say. Prediction requires the ability to home in on documents and uncover, say, a piece of information on Page 36 of a 50-page transcript of a wireless phone conversation, said Barak Pridor, chief executive officer of ClearForest Corp., which develops information-extraction and business intelligence software.
To succeed, pattern-matching algorithms must be combined with other disciplines, such as statistical analysis and semantics research. For example, an intelligence system might search a document to identify a noun. If that noun is associated with a direct quote, the system assumes the noun represents a person.
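The noun-plus-quote rule can be illustrated with a toy Python sketch. It is not how ClearForest, Convera or any other vendor implements entity extraction. It assumes that capitalized words are a stand-in for nouns and uses a small, hypothetical list of speech verbs ("said," "says," "added") to detect a quote attribution; any name attached to a quote that way is assumed to be a person.

```python
import re

# Assumption: a tiny, hypothetical list of verbs that attribute a quote.
SPEECH_VERBS = r"(?:said|says|added)"
# Assumption: one to three capitalized words stand in for a proper noun.
NAME = r"([A-Z][a-z]+(?:\s[A-Z][a-z]+){0,2})"

# Closing quote, then verb, then name:  "...," said Jane Smith
VERB_THEN_NAME = re.compile(r'"\s*' + SPEECH_VERBS + r"\s+" + NAME)
# Closing quote, then name, then verb:  "...," Lee said
NAME_THEN_VERB = re.compile(r'"\s*' + NAME + r"\s+" + SPEECH_VERBS)


def find_speakers(text: str) -> set[str]:
    """Return every name the rule assumes is a person because it attributes a quote."""
    speakers = {m.group(1) for m in VERB_THEN_NAME.finditer(text)}
    speakers |= {m.group(1) for m in NAME_THEN_VERB.finditer(text)}
    return speakers


# Hypothetical sample text; the names are placeholders.
sample = '"We see a pattern," said Jane Smith. "It repeats," Lee said.'
print(sorted(find_speakers(sample)))  # ['Jane Smith', 'Lee']
```

A production system would rely on part-of-speech tagging and far richer context. A rule this crude will simply miss attributions such as "FBI spokesman Paul Bresson said," where a title and an all-capitals acronym precede the name.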
The next step is to create associations among people, organizations and events. "You want to cross-reference terrorist information within a geographical context," Pridor said. "Then each dimension — people, geography, names of known terrorist groups — is cross-referenced interactively to uncover patterns. Assume I've done this with a large set of defense reports. I can then piece these nuggets together into a higher-level view that shows me all the people related to al Qaeda in the Kandahar region" of Afghanistan.
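Once entities have been extracted, the cross-referencing Pridor describes amounts to filtering documents on several dimensions at once. The sketch below is hypothetical: the `Report` schema, the `people_linked_to` function and the placeholder names ("Group X," "Region Y," "Person A") are illustrative assumptions, not part of any product mentioned here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Report:
    """Entities already extracted from one document (hypothetical schema)."""
    doc_id: str
    people: set[str] = field(default_factory=set)
    places: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)


def people_linked_to(reports: list[Report], group: str, place: str) -> dict[str, list[str]]:
    """Cross-reference two dimensions: map each person to the reports
    that mention them together with both `group` and `place`."""
    hits = defaultdict(list)
    for report in reports:
        if group in report.groups and place in report.places:
            # Sorted so the output order is deterministic.
            for person in sorted(report.people):
                hits[person].append(report.doc_id)
    return dict(hits)


# Placeholder data: every group, place and person here is hypothetical.
reports = [
    Report("r1", {"Person A", "Person B"}, {"Region Y"}, {"Group X"}),
    Report("r2", {"Person A"}, {"Region Y"}, {"Group X"}),
    Report("r3", {"Person C"}, {"Region Z"}, {"Group X"}),
]
print(people_linked_to(reports, "Group X", "Region Y"))
# {'Person A': ['r1', 'r2'], 'Person B': ['r1']}
```

Adding another dimension (a time range, say) is just another condition in the filter; the "higher-level view" Pridor mentions is the aggregated result across many reports.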
Similarly, intelligence agents might uncover terrorist-related money-laundering operations by cross-matching information threads about banks of a certain size with data about individuals and intelligence about weapons-system sales.
"If a transaction that goes through certain banks is followed by an arms transaction 60 days later, and you see that happen two or three times, a pattern starts to emerge," said David Spenhoff, vice president of Inxight Software Inc., a pattern-recognition software maker that recently announced 10 new federal government contracts totaling about $3 million. "Analysts can then drill down to look for actionable intelligence."
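Spenhoff's example translates naturally into a simple temporal rule. This minimal sketch assumes that "60 days later" means "within 60 days," counts bank transactions that are followed by an arms transaction inside that window, and treats two occurrences as the threshold for a pattern. The function names, thresholds and dates are assumptions for illustration only.

```python
from datetime import date, timedelta

# Assumptions: "60 days later" is read as "within 60 days", and
# "two or three times" is read as a minimum of two occurrences.
WINDOW = timedelta(days=60)
MIN_OCCURRENCES = 2


def count_followed_by(bank_dates: list[date], arms_dates: list[date],
                      window: timedelta = WINDOW) -> int:
    """Count bank transactions followed by at least one arms transaction within `window`."""
    return sum(
        1 for b in bank_dates
        if any(b < a <= b + window for a in arms_dates)
    )


def is_pattern(bank_dates: list[date], arms_dates: list[date]) -> bool:
    """Flag a pattern once the bank-then-arms sequence repeats often enough."""
    return count_followed_by(bank_dates, arms_dates) >= MIN_OCCURRENCES


# Hypothetical transaction dates.
bank = [date(2003, 1, 1), date(2003, 3, 1), date(2003, 6, 1)]
arms = [date(2003, 2, 15), date(2003, 4, 20), date(2003, 12, 1)]
print(count_followed_by(bank, arms), is_pattern(bank, arms))  # 2 True
```

The rule is deliberately simple: one arms transaction can be matched to several earlier bank transactions, and a real analysis would also have to tie both transactions to the same people or organizations before an analyst drills down.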
In time, this type of data manipulation could become as sophisticated as online analytical processing (OLAP), Pridor said. OLAP is commonly used in the commercial sector to uncover patterns within highly structured relational databases, such as retail point-of-sale records and credit card transactions.
The promise of pattern recognition is turning into concrete projects. Last month, the FBI was expected to launch an expanded version of its Secure Collaborative Operational Prototype Environment (SCOPE), which uses pattern-recognition software from ClearForest and Convera. The new project represents about $5.2 million in revenue for Convera.
"We're trying to make significant enhancements in our ability to analyze, assess and manage the enormous volume of data that traditionally the FBI has always done a great job of acquiring," FBI spokesman Paul Bresson said. "By utilizing commercially available tools, we are expanding on our efforts to more effectively establish relationships and patterns that aid us in preventing acts of terror."
Meanwhile, the Air Force's Project Eyes is using the Web-enabled Temporal Analysis System to analyze visual information. The system was developed by the Air Force Research Laboratory along with contractors Northrop Grumman Information Technology and Intelligent Software Solutions Inc.
Imaging, in this case data gathered by cameras mounted on airborne drones, is another pattern-recognition intelligence challenge. Analysts may be looking for a certain shape, such as that of a mobile rocket launcher, and instruct a pattern-recognition system to run through large amounts of video to find where that shape occurs. The system can compare the next day's video images of the same area to determine if the military equipment has moved, which could predict an impending assault.
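The search-then-compare idea can be sketched with toy template matching. This says nothing about how the Air Force's system works: frames are assumed to be small grids of grayscale integers, the sum of absolute differences is one simple matching score chosen here, and `find_shape`, `has_moved` and `frame_with_block` are hypothetical helpers.

```python
from typing import Optional

Grid = list[list[int]]  # toy grayscale frame: rows of pixel intensities


def find_shape(frame: Grid, shape: Grid, max_diff: int = 0) -> Optional[tuple[int, int]]:
    """Slide `shape` over `frame` and return the (row, col) with the smallest
    sum of absolute differences, or None if even the best match exceeds max_diff."""
    sh, sw = len(shape), len(shape[0])
    best, best_pos = None, None
    for r in range(len(frame) - sh + 1):
        for c in range(len(frame[0]) - sw + 1):
            diff = sum(
                abs(frame[r + i][c + j] - shape[i][j])
                for i in range(sh) for j in range(sw)
            )
            if best is None or diff < best:
                best, best_pos = diff, (r, c)
    if best is None or best > max_diff:
        return None
    return best_pos


def has_moved(day1: Grid, day2: Grid, shape: Grid) -> Optional[bool]:
    """True if the shape appears on both days at different positions;
    None if it cannot be found in either frame."""
    p1, p2 = find_shape(day1, shape), find_shape(day2, shape)
    if p1 is None or p2 is None:
        return None
    return p1 != p2


def frame_with_block(row: int, col: int, size: int = 5) -> Grid:
    """Build a toy frame: zeros with a 2x2 bright block at (row, col)."""
    frame = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            frame[row + i][col + j] = 9
    return frame


target = [[9, 9], [9, 9]]  # hypothetical "shape" to look for
print(has_moved(frame_with_block(1, 1), frame_with_block(2, 3), target))  # True
```

Real imagery would need tolerance for noise, lighting, scale and rotation (hence the `max_diff` parameter), but the two steps are the same: locate the shape in each day's frame, then compare positions.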
Similarly, the military's American Forces Information Service will maintain the Video Information Management System (VIMS), a repository for unstructured data for the Defense Department that was also slated for completion last month. VIMS will collect and centralize visual information now held in a variety of data storehouses. Initially, the data store will span about 20 terabytes but could grow to 120 terabytes when it's fully functional, according to Mark Wells, technical director of TranTech Inc., the systems integrator that received about $10 million in DOD funds for the project.
Despite this recent activity, pattern recognition is far from a mature technology, according to some analysts.
"The [prediction] challenge is a lot harder than people in government may be admitting," said Scott Weidman, director of the Board on Mathematical Sciences and Their Applications for the National Academy of Sciences in Washington, D.C.
He believes it's easy to underestimate the amount of progress that's needed for pattern-recognition tools to successfully sift through huge volumes of information. "Certain algorithms that might work with a couple megabytes of data don't work when you have gigabytes' worth of information," Weidman said.
"Technologically speaking, some products haven't necessarily lived up to what they were being sold as," Wells added.
History hasn't helped pattern recognition's sometimes poor perception. "Pattern recognition is linked to [artificial intelligence], which was very hyped in the '70s and '80s, and that was very detrimental," said Sameer Samat, chief technology officer at Kofax Image Products Inc., which bought pattern-recognition software maker Mohomine Inc. last year. "For a time, if you mentioned pattern recognition, people just hung up the phone."
But renewed interest, driven by security needs that arose after the 2001 terrorist attacks, may make pattern recognition popular again.
"There's now an openness to looking at any technology that can reduce terrorist threats, so there's a rebirth of pattern recognition, except it's being approached in a way that's smarter," Samat said. "This technology doesn't replace the need for humans; it makes the intelligence community more effective by helping it to refocus on higher-priority activities."
In-Q-Tel, the CIA-backed venture fund, has invested in a number of small, private companies that use underlying pattern-recognition technology. These have included Mohomine before its acquisition, Inxight, Intelliseek Inc., NovoDynamics Inc. and Stratify Inc. Many of the latest investments focus on commercial products that let intelligence analysts concentrate on their subject-matter expertise rather than how to make the tools perform properly.
"Most of the tools that have been out there were constructed around an idea of the operator as a statistician," said Andrew Maker, a visionary solutions architect at In-Q-Tel's Rosslyn, Va., facility. "They required that you understand the statistical aspects of pattern recognition, and they also required that you have a solid understanding of what's in the dataset. But when you work with very large datasets, you don't have control over the quality or the mix of data."
He calls it a "Where's Waldo" problem, referring to the illustrated children's book in which the object is to find a boy in a striped shirt hidden in a picture busy with imagery. But in the real world, government analysts have to identify a target whose appearance doesn't remain static: Waldo ages and changes clothes as he shows up in various intelligence reports.
"Most models are not really designed for that," Maker said. "The tools out there have really not made it easy for subject-matter experts to quickly leverage [their expertise] without massive time investments to sample data and play with it."
Some pattern-recognition applications now under development are addressing this issue with data-visualization technology that is making progress in sorting through massive amounts of data. "Visual mining uses tools to classify data to find areas of interest," Maker said.
Data visualization tries to turn overwhelming mounds of text and statistics into easier-to-comprehend graphical representations, such as icons, timelines, graphs or charts that show relationships among various subjects within the data.
Samat agrees that pattern-recognition software needs to become easier to use. "The technology can be mysterious to people because there are a lot of knobs to tweak, a lot of rules you need to know," he said. Technologies can streamline this process by watching how analysts build models and then applying the areas of interest those analysts identify to future models.
Also in the near future, the technology will need to get better at processing intelligence within multilingual data storehouses. "The world is getting smaller every year, and more and more information that's important to homeland security is not in English," said John Cronin, vice president for the government sector at software vendor Autonomy Corp., which sells pattern-recognition applications to the Homeland Security Department.
For now, predictive pattern matching is an emerging trend rather than a fully formed technology. "From the perspective of combating future terrorism, it really comes down to getting information to the right people," Samat said. "In that way, you leverage human predictive capability. People are pretty good at solving problems, if they're not overwhelmed with too much information."