Attensity masters linguistics

Company touts its mid-October release of Attensity 4 as a complete analyst’s desktop

Text analytics software is proving to be immensely useful in federal programs that involve massive amounts of unstructured information that would overwhelm employees if they had to read it all. Powerful computers running software that doesn’t balk at such tasks are helping U.S. Patent and Trademark Office examiners, for example, review millions of lines of publicly available computer source code.

One of the analytic engines that help the examiners comes from Attensity, which will release a new text analytics suite in mid-October. That Web-based suite, named Attensity 4, combines statistical and linguistic techniques to create a powerful analyst's desktop, said Michelle DeHaaff, vice president of products and marketing at Attensity.

By using Attensity 4, DeHaaff said, “an analyst could take action on what’s in 20,000 documents without having to read them.”

Demand for software that could find facts and connections hidden in unstructured data spiked after the 2001 terrorist attacks, and Attensity benefited from the interest of investors such as In-Q-Tel, the CIA-funded venture capital group. In 2002, In-Q-Tel became a lead investor in Attensity. It followed up with a second round of funding and still maintains a close relationship with the company, DeHaaff said.

Attensity uses a process that extracts data from text and stores it as rows and columns in a relational database. The process, which Attensity calls exhaustive extraction, is similar to the extract, transform and load (ETL) techniques used to move data from an enterprise resource planning or customer relationship management system into a data warehouse. "Attensity considers itself the ETL for text," DeHaaff said.
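To make the "ETL for text" idea concrete, here is a minimal Python sketch of the general pattern, not Attensity's method: a toy regular expression stands in for real linguistic extraction, each fact it finds becomes a row in a relational table, and plain SQL then answers a question over those rows. The names FACT_PATTERN, extract_facts, load_facts and top_complaints, and the sample call center notes, are hypothetical.

```python
import re
import sqlite3

# Hypothetical toy pattern for sentences shaped like "<Subject> <verb>ed <object>."
# Real linguistic extraction parses grammar; this regex is only a stand-in.
FACT_PATTERN = re.compile(r"^(?P<subject>[A-Z]\w+)\s+(?P<verb>\w+ed)\s+(?P<obj>.+?)\.?$")


def extract_facts(doc_id, text):
    """Extract (and lightly transform) step: free text -> (doc_id, subject, verb, obj) rows."""
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        match = FACT_PATTERN.match(sentence)
        if match:
            # Lowercase verb and object so identical facts group together in SQL.
            yield (doc_id, match["subject"], match["verb"].lower(), match["obj"].lower())


def load_facts(conn, docs):
    """Load step: store every extracted fact as a row in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (doc_id INTEGER, subject TEXT, verb TEXT, obj TEXT)"
    )
    for doc_id, text in docs.items():
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", extract_facts(doc_id, text))
    conn.commit()


def top_complaints(conn):
    """Query step: ordinary SQL answers 'What are people complaining about the most?'"""
    return conn.execute(
        "SELECT obj, COUNT(*) AS n FROM facts WHERE verb = 'reported' "
        "GROUP BY obj ORDER BY n DESC, obj"
    ).fetchall()


if __name__ == "__main__":
    # Made-up call center notes, for illustration only.
    notes = {
        1: "Caller reported pharmacy delays. Caller requested refund.",
        2: "Member reported pharmacy delays. Agent escalated ticket.",
        3: "Caller reported billing errors.",
    }
    conn = sqlite3.connect(":memory:")
    load_facts(conn, notes)
    print(top_complaints(conn))  # [('pharmacy delays', 2), ('billing errors', 1)]
```

The point of the design is the one DeHaaff makes: once text has been reduced to rows, anyone who already knows SQL can query it without learning XML tools.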

Attensity's partner companies, which resell its analytics software, are leading data warehousing and business intelligence vendors: Business Objects, IBM, Oracle and Teradata. Attensity's software complies with IBM's Unstructured Information Management Architecture (UIMA), an open-source standard for integrating and accessing unstructured data.

Attensity's competitors in the text analytics market tend to rely more on tagging documents with Extensible Markup Language. Attensity can also tag documents in XML and supports XQuery, but the company chose to capitalize on people's familiarity with relational databases and SQL queries. "Pretty much everyone does those," DeHaaff said.

Attensity 4 combines six techniques for extracting information from text documents such as phone records, e-mail messages and call center reports. Used together, the techniques often detect problems early, before they become publicized crises, DeHaaff said. The Centers for Medicare and Medicaid Services, for example, uses Attensity's text analytics software to analyze hundreds of thousands of call center records and spot problems with its new prescription drug program before they get out of hand.

“What are people complaining about the most? It turns out the unstructured data is really a leading indicator about what’s going on,” said Mike Lewis, vice president of government systems at Attensity.

Text analytics “is more than ready for prime time,” said Susan Feldman, research vice president for content technologies at IDC.

“One of the things that Attensity does so well is to encapsulate the knowledge that gives you generic text mining,” she said.
