Xerox announces categorization software

Research scientists at Xerox Research Centre Europe say they have perfected a new method for automatically categorizing electronic messages and documents for future retrieval.

The method uses unnamed software that performs what the scientists call "deep linguistic analysis." The technique could be useful, for example, for categorizing documents that should be preserved as federal records, the scientists said. Written in Java, the software can be integrated into existing document management and workflow systems.

"It's exciting news if true," said J. Timothy Sprehe, president of Sprehe Information Management Associates Inc., a consulting company in Washington, D.C. "There's enormous interest in auto-categorizing e-mail," he said, especially among federal records managers.

Eric Gaussier, a research scientist at the center, said the new software represents an advance over existing categorization software, which is offered in some products and in the public domain. The software recognizes, for example, that words can have several meanings, depending on their context. It also recognizes that different words can mean the same thing, he said.

Since 1993, the research center has been developing linguistic analysis tools for different uses and in 20 languages, Gaussier said. The categorization software is a new use for those tools and for machine learning, for which the center is also known.

Such tools are very much needed, Sprehe said. In most federal departments, the volume of e-mail has grown so large that having people categorize e-mail messages for preservation as federal records is nearly impossible, he said. "It's no longer a practical solution," he said.

However, most experts in the field of records management say that automated filtering of records still leaves much to be desired. "The general conclusion is that auto-categorization is not yet ready for prime time," Sprehe said. "Everyone who is interested in this will say they want to see the proof first."
