Xerox announces categorization software

Research scientists at Xerox Research Centre Europe say they have perfected a new method for automatically categorizing electronic messages and documents for future retrieval.

The method uses unnamed software that performs what the scientists call "deep linguistic analysis." The technique could be useful, for example, for categorizing documents that should be preserved as federal records, the scientists said. Written in Java, the software can be integrated into existing document management and workflow systems.

"It's exciting news if true," said J. Timothy Sprehe, president of Sprehe Information Management Associates Inc., a consulting company in Washington, D.C. "There's enormous interest in auto-categorizing e-mail," especially among federal records managers, he said.

Eric Gaussier, a research scientist at the center, said the new software represents an advance over existing categorization software, which is offered in some products and in the public domain. The software recognizes, for example, that words can have several meanings, depending on their context. It also recognizes that different words can mean the same thing, he said.

Since 1993, the research center has been developing linguistic analysis tools for different uses and in 20 languages, Gaussier said. The categorization software is a new use for those tools and for machine learning, for which the center is also known.

Such tools are very much needed, Sprehe said. In most federal departments, the volume of e-mail has grown so large that having people categorize messages for preservation as federal records is nearly impossible. "It's no longer a practical solution," he said.

However, most experts in the field of records management say that automated filtering of records still leaves much to be desired. "The general conclusion is that auto-categorization is not yet ready for prime time," Sprehe said. "Everyone who is interested in this will say they want to see the proof first."

