Grouped for success
- By John Zyskowski
- Oct 01, 2001
Vendors that sell software to help the U.S. Postal Service and other agencies categorize mountains of unorganized online records into user-friendly, topic-based directories are starting to realize what many folks already know—sometimes people are a lot smarter than computers.
That's why recent start-up Quiver Inc. and market veteran Semio Corp. are taking a new tack: they offer, or soon will offer, automated content categorization engines that give people more control over how the software organizes their records. The aim is to build more accurate topic directories, also known as taxonomies, so that users are more successful at finding the information they need.
Content categorization software, though only a few years old, is becoming very popular in government as agencies try to tap the treasure-trove of information locked in a variety of computer files, from memos and Web pages to spreadsheets and presentations.
"In the last six to nine months, these products have become a critical component to many knowledge management and portal initiatives," said David Yockelson, senior vice president and director of the electronic business strategies group at META Group Inc. "Regular search engines only go so far at helping people find what they're looking for."
The main criticism of full-text search engines is that they waste users' time and hinder their efforts to find the right information by overwhelming them with lengthy lists of items that are only marginally relevant to their search terms.
Content categorization software, on the other hand, pre-searches a body of information, then organizes summaries of and links to that content in a topic-based taxonomy that users can browse or search. The taxonomies are hierarchical, so a topic on "budgets" might have subtopics on "FY 2002 budgets" and "FY 2001 budgets," further helping users find the information they need.
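To make the idea of a hierarchical taxonomy concrete, here is a minimal sketch in Python of a topic tree using the "budgets" example. The `Topic` class, its methods and the document file names are hypothetical illustrations, not part of any vendor's product.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One node in a hypothetical topic taxonomy."""
    name: str
    documents: list = field(default_factory=list)   # documents filed directly here
    subtopics: dict = field(default_factory=dict)   # subtopic name -> Topic

    def add_subtopic(self, name):
        # Create the subtopic if it does not exist yet, then return it.
        if name not in self.subtopics:
            self.subtopics[name] = Topic(name)
        return self.subtopics[name]

    def find(self, path):
        # Walk down the tree following a list of topic names.
        node = self
        for name in path:
            node = node.subtopics[name]
        return node

    def all_documents(self):
        # Documents filed here plus everything filed under any subtopic.
        docs = list(self.documents)
        for sub in self.subtopics.values():
            docs.extend(sub.all_documents())
        return docs


# Build the example from the text; the file names are made up.
root = Topic("root")
budgets = root.add_subtopic("budgets")
budgets.add_subtopic("FY 2002 budgets").documents.append("fy2002_summary.doc")
budgets.add_subtopic("FY 2001 budgets").documents.append("fy2001_summary.doc")
```

Browsing "budgets" in this sketch would surface documents from both fiscal-year subtopics, while drilling into one subtopic narrows the list to that year.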
When Quiver launched its debut product, QKS Classifier, in August, one of the company's main goals was to give users more input into the content classification process. Typically, other taxonomy products fully automate this process, leaving it up to the software to decide under which topic a particular document should go.
"The automated approach is very efficient and scalable," said Roz Chapman, senior director of corporate marketing at Quiver, "but the industry average for accuracy is about 60 to 70 percent, meaning that up to 40 percent of documents are placed in the wrong category."
QKS Classifier enables a system administrator to configure the auto-classification engine to flag new documents whose relevancy score falls below a preset threshold. Then, instead of automatically publishing such a document in the taxonomy, the software's workflow tool brings it to the attention of the administrator or a content editor, who decides under which category to place it.
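The flag-for-review workflow described above can be sketched in a few lines of Python. This is an illustration of the general technique only: the function name, the 0-to-1 score scale and the 0.7 threshold are assumptions, not details of how QKS Classifier works.

```python
def route_document(scores, threshold=0.7):
    """Decide whether to publish a document or send it to a human.

    scores: dict mapping category name -> relevancy score between 0 and 1
            (a hypothetical classifier output).
    threshold: hypothetical cutoff set by the administrator.
    Returns (action, best_category, best_score).
    """
    # Pick the category the classifier considers the best fit.
    best_category = max(scores, key=scores.get)
    best_score = scores[best_category]

    # Low-confidence placements go to an editor instead of being published.
    if best_score < threshold:
        return ("review", best_category, best_score)
    return ("publish", best_category, best_score)


# A confident match is published automatically.
confident = route_document({"budgets": 0.91, "marketing": 0.12})

# A weak match is held for an administrator or content editor.
uncertain = route_document({"budgets": 0.55, "marketing": 0.48})
```

Raising the threshold sends more documents to human review and lowers the risk of misfiled documents; lowering it keeps the process more automated.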
At least one other vendor sees the value in letting people have more control over the classification process. Semio is readying the next version of its SemioTagger for release later this year, and it will include a feature called "Tag Taxonomy Suggesting," according to Roger Phillip, vice president of marketing and business development for Semio.
With the new integrated feature, SemioTagger will still suggest how documents should be categorized, but it will also give users the option of overriding those recommendations and categorizing the files as they see fit.
Semio has several customers in the government market. One of those, the Postal Service, uses SemioTagger to make vast collections of marketing and customer data more easily accessible to about 1,000 USPS employees.
John Gregory, a marketing specialist at the Postal Service's office of market intelligence and segmentation, said getting more control over the categorization process will be a welcome improvement.
"Semio, up to this point, has been clumsy about doing that," he said. "You had to leave the main application and get into a separate program, then manually query the system to find those documents" that might not be good fits for certain categories.