Agencies grapple with search and discovery

As increasing amounts of information exist only in electronic form, agencies are having to find ways to search it, retrieve useful information and retain it to comply with information retention rules.

Legal cases now routinely require agencies to produce not only paper records but also e-mails, text chat logs and other electronic data — a challenge many of them are not well prepared for, said Ed Meagher, deputy chief information officer at the Interior Department.

"This is one of those issues that has really crept up on us," he said. Meagher moderated a panel discussion on the topic today at the FOSE trade show in Washington.

"It's relatively easy to store 1 billion objects, but it is incredibly hard to search for relevant information" within them, said Jason Baron, director of litigation at the National Archives and Records Administration.

Lt. Col. James Whitlock, chief of knowledge management for the Air Force Medical Service, said enterprise users are conditioned by Internet search tools — primarily Google — to expect well-sorted search results. Google's site ranking system is so good that 90 percent of the time, users find what they need on the first page of results, he said.

"We are socialized to expect that level [of accuracy] when we go to enterprise search, and the problem is, the Google magic doesn't work" for enterprise data, Whitlock said. Google's search engine ranks Web pages based on the number of other pages that link to them. There is no such easy measure for business documents in an enterprise system, he said.
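As a rough illustration of the link-based signal Whitlock describes, the sketch below ranks pages purely by how many other pages link to them. It is a simplification of the article's one-line description, not Google's actual system, and the function name `rank_by_inbound_links` and the sample link graph are hypothetical.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Order pages by the number of distinct other pages linking to them.

    `links` maps each page to the list of pages it links to.
    This is an illustrative simplification, not a real search engine's method.
    """
    inbound = Counter()
    for source, targets in links.items():
        inbound[source] += 0  # make sure pages with no inbound links still appear
        for target in set(targets):  # count each linking page only once
            if target != source:  # ignore self-links
                inbound[target] += 1
    # Most-linked pages first; ties broken alphabetically for a stable order.
    return sorted(inbound, key=lambda page: (-inbound[page], page))


# Hypothetical link graph for illustration only.
sample_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "c"],
}
print(rank_by_inbound_links(sample_links))
```

Documents in an enterprise repository rarely link to one another this way, which is why a count like this has little to work with there.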

A Google spokesman, who was not involved in the discussion, later noted that Google does offer enterprise search tools using technology appropriate to the enterprise.

Whitlock advocated a ranking technique called "concept search." This technique breaks down multiple-word search terms into smaller units and ranks document hits accordingly. For example, an enterprise search on "tamiflu stockpile policy" would rank documents containing that complete phrase at the top of the list of possibly relevant hits. Next would be documents containing the phrases "tamiflu stockpile" or "stockpile policy," and trailing those, documents with any one of the words.
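The ranking order in that example can be sketched in a few lines of Python. This is a minimal sketch of the tiering as the article describes it, not the tool Whitlock's organization uses; the function names, the scoring rule (length of the longest run of consecutive query words found in a document) and the sample documents are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def concept_score(query, text):
    """Return the length of the longest run of consecutive query words
    that appears as a phrase in the text (0 if no query word appears)."""
    terms = tokenize(query)
    normalized = " " + " ".join(tokenize(text)) + " "
    # Try the full phrase first, then progressively shorter sub-phrases.
    for length in range(len(terms), 0, -1):
        for start in range(len(terms) - length + 1):
            phrase = " ".join(terms[start:start + length])
            if f" {phrase} " in normalized:
                return length
    return 0


def rank(query, documents):
    """Return the names of matching documents, best match first."""
    scored = [(concept_score(query, text), name) for name, text in documents.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in hits]


# Hypothetical documents for illustration only.
sample_docs = {
    "memo_a": "Notes on flu policy for clinics",
    "memo_b": "The Tamiflu stockpile policy was revised.",
    "memo_c": "Tamiflu stockpile levels by region",
    "memo_d": "Cafeteria menu",
}
print(rank("tamiflu stockpile policy", sample_docs))
```

In this sketch a document containing the full phrase scores 3, one containing a two-word sub-phrase scores 2, and one containing a single query word scores 1; documents with no query words are dropped.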

There are other sorting and ranking processes that can be used in conjunction with concept search-based tools to further refine the results, he said.

The situation will only grow more complicated, Baron said. To date, most of the attention to electronically stored information has centered on e-mail, text chat logs and similar common tools. But it can also include voice mail, electronic calendars, instant messages, video conferences, posts to wikis and blogs, and virtual worlds such as Second Life, he said.

And that's not even counting new technologies yet to emerge.


About the Author

Technology journalist Michael Hardy is a former FCW editor.

