Agencies grapple with search and discovery

As more information exists only in electronic form, agencies must find ways to search it, retrieve what is useful and retain it in compliance with information retention rules.


Legal cases now routinely require agencies to produce not only paper records but also e-mails, text chat logs and other electronic data — a challenge many of them are not well prepared for, said Ed Meagher, deputy chief information officer at the Interior Department.


"This is one of those issues that has really crept up on us," he said. Meagher moderated a panel discussion on the topic today at the FOSE trade show in Washington.


"It's relatively easy to store 1 billion objects, but it is incredibly hard to search for relevant information" within them, said Jason Baron, director of litigation at the National Archives and Records Administration.


Lt. Col. James Whitlock, chief of knowledge management for the Air Force Medical Service, said enterprise users are conditioned by Internet search tools — primarily Google — to expect well-sorted search results. Google's site ranking system is so good that 90 percent of the time, users find what they need on the first page of results, he said.


"We are socialized to expect that level [of accuracy] when we go to enterprise search, and the problem is, the Google magic doesn't work" for enterprise data, Whitlock said. Google's search engine ranks Web pages based on the number of other pages that link to them. There is no such easy measure for business documents in an enterprise system, he said.


A Google spokesman, who was not involved in the discussion, later noted that Google does offer enterprise search tools using technology appropriate to the enterprise.


Whitlock advocated a ranking technique called "concept search." This technique breaks down multiple-word search terms into smaller units and ranks document hits accordingly. For example, an enterprise search on "tamiflu stockpile policy" would rank documents containing that complete phrase at the top of the list of possibly relevant hits. Next would be documents containing the phrases "tamiflu stockpile" or "stockpile policy," and trailing those, documents with any one of the words.
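
The ordering Whitlock describes can be illustrated with a short Python sketch. It is only an illustration of the ranking idea as summarized here, not the Air Force Medical Service's system or any vendor's product. The function names and sample documents are hypothetical, and the sketch assumes that a document is scored by the longest run of consecutive query words it contains as a phrase.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(doc_tokens, phrase):
    """Return True if the word list `phrase` appears contiguously in the document."""
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase
               for i in range(len(doc_tokens) - n + 1))


def longest_match(query_tokens, doc_tokens):
    """Length of the longest contiguous run of query words found as a phrase."""
    for size in range(len(query_tokens), 0, -1):
        for start in range(len(query_tokens) - size + 1):
            if contains_phrase(doc_tokens, query_tokens[start:start + size]):
                return size
    return 0


def rank(query, documents):
    """Rank document names: full phrase first, then sub-phrases, then single words.

    `documents` maps a (hypothetical) document name to its text.
    Documents that contain none of the query words are dropped.
    """
    query_tokens = tokenize(query)
    scored = [(longest_match(query_tokens, tokenize(text)), name)
              for name, text in documents.items()]
    hits = [(score, name) for score, name in scored if score > 0]
    hits.sort(key=lambda pair: -pair[0])  # stable sort keeps ties in input order
    return [name for _, name in hits]


# Hypothetical sample documents, invented for illustration only.
SAMPLE_DOCS = {
    "memo_c": "Travel policy reminders",
    "memo_d": "Cafeteria menu",
    "memo_b": "Status of the Tamiflu stockpile at regional depots",
    "memo_a": "Updated Tamiflu stockpile policy for the coming year",
}

if __name__ == "__main__":
    print(rank("tamiflu stockpile policy", SAMPLE_DOCS))
```

Run against the sample documents, the query "tamiflu stockpile policy" returns memo_a (the complete phrase), then memo_b (a two-word phrase), then memo_c (a single word), and leaves out memo_d, which matches nothing. A production tool would also need tie-breaking rules and an index rather than a linear scan; those are left out to keep the sketch short.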


Other sorting and ranking processes can be used in conjunction with concept search-based tools to further refine the results, he said.


The situation will only grow more complicated, Baron said. To date, most of the attention to electronically stored information has centered on e-mail, text chat logs and similar common tools. But it can also include voice mail, electronic calendars, instant messages, video conferences, posts to wikis and blogs, and virtual worlds such as Second Life, he said.


And that's not even counting new technologies yet to emerge.


 

About the Author

Technology journalist Michael Hardy is a former FCW editor.
