Agencies grapple with search and discovery

As more information exists only in electronic form, agencies must find ways to search it, retrieve what is useful and retain it to comply with information retention rules.


Legal cases now routinely require agencies to produce not only paper records but also e-mails, text chat logs and other electronic data — a challenge many of them are not well prepared for, said Ed Meagher, deputy chief information officer at the Interior Department.


"This is one of those issues that has really crept up on us," he said. Meagher moderated a panel discussion on the topic today at the FOSE trade show in Washington.


"It's relatively easy to store 1 billion objects, but it is incredibly hard to search for relevant information" within them, said Jason Baron, director of litigation at the National Archives and Records Administration.


Lt. Col. James Whitlock, chief of knowledge management for the Air Force Medical Service, said enterprise users are conditioned by Internet search tools — primarily Google — to expect well-sorted search results. Google's site ranking system is so good that 90 percent of the time, users find what they need on the first page of results, he said.


"We are socialized to expect that level [of accuracy] when we go to enterprise search, and the problem is, the Google magic doesn't work" for enterprise data, Whitlock said. Google's search engine ranks Web pages based on the number of other pages that link to them. There is no such easy measure for business documents in an enterprise system, he said.


A Google spokesman, who was not involved in the discussion, later noted that Google does offer enterprise search tools using technology appropriate to the enterprise.


Whitlock advocated a ranking technique called "concept search." This technique breaks down multiple-word search terms into smaller units and ranks document hits accordingly. For example, an enterprise search on "tamiflu stockpile policy" would rank documents containing that complete phrase at the top of the list of possibly relevant hits. Next would be documents containing the phrases "tamiflu stockpile" or "stockpile policy," and trailing those, documents with any one of the words.
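The tiered ranking Whitlock described can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the method of any particular search product: the function names (`tokenize`, `match_tier`, `rank`), the sample documents and the rule that only contiguous sub-phrases of the query count are all hypothetical choices made for the example.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_tier(query: str, text: str) -> int:
    """Return the length, in words, of the longest contiguous query
    phrase that appears in the text, or 0 if no query word appears."""
    terms = tokenize(query)
    # Pad with spaces so phrases only match on whole-word boundaries.
    padded = " " + " ".join(tokenize(text)) + " "
    for size in range(len(terms), 0, -1):
        for start in range(len(terms) - size + 1):
            phrase = " ".join(terms[start:start + size])
            if f" {phrase} " in padded:
                return size
    return 0


def rank(query: str, documents: Dict[str, str]) -> List[Tuple[str, int]]:
    """Rank documents by match tier, highest first, dropping non-matches.
    Ties keep their original order because Python's sort is stable."""
    scored = [(doc_id, match_tier(query, text)) for doc_id, text in documents.items()]
    hits = [item for item in scored if item[1] > 0]
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical sample documents, invented for this illustration.
documents = {
    "memo": "Notes on the stockpile policy for antivirals",
    "faq": "What is Tamiflu?",
    "directive": "Updated Tamiflu stockpile policy, effective immediately.",
    "inventory": "Current Tamiflu stockpile counts by base",
    "menu": "Cafeteria lunch menu",
}

results = rank("tamiflu stockpile policy", documents)
for doc_id, tier in results:
    print(doc_id, tier)
```

In this sketch the document containing the full three-word phrase ranks first, the two documents containing a two-word phrase come next, the document with a single query word trails them, and the unrelated document is left out. A production system would likely weigh many more signals than phrase length alone.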


Other sorting and ranking processes can be used in conjunction with concept search-based tools to further refine the results, he said.


The situation will only grow more complicated, Baron said. To date, most of the attention to electronically stored information has centered on e-mail, text chat logs and similar common tools. But it can also include voice mail, electronic calendars, instant messages, video conferences, posts to wikis and blogs, and virtual worlds such as Second Life, he said.


And that's not even counting new technologies yet to emerge.


 

About the Author

Technology journalist Michael Hardy is a former FCW editor.
