Search gets smarter

Citizens, federal employees, information technology administrators and agency executives all share a common need to locate information as quickly and efficiently as possible. Yet search technologies often return a myriad of results, and few of them, if any, contain the precise information people are seeking.

Agencies can and should deploy efficient search technologies at the appropriate points across the enterprise. Obviously, there are solutions to simplify Internet-based searching. Moreover, you may want to consider a separate, role-based solution to support agency searching across an intranet. Finally, if your employees are heavy searchers, you may want to implement client-side tools that query multiple search engines simultaneously.

We recently surveyed some of the available search technologies to gauge progress. We found that search tools are improving, and we believe search will only grow more efficient. Keeping an eye on the progress of search tools will enable your agency to capitalize on ongoing improvements while keeping the budget in check.

USA.gov: Something for everyone

In January, a rebranded FirstGov emerged. Now called USA.gov (www.usa.gov), this Web-based search portal has several unique features that are well-suited for a variety of audiences.

From the starting page, a tabbed presentation provides easy access for the public, businesses and agency employees. New in the latest incarnation of this search portal is the ability to chat with a person in real time. Visitors to USA.gov can chat with government employees Monday through Friday from noon to 8 p.m. Eastern Time.

Underlying the USA.gov site are two complementary search-related technologies. The first is Vivisimo (www.vivisimo.com), a search technology that produces clustered results in a manner that makes it easier to pinpoint accurate information. The second is MSN search (www.msn.com).

Vivisimo, in particular, offers a useful differentiation when compared to other available tools. When searching on USA.gov, we found that search results were placed in clustered folders, which were then accessible by topical area, agency or source. The clustered approach made it much easier to locate relevant information.
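
To make the folder idea concrete, here is a minimal sketch of grouping results by topic, agency or source. It is not Vivisimo's method: a real clustering engine derives its groupings from the text of the results, whereas this sketch assumes each result already carries those three labels. The sample records and the cluster_by function are hypothetical.

```python
from collections import defaultdict

# Hypothetical search results; every title and label is made up for illustration.
RESULTS = [
    {"title": "Passport renewal form", "topic": "Travel", "agency": "State", "source": "Forms"},
    {"title": "Travel advisories FAQ", "topic": "Travel", "agency": "State", "source": "FAQs"},
    {"title": "Small business loan guide", "topic": "Business", "agency": "SBA", "source": "PDFs"},
]


def cluster_by(results, facet):
    """Group result titles into folders keyed by the chosen facet."""
    folders = defaultdict(list)
    for result in results:
        folders[result[facet]].append(result["title"])
    return dict(folders)


if __name__ == "__main__":
    # Show the same results as folders by topic, then by agency, then by source.
    for facet in ("topic", "agency", "source"):
        print(facet, cluster_by(RESULTS, facet))
```

The same result list can be refolded along a different facet without running the search again, which is what makes this style of browsing quick.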

The USA.gov portal, via Vivisimo and MSN, also returns results from a variety of sources, including frequently asked questions, forms, audio materials, office documents and PDFs.

Google: Specialized search for government
Although Google (www.google.com) may often be considered synonymous with Internet searches, the company offers other solutions that may be well-suited for federal agencies. In particular, Google offers a specialized U.S. Government Search (www.google.com/unclesam).

Google’s government search includes .gov and .mil domains, and select sites that are relevant and fall within .com, .us, or .edu domain types. When searching Google’s government site, we found that if we just entered the search term and pressed Enter, the search engine returned only Web-specific content.

After entering the search term and clicking on the Search Government Sites button, we were able to obtain government-specific search results. Although Google U.S. Government Search does not break out federal, state and local domains as granularly as USA.gov does, it is possible to use the Advanced Search feature to define which domains are searched for results.

Aside from searching capabilities, Google enables users to customize their interface through the use of a Google log-in. Once logged in, users can customize their interfaces by using the content directory to populate their pages with RSS feeds from various sites.

For users seeking search capabilities within agency walls, a Google hardware appliance may be a good option to consider. The company offers two models, the Google Search Appliance and the Google Mini. Both can search through content and support more than 220 different formats. The former can scale search support to more than 500,000 documents while the latter sports capabilities to search 100,000 to 300,000 documents.

If your agency uses geospatial tools, you might want to consider using Google’s Earth Enterprise and Maps for Enterprise. Earth Enterprise uses your own images or Google’s satellite imagery and can be scripted into a service or application, while Maps for Enterprise can be used to create detailed mapping applications.

Finally, Google offers the typical search toolbar for agency employees’ Web browsers and Google Desktop for Enterprise. The latter could be more useful than you think because documents often remain on desktops instead of making their way onto the appropriate server. Cataloging the contents of desktops will help preserve agency assets. Agencies that are interested in exploring other desktop search options will want to examine Beagle (beagle-project.org).

RetrievalWare: Server-side search and retrieval
Convera is also addressing search and retrieval on the server side inside the enterprise or agency walls. The company’s RetrievalWare solution is geared to reducing the cost and time it takes to locate accurate information within the enterprise.

You could think of RetrievalWare as an intra-enterprise metasearch tool because it can reach across multiple types of file systems, portals and repositories to locate the information an agency needs. Supported platforms include Red Hat Enterprise Linux AS 4, Oracle 10g and IBM’s WebSphere Application Server.

RetrievalWare can support a variety of indexing methods, including distributed, parallelized indexing for large document collections. The Convera solution also supports content filtering and concept and entity extraction regardless of the content. Moreover, this solution can address structured, unstructured and semistructured data types.

If your agency is information-intensive, RetrievalWare is worth considering because it offers automatic and dynamic classification support, useful administration features such as index alerting, and access control support that can tie into your existing security implementation.

Copernic: Powerful metasearch tool

Copernic Technologies is also looking to make search and retrieval more efficient. It offers a variety of tools that are most helpful to users. Like Google Desktop and the Beagle project, Copernic offers an indexing function that can tap data stored on local or network drives.

However, in addition to indexing, Copernic provides a powerful metasearch facility that can retrieve information from many search engines concurrently in response to a simple or advanced user query.
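
For readers curious about what a metasearch does under the hood, the following is a minimal sketch rather than Copernic's implementation. It sends one query to several engines at the same time and merges the answers, dropping duplicates. The two engine functions are hypothetical stand-ins that return canned URLs; a real tool would call each engine's own interface.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    """Hypothetical engine: returns canned URLs instead of calling a real service."""
    return [f"https://a.example/{query}", f"https://shared.example/{query}"]


def engine_b(query):
    """Second hypothetical engine; it overlaps with engine_a on one URL."""
    return [f"https://shared.example/{query}", f"https://b.example/{query}"]


ENGINES = [engine_a, engine_b]


def metasearch(query, engines=ENGINES):
    """Query every engine concurrently, then merge results without duplicates."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map runs the calls in parallel but returns them in engine order.
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


if __name__ == "__main__":
    print(metasearch("visa"))
```

Because the engines are queried in parallel, the total wait is roughly that of the slowest engine rather than the sum of all of them.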

For example, we used Copernic’s government-related engines and executed several searches. As each search progressed, we could see it traversing all of the engines, and the results of the search were saved to a folder.

Equally useful, Copernic analyzed the results it found before letting us work with them. The analysis included strict link checking for every result, which saved us time by eliminating invalid or inactive links.
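
Link checking of this kind can be approximated in a few lines. The sketch below is our own illustration, not Copernic's code: it sends a lightweight HEAD request to each URL and keeps only those that answer with a status below 400. The function names are hypothetical, and the checker is passed in as a parameter so it can be swapped out or tested without network access.

```python
import http.client
import urllib.request


def http_is_alive(url, timeout=5):
    """Return True if a HEAD request to url succeeds with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def drop_dead_links(urls, is_alive=http_is_alive):
    """Keep only the URLs that the checker reports as reachable."""
    return [url for url in urls if is_alive(url)]
```

Some servers reject HEAD requests even for valid pages, so a production checker would likely fall back to a GET request before declaring a link dead.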

Two other Copernic features, tracking and summarizing, will also pique the interest of agency employees. The tracking component automatically monitors Web pages and detects any content changes or updates made to the pages. Copernic sends an e-mail message to advise the user of the content change. The search engine also highlights the content that has changed so users keeping tabs on longer documents can significantly reduce their read/update times.
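
The tracking idea can be sketched with two standard techniques, offered here as an assumption about how such a tool might work rather than a description of Copernic's internals. A stored fingerprint of the page reveals whether anything changed, and a line-by-line comparison shows what changed. The function names and sample text are hypothetical.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a short, stable digest of the page text for cheap change detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_lines(old, new):
    """Return removed lines prefixed with '- ' and added lines prefixed with '+ '."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    saved = "Office hours: 9 to 5\nClosed Sundays"
    latest = "Office hours: 9 to 6\nClosed Sundays"
    # Compare digests first; only compute the detailed diff when they differ.
    if fingerprint(saved) != fingerprint(latest):
        for line in changed_lines(saved, latest):
            print(line)
```

Storing only the digest between visits keeps the monitor lightweight; the full text is needed only when a highlight of the differences is wanted.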

Summarization technology uses statistics and various algorithms to detect the key points within a document. This function then extracts the relevant material to create a condensed version of the original document with just the critical items included.
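
One simple statistical approach, shown here as a sketch and not as Copernic's algorithm, scores each sentence by how often its words appear in the whole document and keeps the highest-scoring sentences in their original order. The stop-word list, the scoring rule and the summarize function are our own assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on", "it", "that", "this", "with"}


def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    frequency = Counter(tokenize(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's words.
        tokens = tokenize(sentence)
        return sum(frequency[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    sample = (
        "Search tools index agency documents. "
        "The weather was pleasant yesterday. "
        "Search tools rank documents by relevance."
    )
    print(summarize(sample))
```

In the sample, the sentence about the weather shares no frequent words with the rest of the text, so it scores lowest and is dropped.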

Memex: An intelligent engine
Another specialized search facility is available from Memex (www.memex.com), and it specifically targets the intelligence communities and law enforcement. The Memex Intelligence Engine provides facilities that support highly accurate searches on huge amounts of structured and unstructured data in seconds.
Data can be located even if it is entered into an incorrect field in a relational database or if it is buried inside a large PDF. Memex can also identify locations, proper names and relationships.

In particular, Memex provides facilities, such as a query builder, so analysts don’t have to be query experts to be productive with the solution. Likewise, Memex is geared for efficiency, with index updates committed in real time and data compression that the company says shrinks data by about 60 percent from its original size when it is stored in Memex.

Users can secure data stored in Memex at the field level, if necessary. Moreover, the Memex solution includes advanced searching methods, such as sounds-like, range and keyword searches.
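
A classic way to implement a sounds-like search is the Soundex code, which maps names that are pronounced alike to the same four-character key. We do not know which phonetic method Memex uses, so treat the following as a generic sketch of the technique; the function names are our own.

```python
# Letter groups that share a Soundex digit; vowels, h, w and y have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        # h and w do not separate repeated codes; vowels do.
        if ch not in "hw":
            prev = code
    return (encoded + "000")[:4]


def sounds_like(query, names):
    """Return the names whose Soundex code matches that of the query."""
    target = soundex(query)
    return [name for name in names if soundex(name) == target]


if __name__ == "__main__":
    print(sounds_like("Smith", ["Smyth", "Schmidt", "Jones", "Smithe"]))
```

Soundex was designed for English surnames, so systems that handle names from many languages often rely on more elaborate phonetic schemes.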

The clusters have it

Aside from USA.gov, there are some other general search engines that employ clustering or visualization techniques to improve the precision and efficiency of search results. Northern Light (www.northernlight.com), a pioneer in the field of clustered search engine results, also has an enterprise search engine that can be customized to suit most agencies.

Other search engines, such as Clusty (www.clusty.com) and Mooter (www.mooter.com), provide clustered results that can be refined to yield highly accurate answers. A similar metasearch tool, Kartoo (www.kartoo.com), submits user queries to multiple search engines and reports results in a highly visual, mapped form. Users can move from one visual map to the next to refine results.

Activating search smarts
Search engine technology is undergoing another metamorphosis, which should hardly be surprising.

The technology underlying search engines still has its roots in the information retrieval field, which dates back more than 50 years.

In a 1966 Scientific American article, author Ben Ami Lipetz concluded that information retrieval would not evolve as a technology until researchers understood the various ways that humans process information. We are only now beginning to narrow our focus to gain a deeper understanding of that process.

Even with all the content already available to search engines today, by some accounts more than 500 times as much information has yet to surface because current search engine technology is limited in the types of content it can reach. Nevertheless, progress is under way.

Some personal digital assistants and cell phones can now provide real-time tools for location-based information searches. Captured search histories and Web browsing behavior are also helping search providers support more refined searching capabilities.

In the future, search technology and data mining will become more closely melded. Together with advances in user interface design, that will yield the next leap forward in information exposure — like it or not.

Agencies that keep an eye on the advancing field of search technology should be able to adopt new tools early and often enough to gain a competitive advantage.

Biggs is a senior engineer and freelance writer based in Northern California.

