Voice-recognition technology: Waiting to exhale
- By Charlotte Adams
- Dec 14, 1997
A few years ago, voice-recognition systems seemed more annoying than helpful. Now the products are poised for mainstream government use, according to industry observers.
Voice-recognition technology converts human speech into a digital code that a computer can understand. Such code makes possible a variety of applications -- from dictating text into a word processing document to verbally controlling a computer's functions.
Of course, today's systems are far from perfect, observers concede. Users cannot say just anything, even to the most advanced software, but must stay within the application's vocabulary. Even the most polished dictation software works best if "trained" by the user.
But despite these limitations, people working with the technology believe voice recognition is close to coming of age.
BBN Systems & Technologies is working with the Defense Advanced Research Projects Agency (DARPA) to perform automatic voice-to-text transcriptions of 30-minute radio and TV news broadcasts, said John Makhoul, a chief scientist with BBN in Cambridge, Mass.
In the last 10 years, researchers have advanced by "roughly an order of magnitude," Makhoul said. He expects software error rates to decline to 10 to 15 percent in another few years.
And "if funding continues in the next 10 years, we would approach human performance" in "simple" voice-recognition applications, he said.
Finding Its Voice
The industry is still at least several years away from full integration of voice recognition into applications.
Only recently, for example, has continuous speech recognition for dictation become available, said Jackie Senn, vice president and research director for advanced technology at the Gartner Group, Burlington, Mass.
Still, John Oberteuffer, president of Voice Information Associates, a market research firm in Lexington, Mass., predicts sales exceeding $3 billion in this niche by 2001, with keyboards becoming obsolete in 10 years.
In 1990, Dragon Systems Inc. software that recognized discrete words cost $10,000, whereas continuous speech recognition is available now for less than $100, Oberteuffer said.
For example, IBM Corp.'s ViaVoice Gold dictation kit retails for $149, including a noise-canceling headset microphone. The IBM software runs on Microsoft Corp.'s Windows 95 and NT and requires a sound card, a minimum of 125M of hard drive space, 32M of memory and either a 166 MHz Pentium processor or a 150 MHz Pentium with MMX technology.
Not only is the newer voice-recognition software getting less expensive, but it demands a smaller portion of the platform's processing power, vendors said.
Desktop software today can require less than 25 percent of the power of a 120 MHz Pentium system, according to Erik Tarkianinen, the director of product marketing for software developer Lernout & Hauspie (L&H) in Boston. Some software "can run on a $6 [digital-signal processor]," he said.
Typically marketed to application developers, L&H's software includes speech recognition, text-to-speech and dictation. L&H has won a $1.9 million Advanced Technology Program grant from the Commerce Department to develop an application that funnels medical data from a continuous voice-recognition system into a "structured medical reporting system," the company said. It is working with Columbia University in New York on the project.
Although research sponsored by the Defense Department promises more intelligent speech technology, government applications still are relatively rare. However, agencies across government are looking at prototypes.
The most dramatic role so far is a voice-control technology that Smiths Industries developed for the multiple-role Eurofighter, which is slated for production in 2001. Prototypes have been tested in the F-16, F-18 and the AV-8B/Harrier fighter aircraft.
Even this, however, is limited to about 25 non-flight-critical cockpit chores, such as selecting radio frequencies and displays.
Meanwhile, Sentel Corp., Alexandria, Va., has worked with NASA to develop a voice-activated system for processing space shuttle payloads using Dragon's DragonDictate, according to Kevin Jackson, Sentel's chief technology officer.
Sentel has developed a prototype electronic-documentation- and procedures-verification application for quality-assurance people who would be monitoring payload processing for the space shuttle and space station programs. The software relies on a wireless network through which payload workers communicate their activity to the monitor, who can synchronize the requirements of the different payloads.
The system already has been tested on the space shuttle.
But Sentel is taking the application a step further, exploring voice recognition as an internal research project, Jackson said. The quality-control person on the floor would "use voice commands to go through documents, reference materials and procedures" to allow hands-free activity, he said.
Crossing the Border
The Immigration and Naturalization Service, meanwhile, has equipped a border-crossing site in Scoby, Mont., with voice recognition, and it is adding the technology to its busy Otay Mesa, Calif., site, said Bob Mocny, the assistant chief inspector with INS' Office of Inspection in Washington, D.C.
The idea is to speed the flow of cross-border traffic while meeting the agency's regulatory obligations.
At Scoby, employees say a phrase, which has also been pre-recorded, into a special telephone. When the system recognizes the speaker, it lifts the border-crossing barrier so the traveler can pass through.
At the California site, INS is now adding a mobile voice system, Mocny said. Words spoken into handheld devices by drivers can be matched with pre-recorded voice patterns at the time of border crossing. For test purposes, speakers will use the phrase, "Nothing to declare."
The Justice Department's grants program for improving municipal law enforcement, meanwhile, has agreed to have Detroit-based Voice Processing Plus Inc. (VPP) add a voice-recognition function to its Touch-Tone processing system for grant recipients, said Randy Stuck, president of VPP.
The pilot program - called the 269 Project after a form for grant recipients - involves explaining how grant money has been spent. With a vocabulary of fewer than 200 words, the software has a "fairly constrained" set of commands, but within these limits, it should achieve an error rate of less than 10 percent, Stuck said.
VPP used software from Parity Software Development Corp., L&H and Unisys Corp. The application uses "natural language" to the extent that it recognizes multiple inputs and makes decisions based on what it "understands," Stuck said. If it accepts a user's vocal inputs, that person will be allowed to receive funds in the following calendar quarter. If not, the software will "notify someone else and there's a follow-up," Stuck said.
A simpler voice-recognition system is being added to improve telephone service at DOD's Maryland Procurement Office, which runs a call center that handles equipment problems for military, federal and state government customers. The office plans to use Lucent Technologies Inc.'s Conversant technology to help route callers through the center when they phone from remote parts of the world that lack Touch-Tone service, said Hoppy Harrell, the office's technical director for communications in the Baltimore area.
The technology will allow people to move through the call center's Touch-Tone-driven menus by recognizing either rotary dialing or spoken commands. Conversant Version 6 has a customized vocabulary of about 2,000 words.
The Federal Aviation Administration has been looking at two-way voice-maintenance applications for the Airways Facilities side of the house, which maintains radars and other equipment for air traffic controllers, said Lou Delemarre, a lead engineer at the FAA's Technical Center in Atlantic City, N.J.
A technician working in a tight area, such as an electronics cabinet, could use voice recognition to make calibration and alignment activities less tedious and more efficient. Right now, a technician must go into the cabinet with a meter, take a reading, put the meter down, look up the next step and take another reading. But with two-way voice technology, the technician could say, "Next step," and the system would read it to him.
In the air traffic control regime, there may also be room for voice recognition as a means of independent verification and validation, said Nancy Van Suetendael, an FAA project engineer. It could listen to the pilots and the controllers and alert the controllers to any miscommunications.
Voice of the Future
While civilian agencies experiment with prototype applications, DOD is pushing ahead with more advanced projects. Much Pentagon-sponsored research aims at simplifying man/machine interfaces.
SRI International, for example, provided voice input control technology for some of the simulated forces in the recent Synthetic Theater of War-97 demonstration, said Robert Moore, a principal scientist at SRI's AI Center in Menlo Park, Calif.
The ultimate goal of the research is to let wargame participants "say commands or information in a natural way with nothing to memorize," Moore said. This could eliminate the middle men who translate commands to a simulation, allowing users to "interact directly [with it] through speech."
BBN's work with DARPA on the voice-to-text transcriptions of news broadcasts has produced an average error rate of 30 percent, but that includes not only news read by anchors but also interviews in the field by phone and with people who have foreign accents, Makhoul said.
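The article does not say how these error rates are measured. One common yardstick for transcription is word error rate: the number of word substitutions, deletions and insertions needed to turn the system's output into a human reference transcript, divided by the length of the reference. The sketch below assumes that definition; the function name and the sample sentences are hypothetical and are not drawn from BBN's or DARPA's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical 10-word reference and a transcript with three wrong words.
reference = "the president spoke to reporters in washington about the budget"
hypothesis = "the resident spoke two reporters in washington about a budget"
print(f"{word_error_rate(reference, hypothesis):.0%}")
```

Under this measure, a 30 percent rate means roughly three of every 10 reference words are wrong, missing or accompanied by an extra word.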
Meanwhile, rough as it may be, the technology is already good enough for intelligence applications such as "subject identification" - scanning telephone conversations or radio broadcasts for topics of interest, Makhoul said.
BBN also has developed Speech on the Internet technology, which could give a soldier in the field a hands-free means of communicating with remote logistics or other databases.
Voice requests could be fed into a personal digital assistant, for example, which could compress the speech for transmission via radio or other medium at 4 kilobits/sec across the Internet. Recognition processing would occur remotely, and the user "would get an answer back right away," according to the company.
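To give a sense of scale for the 4 kilobits/sec figure, the arithmetic below compares it with uncompressed telephone-quality audio. The comparison rate (8,000 samples per second at 8 bits each, or 64 kilobits/sec) and the five-second request length are assumptions chosen for illustration; the article does not describe BBN's input format or typical request length.

```python
# Rate quoted in the article for compressed speech.
CODED_BITS_PER_SEC = 4_000

# Assumed baseline for comparison: telephone-quality audio sampled
# 8,000 times per second at 8 bits per sample (64 kilobits/sec).
BASELINE_BITS_PER_SEC = 8_000 * 8


def payload_bytes(seconds: float, bits_per_sec: int) -> float:
    """Bytes needed to carry `seconds` of speech at the given bit rate."""
    return seconds * bits_per_sec / 8


request_seconds = 5  # hypothetical length of one spoken request
coded = payload_bytes(request_seconds, CODED_BITS_PER_SEC)
baseline = payload_bytes(request_seconds, BASELINE_BITS_PER_SEC)

print(f"compressed request: {coded:.0f} bytes")
print(f"uncompressed baseline: {baseline:.0f} bytes")
print(f"reduction factor: {baseline / coded:.0f}x")
```

Under these assumptions, a five-second request shrinks from 40,000 bytes to 2,500 bytes, small enough to send over a low-bandwidth radio link.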
Along these lines, Lucent also has developed a prototype voice-activated World Wide Web browser. This could be used to have Web servers tell users what has been added to Web pages since they were last viewed, said Fred Juang, head of Lucent's Acoustics and Speech Research Department, Murray Hill, N.J.
Lucent is also working on utterance-verification technology, which "gives the user more freedom in speaking and leads to higher accuracy," Juang said. "The inability to deal with sounds," such as ah, er, ooh, uh, or extraneous noise "causes more errors than other factors."
The "traditional idea of converting sounds into words" is of very limited use in communications, he said.
In an initiative called the Interactive Knowledge Environment, Science Applications International Corp. is developing techniques such as verbal data mining and Web searches, said Rod Sheffield, the advanced research programs manager with SAIC in San Diego. Some of these tools could be folded into DARPA's Genoa crisis-management research program.
With Genoa, DARPA aims to build a set of tools that will help White House and agency officials quickly find the information they need to respond to a national or international crisis.
For Genoa, SAIC is providing the "crisis browse" function, which is a type of multilevel multimedia information-gathering process. The agency plans to add voice technology as early as next year, Sheffield said. "They're trying to build interactive briefings that can be funneled up through the chain of command," he said.
Indicative of the increasing maturity of speech recognition, SAIC, under a separate DARPA technology integration project, has developed "application wrappers" for a number of voice-recognition technologies. These wrappers will make it possible to add speech recognition to existing applications in a plug-and-play manner, Sheffield said.
Ultimately, DOD would "want to embed speech recognition into every application," he said.
Adams is a free-lance writer based in Alexandria, Va.
AT A GLANCE
Status: Voice-recognition technology still has limitations, but improved performance and lower cost have made it more attractive in such areas as voice control of computers.
Issues: Most applications allow only a limited vocabulary, while more advanced software still has problems with accuracy.
Outlook: Good. Industry observers say the technology is advancing quickly, aided by both private- and public-sector research.