A new life for talk-to-text?
- By John Moore
- Jun 20, 2008
Developers of speech recognition technology seem to have taken to heart the adage that the biggest room in the world is the room for improvement.
Developers have tried for decades to improve speech-recognition technology. The military funded its early development in the 1970s and continues to drive innovation today. Current research activities focus on enhancing the technology’s accuracy with foreign languages and improving its ability to work in loud environments.
Meanwhile, continuing improvements in commercial speech recognition could help overcome the technology’s reputation as an over-hyped underachiever. PC-based speech-recognition products already enjoy a presence among people for whom keyboard input is difficult or impossible and in specialized areas such as medical transcription. Higher accuracy, coupled with a gentler learning curve, appears to be winning over customers in other areas.
Perhaps the greatest advances could be made using speech recognition software on smart phones and personal digital assistants. People who now struggle with small keypads might be open to using voice input for text-messaging and Web browsing.
“The real need is in mobile phones,” said Bill Meisel, president of TMA Associates, a consulting firm and newsletter publisher that focuses on speech recognition. “That is where people are going to be the most motivated…to use speech.”
Experts say that if speech recognition becomes second nature to millions of cell phone and PDA users, the hands-free habit could spill over into general desktop and laptop PC use.
The Defense Advanced Research Projects Agency, a longtime backer of speech recognition research, is focused on processing foreign-language speech and text. In 2005 the agency launched the Global Autonomous Language Exploitation (GALE) project with a goal of distilling foreign-language radio and TV newscasts into what DARPA describes as actionable information for military commanders and personnel.
DARPA tapped BBN Technologies, IBM and SRI International to develop systems capable of transcribing broadcasts into text and translating that text into English. They began with Arabic and Chinese. The companies deliver the technology in stages as they strive to hit accuracy targets.
“Targets for the ultimate goal are 95 percent translation accuracy for 90 percent of show segments,” said Joseph Olive, DARPA’s GALE program manager.
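One way to read that two-part target is as a check over per-segment scores: a system passes if at least 90 percent of segments each reach 95 percent accuracy. The sketch below illustrates that reading only. The function name, the sample scores and the assumption that accuracy is a single number per segment are hypothetical and are not drawn from DARPA's evaluation procedure.

```python
def meets_target(segment_accuracies, accuracy_goal=0.95, coverage_goal=0.90):
    """Return True if enough segments reach the accuracy goal.

    Assumption: each segment has one accuracy score between 0 and 1.
    """
    if not segment_accuracies:
        return False
    # Count the segments that individually reach the accuracy goal.
    passing = sum(1 for a in segment_accuracies if a >= accuracy_goal)
    # Compare the passing share against the required coverage.
    return passing / len(segment_accuracies) >= coverage_goal


# Hypothetical scores for ten show segments: nine reach 95 percent, one does not.
scores = [0.97, 0.96, 0.95, 0.99, 0.98, 0.96, 0.97, 0.95, 0.96, 0.80]
print(meets_target(scores))
```

Under this reading, one badly translated segment in 10 still meets the target, but two would not.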
Another military application of speech recognition involves the F-35 Joint Strike Fighter, which has a speech recognition system that enables a pilot to control various subsystems through voice commands. That system is based on SRI’s DynaSpeak speech-recognition software.
Such military applications must deal with high noise levels, which can limit the usefulness of speech recognition. “The main challenge for speech recognition in a military environment is ambient noise,” said Kevin Bobsein, a computer engineer at the Army’s Communications-Electronics Research, Development and Engineering Center (CERDEC). “Vehicles, gunshots, loudspeakers — any noise that you might encounter on a battlefield presents a problem for speech recognition.”
CERDEC’s Command and Control Directorate operates a Machine Translation Audio Testbed at Fort Monmouth, N.J., to evaluate the impact of noise on speech-recognition and language-translation systems.
SRI also is pursuing ways to distinguish speech from background noise. Martin Graciarena, a research engineer at SRI’s Speech Technology and Research Laboratory, said the problem has two dimensions: speech detection and speech robustness.
The former involves detecting when someone starts to speak in a noisy area. Detection is especially important when users can’t readily push a button on a microphone to trigger speech recognition. The latter aspect of the problem — speech robustness — involves recognizing a foreground speaker amid background noise.
SRI uses various techniques to deal with noise. For example, it creates statistical acoustic models that represent various types of noise and speech, which help in distinguishing foreground speech from background noise, Graciarena said.
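The general idea of separating speech from noise with statistical models can be shown with a deliberately simple sketch. It is not SRI's method, which the article does not detail. It assumes a single feature, the log energy of a short audio frame, fits one Gaussian to noise frames and one to speech frames, and labels a new frame by whichever model gives it the higher likelihood. All names and the synthetic data are hypothetical.

```python
import math
import random


def log_energy(frame):
    """Log of the mean squared amplitude of one frame of samples."""
    power = sum(s * s for s in frame) / len(frame)
    return math.log(power + 1e-12)  # small offset avoids log(0)


def fit_gaussian(values):
    """Return (mean, variance) for a list of feature values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(var, 1e-6)  # floor the variance to keep it positive


def log_likelihood(x, model):
    """Log density of x under a one-dimensional Gaussian model."""
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def is_speech(frame, speech_model, noise_model):
    """Label a frame as speech if the speech model explains it better."""
    x = log_energy(frame)
    return log_likelihood(x, speech_model) > log_likelihood(x, noise_model)


# Synthetic stand-ins for labeled audio: quiet frames as noise, loud frames as speech.
rng = random.Random(0)


def make_frame(scale, n=160):
    return [rng.gauss(0.0, scale) for _ in range(n)]


noise_model = fit_gaussian([log_energy(make_frame(0.05)) for _ in range(50)])
speech_model = fit_gaussian([log_energy(make_frame(0.5)) for _ in range(50)])

print(is_speech(make_frame(0.5), speech_model, noise_model))
print(is_speech(make_frame(0.05), speech_model, noise_model))
```

Energy alone would fail against loud battlefield noise. Practical systems model richer spectral features and many noise types, but the comparison of competing model likelihoods follows the same pattern.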
Kristin Precoda, director of SRI’s Speech Technology and Research Laboratory, said the company also takes into account distinctive noises in the customer’s environment. “Any particular task has certain characteristic kinds of background noise,” she said. For example, speech recognition in a vehicle will be affected by wind and traffic sounds.
Developers face other challenges in addition to coping with noisy environments, said Premkumar Natarajan, vice president and lead scientist of speech and language technologies at BBN. Natarajan said developers struggle with variability in dialects and discursiveness, or the tendency of speakers to change topics rapidly.
Various improvements in speech-recognition products have begun to broaden the technology’s appeal beyond military and other traditional uses. The Florida Department of Children and Families recently purchased 1,600 licenses for Nuance Communications’ Dragon NaturallySpeaking speech-recognition software. Investigators in the department will use the product to create field case reports, said Chris Pantaleone, the department’s chief information officer.
The Florida agency piloted Dragon NaturallySpeaking with a group of workers and found that the software responded well to various accents, Pantaleone said. People who worked with the software were satisfied with its dependability, and training individual investigators to use the system took no more than 45 minutes, he said.
In contrast, early speech-recognition systems involved a lengthy enrollment process as users trained the system to recognize their voices. But with today’s speaker-independent technology, systems “can figure out [what] the voice is like on the fly and give good accuracy out of the box,” said Peter Mahoney, vice president and general manager of Nuance’s Dragon business.
As barriers to productive uses of speech recognition recede, more organizations are adopting the technology for general office productivity, Mahoney said. Office productivity is the company’s fastest-growing market sector, he added.
However, old habits die hard. The Florida Department of Children and Families conducted a field survey to determine how many staff members would use a speech-recognition system. Pantaleone said 75 percent of the investigators were open to using the technology, but 25 percent were more comfortable typing reports.
People accustomed to entering text and composing messages or documents via keyboard present a hurdle for speech recognition, Meisel said. “We have been trained with word processing to edit as we go, and that is not as convenient with speech recognition.”
Meanwhile, the new focus of speech recognition on handheld devices could help people become accustomed to the new experience, Meisel said. Speech recognition is a natural fit for mobile devices where text input via keypad can prove frustrating, he said, noting that people often must double- or triple-tap a key to type a desired letter.
Ashwin Rao, chief executive officer at TravellingWave, said mobile devices have “the largest pain factor” when it comes to entering information. TravellingWave targets text messaging, e-mail and Internet browsing as the primary applications for its speech-recognition technology.
Speech recognition is increasingly deployed in contact centers, some of which use natural-language call routing to handle open-ended customer requests, Meisel said. That application, coupled with use on handheld devices, is contributing to a broader acceptance of speech recognition, he said.