Speech recognition: The talk of the town
- By Brian Robinson
- Sep 05, 1999
Speech recognition has had a tough decade. With such popular images as Star Trek's chatty computers to compete against, the realities of speech recognition - long system training time, the need for discrete word input and high error rates - have limited the market to early adopters. Although speech recognition may be the best means for some disabled people to interact with the world, mainstream users have been hard to find.
But that may be about to change. Major improvements in the technology itself, together with incredible leaps in computer processing power, have pushed speech recognition to the brink of the mainstream.
After the Year 2000 threat passes, the speech-recognition market could explode, vendors and analysts say. IBM Corp., which has conducted one of the longest research efforts in the area, believes as much as a third of the United States work force could be using some form of speech-recognition technology within the next three years.
"Most people realize that speech recognition will be an important interface technology at some point," said William Meisel, president of TMA Associates Inc., Tarzana, Calif., a marketing and analyst firm specializing in speech technologies.
For most people, the most common use of speech recognition is speech-to-text computer dictation, Meisel said. The biggest market impediment for this application is not the technology itself - which is improving - but that, given all the problems encountered in the past, "people are just not comfortable with [the technology] yet," he said.
"Millions of copies of these dictation systems have been sold," said Stuart Patterson, president and chief executive officer of SpeechWorks International Inc., Boston. "But it's not been a home run, by any means. Many people have tried [the technology], but many also found it laborious to use. So a lot of these copies were simply put on a shelf."
Development efforts finally started to show results several years ago in two areas. First, the technology has grown to the point where users no longer have to speak individual, discrete words to be understood and instead can talk in a natural and continuous flow. Also, the products can understand anyone, where before they needed to be trained for use by a particular speaker.
Overall, the accuracy of speech-recognition systems has improved dramatically, moving beyond what many industry observers see as a critical 93 percent threshold to a current level of 95 percent to 97 percent. Along with that, processing power has been pushed to Intel Corp. Pentium II and Pentium III levels, with little if any increase in the overall price of computer systems.
"When we did our first demonstration of speech recognition in 1994, it took two large rooms of computer equipment just to be able to understand one sentence," said David Nahamoo, senior manager for the Human Language Technology Department at IBM Research. "Now you can get much better recognition and much better accuracy using a single ThinkPad portable computer."
All of these advances together "have absolutely changed the picture" for speech recognition, he said.
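To put the accuracy figures above in concrete terms, the following minimal sketch estimates how many words a user would have to correct on one dictated page. It assumes the percentages are per-word accuracy and that a page holds 300 words; both are illustrative assumptions, not figures from the vendors or analysts quoted here.

```python
# Illustrative arithmetic only. WORDS_PER_PAGE is an assumed page length,
# and accuracy is treated as per-word accuracy (also an assumption).
WORDS_PER_PAGE = 300


def expected_errors(accuracy_percent, words=WORDS_PER_PAGE):
    """Return the expected number of misrecognized words."""
    return round(words * (100 - accuracy_percent) / 100)


for accuracy in (93, 95, 97):
    errors = expected_errors(accuracy)
    print(f"{accuracy}% accuracy: about {errors} errors per {WORDS_PER_PAGE} words")
```

Under these assumptions, moving from 93 percent to 97 percent cuts the corrections on a page by more than half, which may help explain why observers treat a few percentage points as significant.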
In government, the military has led the way. The Defense Advanced Research Projects Agency has promoted research in speech recognition for more than 15 years. The DARPA Communicator program, a successor to the agency's Human Language Technology Program, aims to take speech-recognition technology from speech-only interfaces to ones that combine speech with graphics, maps, and pointing and other gestures.
Current military applications of speech recognition span a wide range, from dictation to using voice commands to control unmanned aerial vehicles. Army pilots, for example, use wearable computers built with a speech-to-text engine from Lernout & Hauspie Speech Products USA Inc., Burlington, Mass., to control UAVs by speaking into the device while monitoring UAVs' motions on a flat-screen display.
On the civilian side, speech recognition has been limited to such things as PC text and data input by disabled workers using products such as Newton, Mass.-based Dragon Systems Inc.'s DragonDictate and NaturallySpeaking or IBM's ViaVoice. However, even at this limited level, there is enough of a demand to convince agency officials that the technology could have a bright future in government.
In the Education Department, for example, only about 40 people use speech recognition, according to Alex Coudry, a computer specialist working with the department's small assistive technology team. But the team adds about two or three new customers a month, he said.
Newer versions of products optimized for the Pentium III chip have cut initial setup times from 30 minutes or more down to about 10 minutes, and the speech recognition itself is greatly improved, Coudry said.
Products have not reached the point where a user can just open the box and start using them. But once persistent users get used to a product's commands, "they love it," Coudry said.
The Internal Revenue Service said its base of speech-recognition users also is in the dozens, but it too is seeing an increased demand. As the technology improves, the IRS is trying to expand its range of applications.
"Historically, it's been used with [discrete speech] products, and they did work well for some people, particularly those who couldn't use a keyboard," said TJ Cannady, program manager for the IRS' Information Resources Accessibility Program Office. "Now, with continuous speech recognition, we are looking to make it available to those who can still use the keyboard to enter some commands but who also need something such as speech recognition to give them the ability to do extensive data entry."
Telephony is considered the next big application area for speech technology. It's been used to some extent already, beginning a decade ago as a way to provide rotary phone users with the same access to services as touch-tone users, and for menu-based voice services.
Recent advances in the technology will make it possible in the near future for someone to interact with a phone-based system using natural dialogue. Callers will, for example, be able to conduct hands-free direct dialing. Instead of having to listen to a menu of names before getting to the final contact, users simply will say the name of the person they want to contact, along with location information, and the system will connect them automatically.
"It's the killer application for telephony," said IBM's Nahamoo. "It will take us back to the old time of telephony, when you had to go to a central office and ask for a particular person instead of dialing a number. It opens up the door for speech recognition in telephony."
Bell Atlantic has built such an application for the internal use of its 21,000 employees, "and it's used constantly," said Alex McAllister, manager of technology development for Bell Atlantic Federal Integrated Systems. "It's a very efficient way to connect with individuals. It's an idea that's been on the table for some time, but the development of speaker-independent speech technology has really moved it forward."
Some companies also believe that speech recognition could have a major impact in moving the Internet closer to the large number of people who are not on the Internet or do not have browsers on their PCs that give them World Wide Web access. The technology would be used to build a speech-enabled front end that would provide a way for users to navigate Web sites similar to the way PC users can with a mouse or keyboard.
"In the U.S., there are still some 10 times as many phones as there are Web browsers," said Joe Yaworski, vice president and general manager for Unisys Corp.'s Natural Language Business Initiative. "That means many more people have access to a phone than they do to a Web browser. With all the investment there has been in the Web, there is a great deal of interest in being able to open up Web-type transactions to this [non-Web] public."
That's not to say that speech recognition, even in these more evolved forms, comes without caveats. The Census Bureau, for example, is investigating how to use speech recognition to provide Census maps that are more accurate and able to be produced more quickly than is possible now. Before a census takes place, bureau employees have to walk through every block of major metropolitan areas to ensure that addresses are correct and that the maps census enumerators use can be verified as accurate.
The bureau is experimenting with a system that marries a laptop-based Global Positioning System, which ensures the exact coordinates of a location, with speech recognition to enable an employee to input information into the system verbally and have the system speak back to the employee to verify the information before it is committed into the database permanently.
The problem with this is not in the speech-recognition technology itself, according to Bill Laplant, a computer scientist in the bureau's statistical research division, but with the way the human mind works.
"With a keyboard, people can simultaneously read and edit because they are using two distinct modes of thought," Laplant said. "With speech, there is the possibility of interference with the process of checking to see that the data has been put into the system correctly."
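The read-back step the bureau is experimenting with can be sketched roughly as follows. This is a hypothetical outline, not the bureau's system: the function name, the `confirm` callback, the list standing in for the database, and the sample addresses and coordinates are all invented for illustration, and the speech recognizer and GPS receiver are assumed to have already produced their outputs.

```python
def record_with_readback(heard_text, coordinates, confirm, database):
    """Commit an entry only after the user confirms the read-back.

    heard_text  -- text returned by a speech recognizer (hypothetical)
    coordinates -- (latitude, longitude) from a GPS receiver (hypothetical)
    confirm     -- callable that presents the prompt and returns True or False
    database    -- list standing in for the permanent store
    """
    prompt = f"I heard '{heard_text}' at {coordinates}. Is that correct?"
    if confirm(prompt):
        database.append({"address": heard_text, "coordinates": coordinates})
        return True
    return False  # nothing is stored; the user would repeat the entry


entries = []
# Simulated session: the first recognition is wrong, so the user rejects it.
record_with_readback("12 Oak Street", (38.85, -76.93), lambda prompt: False, entries)
# The second attempt is confirmed and committed.
record_with_readback("112 Oak Street", (38.85, -76.93), lambda prompt: True, entries)
print(entries)
```

The sketch makes Laplant's concern visible: every entry depends on the employee listening to the read-back and judging it correctly, so any interference with that checking step lets errors through.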
Robinson is a free-lance journalist based in Portland, Ore.
At a Glance
Status: After years on the periphery, voice-recognition technology appears poised to take on mainstream applications. Software advances and increases in underlying processing power have boosted overall accuracy and made the products easier to use.
Issues: The technology's poor reputation over the years remains its largest obstacle, particularly for applications using voice-recognition technology to enter data into a personal computer. Additionally, some users may find it awkward to enter data by speaking, rather than using a keyboard.
Outlook: Very good. In addition to the growing demand for assistive technology, advances in voice-recognition software have paved the way for a wealth of new applications in such areas as telephony and cartography.