Voice recognition: Sound technology
At long last, voice technology lives up to its billing
- By Patrick Marshall
- Jan 06, 2002
We've all seen one of the downsides of recent advances in voice recognition: Call your bank, your broker or many other companies and you're almost certainly in for five to 10 minutes of tedious dialogue with a disembodied voice before you're directed to the appropriate human employee. The new voice-recognition systems, although irritating to customers, save the companies that use them big bucks because fewer high-cost human receptionists are needed.
But the benefits of voice recognition are also growing clear. First, users with disabilities that make it difficult to type can use spoken commands to control computers and other devices. Voice-recognition software has been up to this sort of task for the past several years.
Even more impressive have been the recent improvements in dictation capabilities. Some voice-recognition software, once trained by a user, now achieves accuracy rates of 98 percent or better. As many a manager can attest, that puts the software in the same league with many typists in the steno pool.
Voice-recognition tools are having an impact in three major areas: call centers, voice-enabled devices and desktop dictation-and-command software. The general rule of thumb, said Nigel Beck, director of marketing for IBM Corp. Voice Systems, is that the more complex the speech, the more hardware required.
Call-center voice-recognition tools generally perform well these days because the speech they deal with is limited and because they can employ large centralized computers for crunching data.
"The call center 'cheats' its way through the vocabulary by knowing at each stage things you are likely to say," Beck said. "If, for example, the system asks you if you want investments or loans, it knows you're going to say only one of two possible things. It can therefore adapt to the fact that it doesn't know who is speaking or that you may be on a scratchy phone talking from the airport."
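The "cheat" Beck describes can be illustrated with a toy sketch. This is not how any call-center product is implemented: real systems score acoustic hypotheses, not text strings. The function name pick_expected, the 0.4 cutoff and the sample inputs are all assumptions made for illustration. The point is only that a garbled input becomes easy to resolve when just two answers are possible.

```python
import difflib

def pick_expected(heard, expected, cutoff=0.4):
    """Return the expected phrase closest to the noisy input, or None.

    Hypothetical helper: string similarity stands in for the acoustic
    scoring a real recognizer would perform.
    """
    matches = difflib.get_close_matches(heard.lower(), expected, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# The prompt "investments or loans?" allows only two answers.
prompt_options = ["investments", "loans"]

print(pick_expected("investmens", prompt_options))  # garbled, still resolves to "investments"
print(pick_expected("lones", prompt_options))       # resolves to "loans"
print(pick_expected("pizza", prompt_options))       # nothing close enough, so None
```

With an open vocabulary, the same garbled inputs would have thousands of plausible candidates, which is why dictation is the harder problem.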
Voice-enabled devices, such as personal digital assistants, also are designed to work with a small vocabulary. "If you say, 'What are my appointments?' such devices will probably understand," Beck said. "But if you say, 'Hmmm. I wonder what my appointments are today,' they probably won't get it."
Thanks to their small vocabulary and the predictability of responses, voice-enabled devices can be quite effective even without user voice training — in which the user reads provided text aloud so that the program can get clues about how the user speaks — and without a large amount of processing power.
The toughest task voice-recognition systems are now attempting to handle is the one we are testing in this comparison: desktop dictation. With dictation, the software requires a large vocabulary because predictability isn't really a factor, and the system has relatively limited processing power at its disposal. Consequently, the key to good performance is voice training.
In the past, even voice-trained systems did not perform well. However, with the release of new versions of the two market leaders — IBM's ViaVoice and Lernout & Hauspie Holdings USA Inc.'s Dragon NaturallySpeaking — the technology has finally reached a stage where it is useful for a broad range of professionals. (ScanSoft Inc. recently acquired the Speech and Language Technologies business of Lernout & Hauspie.)
Alas, the last and greatest task for voice-recognition systems — on-the-fly recognition of user speech without training — is still beyond the reach of current technology. No matter how useful that capability might be for journalists and others who want to transcribe interviews or conferences, it is at least several years away.
"We have research systems right now that can do transcriptions of, say, a couple of speakers doing a newscast," Beck said. "But it does get set up with some clues. We'll tell it, for example, that there are only three speakers, and we'll give it some clues about how they speak." Beck believes the "Holy Grail" of transcribing natural speech without training will be attainable in less than 10 years.
For now, the two products we examined represent state-of-the-art professional dictation-and-command capabilities on desktop computers. We found both products to be highly capable, delivering a high level of accuracy once fully trained. You'll still need to edit most documents to catch occasional errors and to tune formatting, but either product will save you a lot of time in data entry, especially if you're a slow typist.
We also found that both programs are effective for navigating applications and issuing commands, such as Open File. Both programs even allow you to create macros for inserting blocks of text or performing complex operations.
Both programs, in fact, offer virtually identical feature sets, including optional specialized vocabularies for medical and legal offices. And both programs are available in various versions, offering different subsets of features at different prices.
So how can an information systems manager choose between the two solutions? What most distinguishes the two programs are their user interfaces. We strongly recommend that before making a purchase, you take the system for a test drive.
NaturallySpeaking: A Pricey Performer
This product's name is a good example of the benefits of an effective dictation program: Would you rather type it or speak it?
NaturallySpeaking can definitely handle its own name, and with a bit of training, we found it capable of performing extended dictation tasks with only occasional errors. NaturallySpeaking does require more training time than ViaVoice to achieve optimal recognition. Fortunately, the program comes with a large supply of interesting training files that the user reads aloud to perform the voice training.
Once the software was installed and trained, we found it easy to get right to work, thanks to NaturallySpeaking's intuitive interface. By default, the program installs a toolbar across the top of the screen, but you can easily relocate the bar or get rid of it, relying instead on a pop-up menu in the Windows System Tray. One click is all it takes to switch on the microphone and begin dictating in whatever application is currently active. You can also call up DragonPad, a stand-alone application for dictation.
Lernout & Hauspie touts the program's Nothing But Speech technology, which is supposed to filter out such sounds as "ums" and "ahs" during dictation, but we didn't find this feature to be particularly effective, and it certainly doesn't substitute for careful speaking.
Far more effective are Version 6's new modes for spelling, number, commands-only and dictation-only. These modes increase accuracy by allowing the program to focus only on the expected types of input. If you're working on a spreadsheet, for example, you can go into numbers mode, and because the program knows it will be hearing only numbers and related symbols, it can achieve a high level of accuracy.
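Why a restricted mode helps can be shown with a small sketch. It is an illustration only and says nothing about how NaturallySpeaking works internally; the word table, the transcribe function and the mode names are hypothetical. In open dictation, "for", "to" and "ate" are legitimate words, but when only numbers are expected, the same sounds can safely be read as digits.

```python
# Hypothetical lookup: spoken words (including sound-alikes) mapped to number symbols.
NUMBER_WORDS = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3",
    "four": "4", "for": "4",
    "five": "5", "six": "6", "seven": "7",
    "eight": "8", "ate": "8",
    "nine": "9",
    "point": ".",
}

def transcribe(words, mode="dictation"):
    """Toy transcription: in numbers mode, keep only words that map to number symbols."""
    if mode == "numbers":
        return "".join(NUMBER_WORDS[w] for w in words if w in NUMBER_WORDS)
    # In dictation mode the words are passed through unchanged.
    return " ".join(words)

heard = ["for", "to", "point", "ate"]
print(transcribe(heard))                  # for to point ate
print(transcribe(heard, mode="numbers"))  # 42.8
```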
Correcting recognition errors is easy but a bit awkward. The user highlights the word to correct, hits the minus key on the number pad to call up the correction pop-up window and then uses the mouse to select the best option or to move the cursor to type a correction. It's possible to make corrections by voice only, but this requires more steps and will likely be of interest primarily to users with disabilities. Correcting errors in ViaVoice is much simpler.
Although no more accurate than ViaVoice, NaturallySpeaking does have a more intuitive interface, with better-organized menus and better-designed utilities. We especially liked the new Accuracy Center, which provides a single place for all the program's tools for improving recognition accuracy. By calling up the Accuracy Center, you can quickly access voice-training files, add words to vocabulary files, analyze documents and more.
We also like the new tool that lets you add names from your Lotus Development Corp. Notes or Microsoft Corp. Outlook address books to vocabularies, and the tool that can scan e-mails for the same purpose. On the downside, we were disappointed to see that the program does not keep track of which training files you've already processed, so you may find yourself needlessly duplicating training.
In addition to listening to your voice, NaturallySpeaking has a voice of its own that can be used to read selected text aloud. NaturallySpeaking's voice is easier on the ear than the robot-like voice in ViaVoice. But NaturallySpeaking's female voice doesn't sound quite human — it sounds more like an android doing a pretty good imitation.
NaturallySpeaking offers a slightly more flexible macro utility than that found in ViaVoice. A macro can be used for a command as simple as calling up the File Open dialog box or as complex as copying numbers from a document, opening another application, such as a spreadsheet, and putting the numbers in appropriate form fields.
The new macro recorder makes it a snap to create multistep navigation macros, making it possible to execute complex operations with a simple command. We liked that NaturallySpeaking allows you to include graphics, such as a bitmap signature, in a text block macro.
Finally, the program also includes a scripting tool compatible with Visual Basic for Applications (VBA). In-house developers can use it to customize the program or to integrate it with other applications.
ViaVoice: Ready to Deliver
In the past, IBM ViaVoice has been marketed primarily as a lower-cost alternative to NaturallySpeaking with fewer features. ViaVoice 9.0, however, closes that gap and offers comparable accuracy and a better price.
We were impressed with ViaVoice even before installing the software. When we opened the box, instead of a disposable headset, we found a brand-name product, Plantronics Inc.'s DSP-300 stereo headset. Even better, the device connects through a USB port, which most computers have, so you don't have to plug it into the back of your sound card. That saves a lot of trouble if you need to plug and unplug the device frequently.
We were also pleased with the voice-training utility. Not only were the six training files relatively entertaining (they include excerpts from Mark Twain's writings and from "Treasure Island"), but the software also indicates how long it will take to perform each reading and tracks which files you have processed. Like NaturallySpeaking, ViaVoice can also scan your documents to add to its vocabulary, but ViaVoice can't perform NaturallySpeaking's trick of scanning e-mail messages and address books.
We found ViaVoice's interface to be a tad more difficult to learn than NaturallySpeaking's, but once learned, ViaVoice is easy to use.
We especially liked ViaVoice's utility for correcting mistakes. A right-click of the mouse on the background of your Microsoft Windows application or of SpeakPad — ViaVoice's bundled dictation application — allows you to open the Correction Window, which remains on screen until you remove it. Select a word in your dictation text, and it will automatically appear in the Correction Window. What's more, you can even have the Correction Window automatically pronounce the word as it is loaded. A list of alternative spellings is provided, and a click of the mouse selects one. If you're working within SpeakPad, all corrections will automatically be integrated into your user file to improve future recognition accuracy.
Like NaturallySpeaking, ViaVoice allows you to make corrections solely using voice commands, but we found it easier to use the mouse and keyboard.
Although both programs allow you to adjust recognition sensitivity, ViaVoice offers more granular controls. You can use a slider to go from "best guess" to "normal" to "exact match," and you can choose the speed of recognition. Most users will get the best results using the default settings, but in some situations — when using a specialized and limited vocabulary, for example, or working in a location with high background noise — you may want to manually change the settings.
ViaVoice also provides strong support for macros, including both multistep navigation macros and text block macros. NaturallySpeaking offers built-in VBA-compatible scripting for application development, but if you want to customize applications using ViaVoice features, you'll need to purchase the optional software developer's kit.
Other highlights of the program include its tight integration with Windows and especially with Microsoft Office, including Office XP. The program also is well integrated with Internet Explorer, Netscape Communications Corp.'s Messenger and Lotus SmartSuite Millennium 9.5.
ViaVoice's low price tag and improved accuracy make it a good choice, especially for those who don't need the stronger built-in scripting tools found in NaturallySpeaking.