Can you hear me now?
Improvements prep voice-recognition technology for a wider range of uses
- By Paul Ferrill
- Jul 11, 2005
The computer on the original "Star Trek" TV series set the bar awfully high for voice recognition. It not only understood human speech, even slang, but also replied clearly and with personality. Although earthbound voice-recognition technology is rapidly improving and is now useful for many office tasks, it has not yet attained the standard the starship Enterprise set.
Indeed, the technology needs a combination of vastly improved artificial intelligence technology and a more sophisticated speech-recognition engine before matching the performance of the USS Enterprise's system.
The good news is that the latest versions of Dragon NaturallySpeaking and IBM's ViaVoice do a pretty good job of figuring out what you're saying. As those products improve, they have a broader range of uses.
Transcription for the medical and legal fields continues to be one of the most frequent applications of voice-recognition technology. Enabling accessibility for users with disabilities runs a close second. Although most organizations that use a computer-assisted transcription process won't totally replace manual labor, they often use employees in quality assurance and editing roles rather than as professional transcribers.
For this review we looked at Dragon NaturallySpeaking Professional Version 8 and IBM ViaVoice Pro USB Edition Version 10. Both products come with a high-quality noise-canceling headset from Andrea Electronics, although they use different models. Dragon NaturallySpeaking comes with Model 91, while ViaVoice includes Model 61.
Both products offer similar speech-to-text capabilities, although their target markets are clearly different. Dragon NaturallySpeaking comes with a number of features specifically for enterprises, including the ability to store voice profiles on a central server and to transcribe audio files from digital recorders or any handheld device that supports the Microsoft PocketPC operating system. ViaVoice focuses more on individual users, providing most of the same functions as Dragon NaturallySpeaking without the enterprise extras.
During the setup process, both packages require you to configure the software to match the hardware, such as headsets or microphones, and specific users. During the first step, you speak into the microphone to set the audio level.
Then you must train the algorithms to match your speech. To complete this learning process, you read large portions of text so the software can learn to recognize your speech patterns. Lastly, the program searches your computer for text files or e-mail messages, which helps the software learn your writing style.
I first tried to train Dragon NaturallySpeaking in a room with a high level of ambient noise. The first step of the calibration process adjusts the volume level, while the second step adjusts for the noise level. I was able to get past the first step but not the second in that room. Moving to a quieter room made all the difference, and the process proceeded without incident.
Dragon NaturallySpeaking also supports input from external recording devices, including PocketPCs. Recognition of audio from one of those devices is not as accurate as recognition of audio from a good noise-canceling headset. To alleviate this problem, you can read a large passage of text for 15 minutes from one of eight literary works during PocketPC's setup. Be careful which passage you select: I had trouble focusing and not laughing when reading "Dogbert's Top Secret Management Handbook."
IBM's ViaVoice product uses a similar setup process. I had no problem completing the configuration steps in a quiet room. After the training session, I tried both products in the noisier room, where the noise came from a window air conditioner and occasionally from a high-speed server fan, and both performed
Both products instruct you to speak in your normal tone of voice and at the pace you would typically use. They also encourage you to pronounce your words clearly and distinctly to help the recognition process.
You need to become accustomed to watching text appear on the screen while speaking. Depending on your configuration and how fast you talk, you could speak an entire sentence before anything shows up on the screen.
To test the speech-recognition software, I used a second-grade grammar textbook and read a paragraph with a number of homonyms in it.
Both programs did a pretty good job of recognizing the difference between words such as "see" and "sea" using the context of the sentence. Dragon NaturallySpeaking couldn't seem to understand the word "homonym" while ViaVoice picked it right up. For other words, ViaVoice had problems while Dragon NaturallySpeaking got them right.
If the software misunderstands a word or phrase, you can correct the mistake so that it won't happen again. ViaVoice uses a correction pop-up menu activated by selecting the wrong word and speaking "Correct <text>." The menu then presents a list of possible replacements. If you find the right word in the list, you say, "Pick <n>" to select that word. You can also type in the correct word if the program can't figure it out.
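The correct-then-pick flow described above can be sketched in a few lines of Python. This is only an illustration, not ViaVoice's implementation: the `ALTERNATIVES` table and the `correction_menu` and `pick` functions are hypothetical names, and a real engine would rank candidates from its own acoustic and language models.

```python
# Hypothetical table of sound-alike replacements, invented for illustration.
ALTERNATIVES = {
    "sea": ["see", "C"],
    "their": ["there", "they're"],
}


def correction_menu(wrong_word):
    """Return the numbered list of replacements offered for a misheard word."""
    candidates = ALTERNATIVES.get(wrong_word.lower(), [])
    return list(enumerate(candidates, start=1))


def pick(text, wrong_word, n):
    """Replace the first occurrence of wrong_word with choice number n."""
    choices = dict(correction_menu(wrong_word))
    words = text.split()
    for i, word in enumerate(words):
        if word == wrong_word:
            words[i] = choices[n]
            break
    return " ".join(words)
```

Here, selecting "sea" in "I can sea the boat" would offer "see" as choice 1, and `pick("I can sea the boat", "sea", 1)` would apply it. If no candidate fits, the user falls back to typing the word, which this sketch does not model.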
You should remember to add punctuation to your speech when dictating. Both programs recognize keywords such as "period" to mean end the sentence and insert a period. Other phrases such as "new paragraph" instruct the software to end the sentence and start a new paragraph.
To get the software to recognize a keyword as text, you must speak the word as part of a sentence without pausing. There's also a spell mode that lets you spell out license plate numbers or proper names with multiple capital letters, for example.
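One way to picture the difference between a keyword spoken as a command and the same word spoken as text is the minimal Python sketch below. It assumes the speech has already been split into chunks at each pause; the `render_dictation` function and that chunking are illustrative assumptions, not how either product is actually built.

```python
def render_dictation(chunks):
    """Turn pause-separated spoken chunks into punctuated text.

    A chunk that is exactly a keyword is treated as a command. A keyword
    spoken inside a longer chunk, without pausing, stays ordinary text.
    """
    paragraphs = [""]
    start_of_sentence = True
    for chunk in chunks:
        phrase = chunk.strip()
        command = phrase.lower()
        if command == "period":
            # End the sentence and insert a period.
            paragraphs[-1] += "."
            start_of_sentence = True
        elif command == "new paragraph":
            # End the sentence if it is still open, then start a new paragraph.
            if paragraphs[-1] and not paragraphs[-1].endswith("."):
                paragraphs[-1] += "."
            paragraphs.append("")
            start_of_sentence = True
        elif phrase:
            if start_of_sentence:
                phrase = phrase[0].upper() + phrase[1:]
            if paragraphs[-1]:
                paragraphs[-1] += " "
            paragraphs[-1] += phrase
            start_of_sentence = False
    return "\n\n".join(p for p in paragraphs if p)
```

Given the chunks "the meeting ended", "period", "it was a long period of silence", the first "period" ends the sentence, while the second stays in the text because it was spoken without a pause.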
Dragon NaturallySpeaking includes a set of tools under the Accuracy Center to add words to your vocabulary or perform additional training. You can add individual words or make the program analyze a document and let you add words to the software's vocabulary in bulk. Accuracy Center also lets you adjust your microphone settings in case you change environments or hardware.
Both programs use a toolbar that loads at the top of your screen by default. The Dragon NaturallySpeaking toolbar displays a number of color-coded menu items along with the name of the current user and the default input device. ViaVoice uses the toolbar to display what it thinks you said and to communicate error messages if it doesn't understand you.
ViaVoice includes a macro command feature to define new commands to insert special text or automate a particular function. One feature exclusive to ViaVoice is the ability to create a macro template form that you can fill out later.
ViaVoice's documentation uses the example of a form for a doctor's office that always includes patient information, symptoms and diagnoses. Both programs allow you to import and export those custom commands or macros for other users or computers.
Dragon NaturallySpeaking includes a number of features intended for enterprise users. For example, it can store user profile information on a server for access from more than one
The professional version of Dragon NaturallySpeaking also includes software for personal digital assistants and digital dictation devices. I tried the software on a Hewlett-Packard iPaq hx2415 and found it to be more than adequate.
The product also supports multiple dictation sources for specific users. But you still must train each dictation device. Once you train the new device, you simply add it as another input device for a specific user. Dragon NaturallySpeaking automatically backs up user speech files after every fifth update, but you can change the frequency.
A Manage Users dialog box lets you choose options for setting backup and restore functions, importing/exporting custom commands, and selecting multiple dictation devices. You set those preferences at an individual computer used by multiple users or on a central file server for roaming users.
Both programs make it possible to operate a computer virtually hands-free for individuals with physical challenges. The user manuals for Dragon NaturallySpeaking and ViaVoice show how to verbally execute basic Microsoft Windows functions, such as moving the cursor on the screen and clicking the mouse. They also include basic operating instructions for the most popular productivity applications.
Options for the visually impaired are limited to reading text from within a word-processing program or the scratch pad application. Both programs provide a simple scratch pad application that allows you to dictate text, copy it and then paste it into another application.
Don't expect to see "Star Trek"-level speech recognition anytime soon. Although some users have adopted voice recognition to help with physical problems, such as carpal tunnel syndrome or other physical limitations, you won't find a headset or microphone on most desks.
Curiously, this lack of general acceptance seems to have little to do with the technology's performance, as I found in reviewing these two products. Rather, user perception and lack of motivation rank as the two biggest challenges to widespread adoption.
Many people get along fine with the way they use the computer now and don't want or need another input device that has some limitations and takes some customization.
Dragon NaturallySpeaking works well at what it does: text transcription and dictation support. Although it costs more than ViaVoice, it also offers more features and functions to justify the price difference.
Ferrill, based in Lancaster, Calif., has been writing about computers and software for more than 15 years.