Voice-to-text package lets users speak for themselves
From: MicroTimes  October 28, 1998 - page 195
By: Birrell Walsh

Disabled employees who have trouble typing can increase productivity
dramatically with voice-to-text word processors. Such employees can be highly
capable of performing critical operations, but impaired when it comes to
communicating on paper or on screen.  

Voice-driven word processors seem to be an obvious answer. But until
recently, few speech-to-text systems have been up to the subtle and difficult
task of rendering output smoothly, quickly, clearly, and cheaply enough to fit
into a business environment. 

Dragon Naturally Speaking seems at least partially to fulfill that promise.
It allows users to dictate word processing documents without using their
hands at all. 

Dragon Systems has made a good start with this product, but drawbacks remain.
Mainly, users will need patience to set up and adapt this software in order
for it to run optimally. 

Training the Dragon 

Users must train the program to recognize their voice and accent. Training
makes it possible for Dragon Naturally Speaking to understand the way
individuals actually speak.  

For example, if English is not your first language, or you speak English as
it is spoken in another part of the world, you can teach the program to
understand your way of speaking rather than generic American speech. The
downside is that you have to train Dragon Naturally Speaking before it is
particularly useful, and that takes time and effort. 

Training begins with a wizard that checks your microphone quality. It will
find out if you have enough microphone volume and if the quality of the audio
signal is sufficiently clear. This clarity is very important; if the software
gets a garbled or unclear signal, it will train on that signal, and its
recognition will be worse after training than it was before.  

This could happen, in particular, with sound cards that are not fully
compatible with the Dragon Naturally Speaking system. We did this review with
a Creative Labs Sound Blaster AWE 64 Gold, one of the best sound cards on the
market today; that issue did not arise. 

After you use the audio wizard, the training begins. It takes about a half
hour to train Dragon Naturally Speaking to recognize your voice and speaking
style. You are also encouraged to continue the training with your own
specialized vocabulary. 

The "Vocabulary Builder" will look through any document and find words that
are not in the general vocabulary that comes with the program. It will then
ask you to pronounce each word so it can recognize it in the future. If more
than one person uses the program, each one has to train it to recognize their
voice, and each one has speech files of their own. You can switch between
users, and when you do, the appropriate speech files are loaded. 

To err is Draconian 

When you've been through the general training and perhaps done some specific
vocabulary building, you can move on to using the Dragon Naturally Speaking
window. This is a WordPad-like text processor in which you can dictate text,
format it, and save it as either a text file or as an .rtf (Rich Text Format)
file. 

It is in Dragon Naturally Speaking's own window that you first encounter the
program's endearing and maddening misunderstandings of what you say. When you
say "I don't think so" and it understands "by don't think so," you have two
choices. You can select the word or phrase that it has misunderstood and
simply say the word that you intended. If it understands you the second time,
it will replace the selection with what you intended, just as if you typed
over a selection in most word processors. 

A second way to correct the erroneous "by" is to say "Correct 'by'."  Then a
correction window will open and the program will present the various
things you might have said. Choice No. 1 will be what the Dragon actually
typed into the document; other choices will be other possibilities the
program considered but rejected.  

If the fourth of these choices is what you intended, you can say
"Choose 4." Then the fourth choice on the list will replace the original text
in the document. If what you intended is not on the list, you can spell your
correction in ordinary English letters, and they will be typed into the
selection box. The advantage of the correction method, even though it is
slower, is that the program will learn to understand your speech from each
correction, while it will not learn from the simpler select-and-replace
method. 

Using Naturally Speaking is an exercise in diction, or cor-rect
e-nun-ci-a-tion, and we Americans tend to have a slurred and run-together
pronunciation. We forgive it in ourselves, but the Dragon notices when we
slur our words or run them together. For instance, when this paragraph was
first dictated, the program took the quickly pronounced "slur our" for
"slower." Because the program uses context to guess what words you are
speaking, you actually get better recognition if you speak a full sentence or
at least a long phrase. 

It takes a while to get much productivity out of the Dragon. You can dictate
quickly and have it transcribe what you say; but at first you lose much of
the time you have gained to correcting small errors and teaching the Dragon to
understand the peculiarities of your own speech. Training makes a real
difference. When we began working with the Dragon, it was making an error in
every sentence. 

After three weeks we were able to dictate paragraphs at high speed with fewer
mistakes than we make when typing. 

It is possible to control menu commands. If you say "Click File," the file
menu will descend. If you then say "Click Save" you will save the file. We
found that when we were confronted with a dialog box and asked to name the
file that we wanted to save, we were not able to dictate the answer. The only
way we could enter the filename was to type it in the document, cut it, and
then paste it into the save file dialog box. We found that the Dragon's own
macro wizard also did not accept dictation. 

Add-ins 

Dragon Naturally Speaking includes add-ins for Corel WordPerfect 8.0 and
Microsoft Word 97. The more expensive versions actually include the Corel
WordPerfect Suite. The manual says that the add-in for Microsoft Word lets
you "say VoiceCommands, and access all the functions and menus of Dragon
Naturally Speaking."  

To some extent, the claim is true. But it is not easy to perform every
function. For instance, in Naturally Speaking's own word processor you can
say "Click File" and the file menu will descend. But in Microsoft Word,
"Click File" does not work. Instead you have to say "Press Alt Key F" and
then the file menu will descend. 

Strangely, after the menu has come down, you can say something like "Click
Print" and the program will click on the print function. Similarly, sometimes
you can say "Click Yes," and get a button pressed in the dialog box, but
sometimes not. 

But if the clicking does not work and the button you want is highlighted, you
can just say "Press Enter Key." In that case, it will be just as if you hit
the Enter key on your keyboard. Workarounds like these will drive novice
business users to use strong language, but we soon found ways to do many
things without using hands at all. 

The Microsoft Word add-in connects to Dragon's proprietary software by using
Microsoft's Visual Basic for Applications. VBA slows down the response, and
seems to make it occasionally less accurate. We were not in the least amused
when we spent a long time crafting eight paragraphs, and the Dragon erased
them when it misunderstood a command. When we asked it to "Undo that," which
it could have done in the Dragon's own word processor, the Visual Basic
script crashed. 

If you just must use the mouse, there are two ways to do it. You can move the
mouse a small distance by saying "Mouse up 5." The mouse will then move up a
very small distance. You can also place the mouse anywhere on the screen by
saying "MouseGrid." Dividing lines appear that cut the screen into nine
boxes. You can choose one of those boxes by saying its number. Then that
smaller section of the screen is itself divided into nine boxes, and so
forth. You can usually get the mouse anywhere on the screen within a few
moves. 
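
The MouseGrid narrowing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Dragon's own code: the function name `mousegrid_center` is hypothetical, and we assume the nine boxes are numbered left to right, top to bottom (1 at top left, 9 at bottom right), which the article does not specify.

```python
def mousegrid_center(width, height, choices):
    """Return the (x, y) centre of the box reached by a series of 1-9 picks.

    Assumption (not from the review): boxes are numbered left to right,
    top to bottom, so 1 is top left, 5 is the middle, 9 is bottom right.
    """
    left, top = 0.0, 0.0
    box_w, box_h = float(width), float(height)
    for pick in choices:
        if not 1 <= pick <= 9:
            raise ValueError("box numbers run from 1 to 9")
        row, col = divmod(pick - 1, 3)
        # Each pick shrinks the active region to one ninth of its area.
        box_w /= 3
        box_h /= 3
        left += col * box_w
        top += row * box_h
    return (left + box_w / 2, top + box_h / 2)


# On a 900 x 900 screen, picking box 1 and then box 9 inside it
# lands the pointer at the centre of a 100 x 100 region.
print(mousegrid_center(900, 900, [1, 9]))  # (250.0, 250.0)
```

Each pick divides the width and height of the active region by three, so on a 900-pixel-wide screen three picks narrow the box to about 33 pixels across. That is why a few moves are usually enough.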

The more expensive versions of the Dragon have more extensive controls.
Dragon Systems sent us a copy of Dragon Naturally Speaking Preferred early,
and we did most of our review based on it. A few days before the deadline, the
company sent a copy of the Professional edition. This "Professional" product
lets you create macros and even has a simple script language. If you are
planning very heavy use of the Dragon technology, we recommend spending the
extra money for the macro-enabled version. 

The program itself is not particularly expensive. But if you want to make it
work, you have to have a system that is capable of supporting it. Here is
what Dragon Systems says you need: 

The minimum system requirements for Dragon Naturally Speaking are a Pentium
133 MHz processor and at least 32 MB of RAM for Windows 95/98 or at least 48
MB of RAM for Windows NT 4.0. Required hard disk space starts at 60 MB and
will vary depending on which edition and which features you are using. Users
will also need a 16-bit sound card. 

Optional components each require additional memory. An additional 16 MB of
RAM is necessary to use the text-to-speech utility, or to use NaturalWord to
dictate into Microsoft Word 97 or Corel WordPerfect 8. 
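
For readers budgeting a machine, the memory figures above reduce to simple arithmetic. The Python sketch below is our own illustration; the names `BASE_RAM_MB` and `minimum_ram_mb` are hypothetical. It treats the extra 16 MB as a single add-on, since the published requirements do not say whether running text-to-speech and NaturalWord together needs more than that.

```python
# Base RAM figures quoted from Dragon Systems' stated requirements.
BASE_RAM_MB = {"Windows 95/98": 32, "Windows NT 4.0": 48}

# Extra RAM for text-to-speech or NaturalWord. Assumption: counted once;
# the requirements do not say whether the two options stack.
OPTIONAL_RAM_MB = 16


def minimum_ram_mb(os_name, uses_optional_component=False):
    """Return the minimum RAM in MB for the given operating system."""
    total = BASE_RAM_MB[os_name]
    if uses_optional_component:
        total += OPTIONAL_RAM_MB
    return total


print(minimum_ram_mb("Windows 95/98"))         # 32
print(minimum_ram_mb("Windows NT 4.0", True))  # 64
```

By this reckoning, a Windows NT user who wants to dictate into Microsoft Word should plan on at least 64 MB.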

So do we finally have a suitable speech-to-text system? Well, sort of. There are
a number of rough edges such as the imperfect interface with Microsoft Word,
the never-perfect transcription of speech, and the inability to use Naturally
Speaking as a navigation tool on the Web. 

But Naturally Speaking really does transcribe continuous speech and, with
enough training, most of the time it gets it right. If you would rather
dictate than type, if you are as dyslexic as this reviewer is, or if you are
disabled and just can't type, you need voice recognition. If you also have a
fair amount of patience, Naturally Speaking could be the product for you.

