Graduating in 1955 from a high school dedicated to preparing boys for college, I never had an opportunity to learn to touch-type. When I began free-lance translating a few years later, it was hunt-and-peck or nothing; talk about slow! Following a brief period using Grundig dictating equipment with non-standard reel-to-reel tapes, I eventually settled on Sony dictation/transcription systems using standard audio cassettes. For years we've worked with transcribers, independent contractors who pick up and drop off work at our office while performing the actual transcription at home. This has been a deliberate choice to avoid the complexities and financial obligations of in-house staff.
Our transcribers have become very proficient over time, turning around long documents quickly and accurately. But the translation of short letters (1 to 3 pages) and rush jobs has always been a problem due to my lack of typing skills. So when voice recognition software first became available, I was very interested. It started, I believe, with a product called PowerSecretary for the Macintosh, from Dragon Systems. I had a chance to try the program at computer shows, and was impressed by the simple fact that it could display on the screen the words I had just spoken.
But there were barriers to ease of use that I was not prepared to face, apart from the fact that we are a PC office with no Macs. Words had to be pronounced individually with pauses between them, and the punctuation also had to be spoken. Errors caused a window to open in which corrections could be made, but the correct words had to be spelled out letter by letter, using the military code for the alphabet (Alpha-Bravo-Charlie, etc.). Although the program could learn from its errors, the corrections had to be made as you went along and not when you were done: not a good solution for dictating translations. Back to the drawing board, I thought, and loaded another cassette in my dictating unit.
A couple of years later IBM came out with a program, called VoiceType, I believe, but it was quite expensive and still had not solved the problem of accepting continuous speech. Dragon Systems also came out with Dragon Dictate for both Mac and PC platforms, but the price/performance ratio failed to tempt me.
At last, in 1996 IBM released a product for Windows 95 called SimplySpeaking and offered it at $99 including a headset microphone. At this price I was prepared to give voice recognition a try, and installed the program on my Pentium 133 with 32 MB of RAM. My experience with SimplySpeaking convinced me that voice recognition had finally matured to the point that it would be a useful tool, and I began dictating letters to train the program to become familiar with my voice and to enable it to expand its vocabulary to reflect the typical words encountered in my work.
After years of slow progress, however, voice recognition suddenly took off, and a few months later SimplySpeaking Gold (why are all these programs called Gold?) was offered, with the ability to dictate straight into a word processing program (MS Word) instead of cutting and pasting via the Clipboard, plus some other features. I was about to upgrade my installation when a large ad appeared in the Boston Globe announcing a demonstration of NaturallySpeaking from Dragon Systems, with continuous speech recognition! This was something not to be missed, and the reduced price of $295 at the show was an added attraction.
Since then, I've been using the program for all of my shorter letter and patent translations; as it learns more, I will include longer patents. The main reason for limiting the length of what I process this way is that correction still eats up a lot of time. On the other hand, words that the program misidentifies and enters as other words or expressions are handled consistently, so a global search and replace will catch them.
1. Installation and training
The program comes on a CD-ROM and is easily installed on a Pentium 133 or higher machine running 32-bit Windows 95. You will also need a sound card; the program comes with a headset microphone and you don't need speakers (although the next upgrade will include the ability to read your dictation back to you aloud, a feature I often use on my setup with a program called Monologue). Dragon has a list of cards that work well with NaturallySpeaking; most of the more expensive Sound Blaster series are on the list. I started with 32 MB of RAM, but noticed an enormous improvement in performance with 64 MB; memory is cheap right now, so I'd pick that as a minimum configuration.
Sound boards recommended by users include Turtle Beach Tropez Plus ($139 by mail) and Sound Blaster Gold 64 PNP ($180 by mail).
Training is relatively fast; to get yourself to a point where the program can begin learning, you must read one of two passages off the screen into the microphone and then allow the program to digest the results. With this stage out of the way, you can then begin dictating.
2. Your first effort
One point to remember: as you watch your words appear quickly on the screen, you are quite likely to laugh out loud at the program's errors, especially when you are just beginning to work with it. Laughing creates a string of nonsense (all spelled correctly, of course; NS does not make spelling errors!) as the program tries frantically to make sense of your guffaws or giggles. Of course you can highlight the mess, delete it, and start over, but it's better to pronounce carefully and look somewhere else (at the source text, of course, if you're a translator).
The background environment should be consistent as well; if you train the program in winter and then try to use it in summer with the air conditioner running, you will have to retrain it. On the other hand, it either ignores telephones ringing and the occasional toot of an auto horn in the street, or displays a string of question marks to indicate its inability to identify the sound.
I have found it very helpful to think of the program as an intelligent but inexperienced human transcriber who guesses at what he or she is hearing and puts down the closest equivalent he or she can think of. Remember, there is no intelligence operating here as far as grammar or logic is concerned, merely recognition and comparison with stored models.
3. The learning process
As you dictate and then go back to check what the program has written, you have the choice of navigating by voice or by mouse. Of course, voice navigation is extremely important for those without the use of their hands; in reading messages in CompuServe areas concerned with voice recognition, it's been a moving experience to see people writing "I'm a quad..." and realize that this is a quadriplegic individual communicating with the world thanks to this remarkable technology. My personal preference is to use the mouse for making minor corrections, but if the program stubbornly refuses to identify a word I've spoken correctly, I can say "Select <word>" and enter a screen where I must first spell the word I want, then train the program by speaking both the word I wanted and the word the program kept entering instead. When a dictation session is over and I close the program down, saving the sound files updates them to reflect the corrections I made that day.
4. Vocabulary building
Another feature of NS lets you train the program by teaching it both the vocabulary you use in your work and the expressions you use repeatedly. This is done, for example, by opening Microsoft Word for Windows, in which you do most of your translation work, then opening NaturallySpeaking and listing a series of files to be analyzed. NS goes through each file, compares what it finds with its own files, and stores the result. You can repeat this process as often as you like, and the more often you do, the more accurately the program will identify your words.
You can add hundreds of words an hour this way, with no limit that I know of save the size of your hard disk. I have seen sets of vocabulary words offered for use in this learning process, compiled and sold by a physician; each vocabulary costs $29.95 (plus $2.50 S&H) and is described on the Web site of PCP Associates http://home1.gte.net/kaicher/medterms.htm. Doctors are among the early adopters of this technology since they routinely dictate reports that have had to be converted to text by human transcribers.
5. Saving your work
Make the major editing changes in NS, so that the program can learn from experience; then export the document as a text file to Word and polish it there for the final product.
6. A work in progress
After years of advancing slowly, voice recognition has taken off sharply in the last year, with several competing products from Dragon, IBM, and Kurzweil. New versions appear every few months or so, and so what I write here will shortly be overtaken by progress in the field. I recommend regular visits to the Dragon forum on CompuServe (GO Dragon) to read what is happening with the product and to keep track of upgrades (and their problems, sometimes). Dragon also has a site on the Internet (dragon.com) and that too is a regular stop for me.
7. For Mac users
Dragon Systems also makes voice recognition software for the Macintosh. There is a PlainTalk discussion mailing list that can answer any questions Mac users might have about the Mac version, PowerSecretary. I understand that the program is reasonably compatible with System 8, but requires a Power Macintosh.
Those who are skilled typists may scoff at the idea of correcting stupid errors that they would never commit, and may lack the patience to train the program properly. But if they are having problems with their wrists from overuse of the keyboard, they may find this system a welcome relief. For those who, like myself, have never learned to type, it is like the removal of a handicap that has chained you all your life. You can speak and see your words appear like magic on the screen, without the need to go through the steps of dictating on tape, getting the tape to and from the typist, and then correcting his or her work on screen. After several months of use, the program has already paid for itself in typist charges saved.
Reviews in the computer press have been unanimous in praising NaturallySpeaking as a breakthrough in speech recognition: no more discrete speech, and no more need to pause between words. True, NS must still be told where to place commas, periods, and other punctuation, but wait a year or two; this feature may be just over the horizon. I suggest that prospective buyers wait until this fall to purchase the next release of NaturallySpeaking. There's a saying that one should never buy Version 1.0 of any program, but so far I've been quite satisfied with the product.
Editor's Note: A review of another dictation software package also appears in this issue of the Translation Journal.