Introduction to speech recognition

by Tripp Bradd, MD

 

Use of speech recognition with Patient Records™ can be your answer to improving the efficiency of your office and reducing costs.  It should not be a question of IF but WHEN, and the WHEN should be NOW!   Here are some helpful guidelines to review before considering speech recognition:

 

·        Commitment from the physicians to spend the time to train the software.  The “No pain, No gain” idea applies here.  Even consider a local reseller (for Dragon Systems products) to help you train (or get the training) if you don’t feel comfortable with the software.

 

·        Walk before you run!!!  Many doctors who get "bitten" by the move to electronic medical records want ALL OF THE BELLS and WHISTLES right away.  Physicians are natural technophiles (particularly the ones who want to start using an electronic medical record).  However, it is good to work out all the details of your electronic medical record before moving to voice recognition.  Moving fast on so many things may be your speed, but consider how your office systems can accommodate your desires.  Your staff needs to adjust gradually to all you do, and implementing processes in a step-wise fashion makes a lot of sense!

 

·        Adequate hardware will allow you to work with the software without any restriction.  With the efficiencies gained, it makes no sense to try to make a “silk purse out of a sow’s ear”! (please see below)

 

The Professional version of Dragon NaturallySpeaking with the Medical Suite is the appropriate version to use.  Other versions do not include the medical terminology.  Although you can use Vocabulary Builder™ to create your own medical terminology, this is very time-consuming.  I know; I tried this once!

 

Working with Patient Records™ and Dragon NaturallySpeaking should not be a totally “hands free” experience (although it can be).  With Patient Records™ templates and quick text, you can have many of the progress notes done at the time of the visit.  However, when free text is needed (e.g., for a patient with depression or a patient with a constellation of symptoms), this is where Dragon NaturallySpeaking shines.  Although the text editor within the present version of Patient Records™ (7.05) is not totally compatible with Dragon NaturallySpeaking, using "cut and paste" from the Dragon NaturallySpeaking program into Patient Records™ works very well and is efficient (although not as efficient as dictating directly into Patient Records™, as was possible with version 7.01!).  I am told by technical support (at PMSI) that they are presently trying to make the text editor code in the current version of Patient Records™ (ver. 7.05) more compatible with Dragon NaturallySpeaking.  Speech recognition should be strongly considered in your practice with the present version of Patient Records™.  However, once code is developed that allows Dragon NaturallySpeaking to work with Patient Records™ seamlessly (again), any free text entry by keyboard will be inefficient in comparison.

 

The new version 5.0 of Dragon NaturallySpeaking recognizes speech much better.  Physician Microsystems, Inc. will be incorporating better functionality in version 7.5 of Patient Records™, which is due out in early 2002.  Remember, adequate hardware is paramount when using voice recognition.  With CPUs now above 1 gigahertz and adequate memory (at least 256 megabytes), speed should not be an issue.  Appropriate microphones and sound boards are also important.  I would check the Dragon NaturallySpeaking website at www.dragonsystems.com for the appropriate hardware before purchasing your system (or upgrading it).

 

Please refer to the following information excerpted from the Dragon NaturallySpeaking product reviewers guide for details on specific questions on items relating to Dragon NaturallySpeaking.

 

Dragon NaturallySpeaking: An Overview

 

Dragon NaturallySpeaking is the world's first large vocabulary, general purpose, continuous speech recognition system. Users talk naturally and their words are transcribed immediately with high accuracy on their PC screen. There are different editions of Dragon NaturallySpeaking, aimed at a variety of target audiences; these are listed on the next page.

 

Large Vocabulary

 

•      Users may write using all of the words that they normally use. They are not restricted to a limited number of common or special words. Dragon NaturallySpeaking has a total vocabulary size of more than 230,000 words and up to 62,000 active words for improved performance.

 

•      New words, specialized terms, and names may be incorporated into the vocabulary. Users can simply add them by speaking them or typing them.

 

General Purpose

 

•      Users may write about the topics of interest to them. They are not restricted to a specific topic or to filling out certain types of forms.

 

True Continuous Speech

 

•      Users speak continuously, without learning a new way to speak. The system adapts to the way that you speak and write and increases in accuracy with continued use.

Dictation into most Windows Applications

 

•      Users just point their mouse and click in any application. Their text automatically appears in the text window where the cursor is. The product works well with Microsoft Word, Corel WordPerfect, Lotus Notes, WordPro, email packages, Internet chat packages and many more.

 

This document provides important guidelines and tips for evaluating Dragon NaturallySpeaking. It includes the following:

 

•      An overview of speech recognition, Dragon NaturallySpeaking, Dragon Systems, its target audiences, and other speech recognition technology solutions

 

•      Feature Highlights and How to Evaluate Speech Recognition Systems

 

•      A Hands on Tour of Dragon NaturallySpeaking Version 3.5, Features & Benefit Highlights

 

 

System Requirements

 

The following system requirements were for Dragon NaturallySpeaking version 3.5; the hardware available now is much better.  Get a 1+ gigahertz Pentium IV if you really want to benefit from the features of the voice recognition (VR) software.

 

•      Minimum 133 MHz Pentium Processor, IBM compatible PC (Recommended 200 MHz). A serial port is required for use with the Dragon NaturallyMobile recorder

 

•      Windows 98, Windows 95, Windows NT 4.0

 

•      Memory requirements: Minimum 32MB (Recommended 64MB) for Windows 95 or Windows 98, 48MB for Windows NT 4.0. An additional 16MB is needed when running Corel WordPerfect 8 or Microsoft Word 97 simultaneously with Dragon NaturallySpeaking. BestMatch™ technology requires a 200 MHz Pentium Processor and a minimum of 48MB of RAM

•      Hard Disk requirements: Minimum 180MB

 

•      A Dragon Systems-certified or equivalent 16-bit sound card or built-in audio system with input quality equal to or greater than the Creative Labs Sound Blaster 16

 

•      CD ROM required for installation

 

For the most up-to-date list of supported hardware, please contact Dragon Systems at www.dragonsystems.com .
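
The version 3.5 figures above can be turned into a quick pre-purchase check. The following is a minimal sketch in Python, not anything shipped by Dragon Systems: the function names are hypothetical, and it assumes that the 16MB for a word processor is added on top of the base figure while the BestMatch figure acts as a floor.

```python
# Version 3.5 figures quoted in the list above (MB of RAM, MHz of CPU).
BASE_RAM_MB = {"Windows 95": 32, "Windows 98": 32, "Windows NT 4.0": 48}
WORD_PROCESSOR_EXTRA_MB = 16
MIN_CPU_MHZ = 133
BESTMATCH_MIN_RAM_MB = 48
BESTMATCH_MIN_CPU_MHZ = 200


def minimum_ram_mb(os_name, with_word_processor=False, use_bestmatch=False):
    """Return the minimum RAM in MB for a given setup (hypothetical helper)."""
    ram = BASE_RAM_MB[os_name]
    if with_word_processor:
        # Assumption: the extra 16MB is added to the base requirement.
        ram += WORD_PROCESSOR_EXTRA_MB
    if use_bestmatch:
        # Assumption: the BestMatch figure is a floor, not an extra amount.
        ram = max(ram, BESTMATCH_MIN_RAM_MB)
    return ram


def meets_requirements(cpu_mhz, ram_mb, os_name,
                       with_word_processor=False, use_bestmatch=False):
    """Check a machine against the minimum CPU and RAM figures."""
    cpu_needed = BESTMATCH_MIN_CPU_MHZ if use_bestmatch else MIN_CPU_MHZ
    ram_needed = minimum_ram_mb(os_name, with_word_processor, use_bestmatch)
    return cpu_mhz >= cpu_needed and ram_mb >= ram_needed


print(minimum_ram_mb("Windows NT 4.0", with_word_processor=True))
print(meets_requirements(1000, 256, "Windows 98", True, True))
```

A 1 gigahertz machine with 256 megabytes of memory clears every figure in the list by a wide margin, which is the point of the hardware advice above.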

 

Dragon NaturallySpeaking 3.5 Features and Benefits

 

I. Dragon NaturallySpeaking Software Features

 

•      HANDS FREE OPERATION - Users can open Windows applications from the start menu, navigate menu items in Microsoft Office 97, and switch between Windows applications.

 

•      SAPI 4.0 SUPPORT - Support for industry standard API ensures that Dragon NaturallySpeaking will seamlessly integrate with future speech-enabled Windows applications that employ this standard.

 

•       NATURAL LANGUAGE COMMANDS - Easy to use and easy to remember commands can be issued in a natural way to edit and format text within Microsoft Word 97. For example, a user can say "Sixty-seven dollars and twenty-six cents," and see $67.26 appear on the screen. The user could also say "Make a five by two table," or many other variations.

 

•     WORKS IN VIRTUALLY ANY WINDOWS APPLICATION - Users just point their mouse and click in any application. Their text automatically appears in the text window where the cursor is. The product works well with Microsoft Word, Corel WordPerfect, Lotus Notes, WordPro, e-mail packages, Internet chat packages and many more.

 

•     DRAGON NATURALLYMOBILE™ SOFTWARE - Users can create documents by speaking into a portable recording device, such as the Dragon NaturallyMobile™ portable digital recorder, the Sony MZ-R30 palm-sized minidisc recorder, and the Norcom Model 2500 hand-held tape dictation machine. The Dragon NaturallyMobile software enhances Dragon NaturallySpeaking by making it easier to transcribe the recordings right into the user's document. (Dragon NaturallyMobile is available in the Preferred, Professional, Legal and Medical product versions.)

 

•      HIGHLY ACCURATE - Industry leading BestMatch™ technology reduces errors for most users by an additional 25% over previous versions. The accuracy in previous versions of Dragon NaturallySpeaking was already reported to be from 95% to 98%.

 

•     LARGE ACTIVE VOCABULARY - The active vocabulary comes with up to 62,000 words that are ready to use. A total of 230,000 words is on disk and can be automatically retrieved by the system. Each word contains spelling, pronunciation and language usage information for high accuracy. Users can customize the vocabulary with up to 54,000 new words, proper names, and specialized terms.

 

•     SPEECH PLAYBACK - Recorded speech playback allows users to listen to what they said for easier proofreading and editing. The text-to-speech feature allows users to have words that appear on the screen read aloud to them using high quality text-to-speech from ELAN Informatique.

 

•      TRUE CONTINUOUS SPEECH - Users speak naturally and at a normal pace without the need to pause between words. Dragon NaturallySpeaking was the first product on the market that recognized large vocabulary general purpose continuous speech.
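
The accuracy figures in the HIGHLY ACCURATE item can be made concrete with a little arithmetic. This is a sketch in Python under one stated assumption: that "reduces errors by 25%" means a 25% relative cut in the error rate, which the guide does not spell out. The function name is hypothetical.

```python
def accuracy_after_error_reduction(previous_accuracy, error_reduction=0.25):
    """Return the new accuracy if the error rate is cut by a relative fraction.

    Assumption: the 25% figure applies to the error rate (1 - accuracy),
    not to the accuracy itself.
    """
    previous_error = 1.0 - previous_accuracy
    new_error = previous_error * (1.0 - error_reduction)
    return 1.0 - new_error


for previous in (0.95, 0.98):
    improved = accuracy_after_error_reduction(previous)
    print(f"{previous:.2%} accuracy becomes about {improved:.2%}")
```

Under that reading, a user at 95% would move to roughly 96.25%, and a user at 98% to roughly 98.5%. In other words, one in four of the remaining corrections disappears.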

 

 

II. Dragon NaturallySpeaking Software Benefits

 

•      Highly accurate: independent tests of previous versions achieved accuracy between 95% and 98%.

 

•      Industry leading BestMatch™ technology reduces errors for most users by an additional 25% over previous versions.

 

•      Use Natural Language Commands in Microsoft Word 97 to edit, format, and navigate without memorizing a set of fixed commands. Just state the commands in the manner that is natural to you.

 

•      Navigate the desktop by voice to launch or switch applications, save or print documents, and more.

 

•      Add new words, specialized terms, and names in a few simple steps with the Vocabulary Builder™, which enables users to customize the system for themselves and improves accuracy. (For a detailed explanation of how the Vocabulary Builder works, please see the Appendix.)

 

•      Speak naturally and at your normal pace, up to 160 words per minute and more, without pausing between words.

 

•      Spelling, pronunciation, context recognition, and word usage information is included for the total vocabulary of more than 230,000 words.

 

•      Dictate, at your desk, directly into your favorite Windows applications including America Online, Corel WordPerfect, Lotus Notes, Microsoft Exchange and Word 97, Qualcomm Eudora, and many others.

 

•      Learns most dialects, accents, and individual pronunciations quickly and automatically.

 

•      Dictate, edit, and format by voice directly in Microsoft Word 97 and Corel WordPerfect 8 and other popular Windows applications using the revolutionary Select-and-Say editing technology. Even edit by voice documents that were previously typed. Use natural spelling to spell words using standard letter names.

 

•      Listen to a recording of what you said before you edit, in order to keep your original intent, or have documents you did not record, such as e-mail, read aloud to you.

 

•      Supports multiple users who can each personalize their vocabulary.

 

•      Includes a high-quality headset microphone for use with your recorder and at your desktop.

 

Getting Started

 

This section includes:

 

•      Tips for a successful experience with Dragon NaturallySpeaking.

 

•      Instructions for training Dragon NaturallySpeaking to understand your recorded speech.

 

Tips for a Successful Experience with Dragon NaturallySpeaking

 

It is extremely important to follow these key steps before you begin the evaluation process. Since there are several editions of Dragon NaturallySpeaking aimed at different markets, please make sure to refer to the box to determine which edition you are evaluating. The features and benefits of each are listed in the comparison chart later in this guide. You should also refer to Dragon's web site www.dragonsys.com for the most updated information, compatibility list, announcements, and tips for a successful evaluation experience.

 

•      Make sure your system and soundcard are compatible with Dragon NaturallySpeaking. Full certification currently includes 21 microphones, 24 notebooks, and 21 soundcards. Please refer to our web site for the latest approved list before you begin. There is a compatibility list included in the appendix.

•      Make sure your microphone is turned on, plugged into the right jack, and positioned correctly, with the microphone facing the lower corner of your mouth. Make sure the microphone is not too far away from your mouth or too far up on your face and that the "correct" side of the microphone is facing your mouth. It is also important that the microphone is not placed too close to your mouth, as this will create additional "breath" sounds, which will affect the quality of your sound signal.

 

•      Use the microphone that Dragon provides in the box. Dragon tests and approves high-quality, noise-canceling microphones that are suitable for large vocabulary speech recognition dictation.

 

•      Use the New User Wizard to guide you through the process of setting up your system and files. If your system has more RAM, the User Wizard will recommend the appropriate recognition technology to take advantage of the extra memory for improved accuracy and performance.

 

•      During Training, speak naturally and clearly. There is no need to use punctuation during the training session. You should typically train in the same environment as the environment where you plan to use the product. If you plan to use the product in a fairly noisy environment, you should train in a noisy environment.

 

•      After you finish training, run the Vocabulary Builder. Specific instructions on how to do this are included later in this guide. The Vocabulary Builder customizes the system to the way you work, which will help to boost your accuracy right out of the box. This is useful for text that you might not normally dictate, e.g., poetry or literature with infrequently used words or phrases, proper nouns, or industry-specific text.

 

•      When you begin dictation, avoid constantly looking at the screen. It is a natural tendency for first time speech recognition users to look at the screen when they begin dictating. This disrupts your natural flow of dictation, creating inconsistencies in your voice pattern.

 

•      Dictation - Once you begin dictation, remember to speak naturally and clearly. Pronounce each word clearly, but try not to "over pronounce." To get the best performance from the system, you should speak in the same manner you did during training.

 

•      To Format Text, simply say what you want to do. Natural Language Commands allow you to say what you want to do naturally, for example: bold "word," underline "word," select paragraph, make that red, center that, make that a title, and so on.

 

 

How to Evaluate Speech Recognition Solutions

 

You should consider the following before and during your evaluation of speech recognition systems:

•            Recognition accuracy- For continuous speech systems, such as Dragon NaturallySpeaking, you can calculate the recognition error rate as the number of corrections needed divided by the number of words spoken; accuracy is one minus that ratio. This is a useful method because a single recognition error can affect words that are adjacent to each other. For example, the ending of one word could be recognized incorrectly as the beginning of the next word. With Dragon NaturallySpeaking, however, such errors can be corrected by a single command.

 

•            Throughput rate vs. input rate- The throughput rate is the amount of time it takes to do productive work. In dictation, the throughput rate would include the time it takes to enter text plus the time required to correct any errors and to print the final document. Input rates refer only to the number of words spoken to the system, whether or not they are correct.

 

•           Ease of correcting or editing text- Dragon NaturallySpeaking uses the same simple method for updating text whether you change your mind, wish to add formatting, or want to correct a recognition error. Users simply change the words on the screen by saying "select", followed by the word or phrase to be changed. This style was inspired by the way you might tell another person to make a correction.

 

•           Ease of adding words- Consider ease of use when adding words to the system. It is important that a user's new words become integrated into the active vocabulary and are not tacked on at the end of a list.

 

•            General purpose vs. specialized purpose- General purpose systems allow users to write about the topics that are of interest and importance to them. They also allow people to change and add topics as needed. Specialized-purpose products, which are easier to create, are often restricted to one topic area. General purpose systems are significantly more difficult to build than a specialized system.
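
The accuracy and throughput measures in this list are easy to compute during an evaluation. The sketch below is illustrative Python with hypothetical function names. It treats accuracy as one minus the ratio of corrections to words spoken, and it treats throughput as finished words per minute once correction and printing time are counted; both are one reasonable reading of the definitions above.

```python
def error_rate(corrections, words_spoken):
    """Corrections needed divided by words spoken."""
    return corrections / words_spoken


def accuracy(corrections, words_spoken):
    """Accuracy taken as one minus the error rate."""
    return 1.0 - error_rate(corrections, words_spoken)


def input_rate_wpm(words_spoken, dictation_minutes):
    """Words spoken per minute of dictation, correct or not."""
    return words_spoken / dictation_minutes


def throughput_wpm(words_spoken, dictation_minutes,
                   correction_minutes, printing_minutes=0.0):
    """Words per minute over the whole job: dictate, correct, and print."""
    total_minutes = dictation_minutes + correction_minutes + printing_minutes
    return words_spoken / total_minutes


# Example session: 400 words dictated in 4 minutes, 20 corrections that
# took 3 minutes, and 1 minute to print.
print(accuracy(20, 400))
print(input_rate_wpm(400, 4))
print(throughput_wpm(400, 4, 3, 1))
```

In this made-up session the input rate is 100 words per minute but the throughput is only 50, which is why the two numbers should not be compared across products as if they were the same thing.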

 

Frequently Asked Questions (FAQs)

 

The following are frequently asked questions about Dragon NaturallySpeaking.

 

       What is the difference between "Natural Speech" and "Natural Language"?

 

 While Dragon NaturallySpeaking can transcribe your words, it does not actually understand what you are saying. Speech isn't necessarily language, nor does natural speech imply that there is natural understanding. "Natural Speech" is an umbrella term for solutions that enable users to talk in the natural manner in which they might talk to a transcriptionist or over the phone. Dragon is working towards supplying natural speech solutions to the market. "Natural Language", on the other hand, implies that the computer understands the meaning of what is being said. "Natural Language Understanding" is also a term used for the research goal of getting computers to deduce the meaning of natural language.

 

       What is the difference between continuous speech, natural speech, and discrete speech?

 

•             Continuous Speech  is when words are spoken rapidly without pauses between individual words, as in conversational speech.

 

•            Natural Speech includes continuous speech, but also means that users can speak normally.

 

•             Discrete Speech requires users to speak distinctly and pause slightly between words. Continuous speech is much more computationally demanding.

 

             What is the difference between speaker-independent vs. speaker-adaptive vs. speaker-dependent systems?

 

•            Speaker-Independent systems respond to any voice without learning about the user. Such systems are necessary for kiosks and other implementations where the user is not known. To achieve reasonable accuracy, speaker independent systems often have very small vocabularies.

 

•            Speaker-Adaptive systems provide speech recognition with close to speaker-dependent performance. While a brief training period will improve initial recognition accuracy, initial training isn't necessary. The more you use the system, the better the system performs.

•            Speaker-Dependent systems require the user to train the system to recognize their voice by pronouncing sample words.

 

 

       Is Dragon NaturallySpeaking speaker-independent?

 

       Dragon NaturallySpeaking is a personal system designed to run on a PC that is usually operated by the same person. By concentrating on such users with a speaker-dependent system, Dragon NaturallySpeaking can achieve higher performance. Dragon NaturallySpeaking training is designed to be short and entertaining. The system requires only 18 minutes of recorded speech.

 

       When did Dragon Systems start working on continuous speech and Dragon NaturallySpeaking?

 

       Drs. Janet and Jim Baker started working on continuous speech 25 years ago as graduate students at Carnegie Mellon University. Dragon Systems has been working on continuous speech for the past 10-14 years and on Dragon NaturallySpeaking for approximately three years.

 

       What kind of technology is used in this product?

 

       Speech signals are highly variable and influenced by many factors. Dragon Systems combines information from a number of different sources of knowledge (acoustic, semantic, syntactic, speaker, environmental, etc.). To achieve the highest performance recognition possible, Dragon uses statistical optimization techniques. Dragon's expertise in very computationally efficient methodologies means that this technology can now be engineered for interactive users on  Pentium PCs (rather than on mainframes and workstations running real time).

 

       Will it run faster/better for MMX?

 

       Dragon NaturallySpeaking does run faster on MMX; however, it is not optimized for MMX. There are plans to do so in future versions. Dragon Systems was involved with the development of the MMX technology and specified requirements that would be beneficial for speech technology.

 

       What is the accuracy and speed?

 

       Independent reviewers and many of our test users dictate at rates of up to 160 words per minute with accuracy of 95-98% for their normal work.

 

       Does it support multiple users?

 

       Yes. All editions of Dragon NaturallySpeaking support multiple users, including the Dragon NaturallySpeaking Legal and Medical Suites. Dragon Point & Speak, however, does not support multiple users.

 

        How easy is it to import my text into another application?

 

Users can import text into other applications by doing any of the following:

 

•     Copy the text through the Windows clipboard with the phrase "COPY ALL TO CLIPBOARD" or they can use one of the standard Windows methods to copy text. This text can be pasted into any application that supports the function.

 

•     Files can be saved as a "RICH TEXT FORMAT" document, which is easily read by major word processors such as Microsoft Word, Corel WordPerfect and Lotus WordPro.

 

•     Users can dictate directly into virtually any Windows application.

 

       Will Dragon NaturallySpeaking support the MAC OS?

 

There are no current plans to support the MAC OS with Dragon NaturallySpeaking.

 

        Which sound cards are compatible?

 

            High quality 16-bit sound cards and built-in audio systems will work well with Dragon NaturallySpeaking. A list of tested systems that Dragon Systems can recommend includes the following: (Note: this list is continuously being updated - please call 1.800.4.DRAGON for the latest list or refer to Dragon's web site: www.dragonsys.com).

 

How the Vocabulary Builder Works

 

The Vocabulary Builder is designed to improve recognition performance by changing the Language Model. Language Model is a term which applies to the statistics of how words follow other words. In Dragon NaturallySpeaking, the decision about what you actually said is based on both the acoustics, which are trained using the General Training program, and the Language Model.

 

Dragon NaturallySpeaking comes with a built-in language model which reflects a variety of different topics that you could dictate in general English transcription. The Vocabulary Builder allows you to customize the language model to a tighter range of potential discussion topics. When you improve the language model with the Vocabulary Builder, you improve recognition accuracy because Dragon NaturallySpeaking then has a better idea of what you are possibly talking about when it tries to interpret your voice.
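
To make "the statistics of how words follow other words" concrete, here is a minimal sketch of a bigram (word-pair) model in Python. It is an illustration of the general idea only. The guide does not describe Dragon's internal model, and every name in the code is hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_bigram_counts(documents):
    """Count, for each word, which words follow it in the sample documents."""
    counts = defaultdict(Counter)
    for document in documents:
        words = tokenize(document)
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts


def next_word_probability(counts, previous, candidate):
    """Estimate how likely `candidate` is to follow `previous`."""
    followers = counts.get(previous, Counter())
    total = sum(followers.values())
    return followers[candidate] / total if total else 0.0


# Hypothetical sample text standing in for a user's own documents.
documents = [
    "the patient has chest pain",
    "the patient has a cough",
    "the plaintiff has a claim",
]
counts = build_bigram_counts(documents)
print(next_word_probability(counts, "the", "patient"))
print(next_word_probability(counts, "the", "plaintiff"))
```

Feeding in documents from a narrower field shifts these probabilities toward the words that field actually uses, which is the effect the Vocabulary Builder is described as having on recognition.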

 

To use the Vocabulary Builder, first you identify a number of documents on your disk that reflect the way you will be writing. For example, if you are a lawyer who writes contracts, you should find a number of contracts that you have written as examples for the Vocabulary Builder. If you are a journalist, you probably want to find a number of articles which you have written in the past to use in the Vocabulary Builder.

 

After you identify the documents you want to use, you should start the Vocabulary Builder from the Tools menu of Dragon NaturallySpeaking. Then list all of the documents which you would like the Vocabulary Builder to consider when building its language model. As a general guide, you should build language models with at least 100,000 bytes of text, more if possible. This assumes, of course, that you are dealing with plain text documents, and not Microsoft Word documents. If you're dealing with Microsoft Word documents, then the files will be larger for the same amount of text. 100,000 bytes of text corresponds to approximately 17,000 words.
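
The 100,000-byte guideline can be checked before starting the Vocabulary Builder. The following Python sketch is an illustration with hypothetical function names. It assumes plain text files and uses the ratio implied by the figures above (100,000 bytes for about 17,000 words, a little under 6 bytes per word).

```python
import os

# Ratio implied by the guideline: 100,000 bytes is about 17,000 words.
BYTES_PER_WORD = 100_000 / 17_000
MINIMUM_BYTES = 100_000


def plain_text_bytes(paths):
    """Total size in bytes of the given plain text files."""
    return sum(os.path.getsize(path) for path in paths)


def estimated_words(total_bytes):
    """Rough word count for plain text of the given size."""
    return round(total_bytes / BYTES_PER_WORD)


def is_enough_text(total_bytes, minimum_bytes=MINIMUM_BYTES):
    """True if the sample meets the suggested minimum size."""
    return total_bytes >= minimum_bytes


print(estimated_words(50_000))
print(is_enough_text(50_000))
```

The estimate does not apply to word processor files, whose size on disk includes formatting data as well as text.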

 

After you feed it documents, the Vocabulary Builder parses those documents to extract the word and punctuation information from them. Then the Vocabulary Builder computes some simple statistics about word usage within the documents that you fed it. The next interactive step is when the Vocabulary Builder presents to you a list of new words that it has found. This is one of the most confusing features of the Vocabulary Builder. The Vocabulary Builder will compile a complete list of all of the words which it found in the documents which it parsed. Most of the words which the Vocabulary Builder finds will either already be in your active vocabulary, or be in your total dictionary (the 230,000+ words on disk). However, Vocabulary Builder will often find words which are neither in the active vocabulary nor in the backup dictionary, and these words will be presented to you in a list.

 

The list which the Vocabulary Builder presents to you is sometimes very confusing. It includes a number of different types of words which you are required to sort through. It's a good idea to do a spell check in the documents you wish to scan in. For example, if there is a spelling mistake in your document, then the Vocabulary Builder will find the word which is misspelled like a spell checker would, and present it to you as a possible new word. In addition, if you used an acronym or abbreviation which the Vocabulary Builder did not find in your dictionary, that will be presented to you as well.

 

Most people find that the list of new words from the Vocabulary Builder actually looks like a list of capitalized words from your document. What is happening is that Vocabulary Builder is looking for candidate new words.

 

If it finds a capitalized word at the beginning of the sentence then it will assume that the word was capitalized because it was at the beginning of the sentence and not include it in the new word list. However, if the Vocabulary Builder finds a capitalized word in the middle of your sentence, then it presents that capitalized word to you for consideration based on the premise that the capitalization information in that word may be interesting. For example, if you dictate legal documents then the word "plaintiff" is often capitalized. The Vocabulary Builder will detect that you are often capitalizing the word "plaintiff" in your document and present it to you in the word list as a potential new word. Then, if you decide to add the word "Plaintiff" with a capital P to your vocabulary, the Vocabulary Builder will build statistics of when it should use "Plaintiff" with a capital P and when it should use "plaintiff" with a lowercase p.

 

Consider for example, the following text.

 

The Plaintiff was caught reading the article "How to Spot Capitalized Words" in PC Magazine. He was struck by text which WAS IN ALL UPPERCASE, and which was spelled incorrrrectly.

The Vocabulary Builder should suggest the new words "Plaintiff" and "Magazine", both of which are interesting. However, the Vocabulary Builder will also find "How", "Spot", "Capitalized" and "Words" and list them as new words. (The words in all uppercase will be ignored.) In this case, you should only add "Plaintiff" and "Magazine" to your vocabulary and ignore the other new words.
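
The capitalization rule described above can be sketched in a few lines of Python. This is an approximation for illustration, not Dragon's actual code. The function name is hypothetical, sentence splitting is deliberately crude, and the check against the existing dictionary (which would also flag misspellings) is left out.

```python
import re


def candidate_new_words(text):
    """Return capitalized mid-sentence words, in order of first appearance."""
    candidates = []
    # Crude sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+", sentence)
        # Skip the first word: it is capitalized only because it starts
        # the sentence.
        for word in words[1:]:
            if word.isupper():
                continue  # words in all uppercase are ignored
            if word[0].isupper() and word not in candidates:
                candidates.append(word)
    return candidates


sample = (
    'The Plaintiff was caught reading the article "How to Spot Capitalized '
    'Words" in PC Magazine. He was struck by text which WAS IN ALL '
    "UPPERCASE, and which was spelled incorrrrectly."
)
print(candidate_new_words(sample))
# ['Plaintiff', 'How', 'Spot', 'Capitalized', 'Words', 'Magazine']
```

The output matches the behavior described for the sample text: the two interesting words appear, along with the four title words that you would decline to add.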

 

The list of new words from the Vocabulary Builder is intentionally sorted in frequency order. This means that the new word which is most common is at the beginning of the list and the word which was found the least in your documents is at the end of the list. This allows you to consider only the words at the beginning of the list as potential ones to add to your vocabulary, ignoring words which are very rare and, therefore, listed later.
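
Frequency ordering of this kind is simple to reproduce. The sketch below uses Python's `collections.Counter`. The function name and the sample data are hypothetical, and words with equal counts simply keep the order in which they were first seen.

```python
from collections import Counter


def rank_candidates(occurrences):
    """Order candidate words from most frequent to least frequent."""
    return [word for word, _ in Counter(occurrences).most_common()]


# Hypothetical occurrences of candidate words across a set of documents.
occurrences = ["Magazine", "Plaintiff", "Plaintiff", "How",
               "Plaintiff", "Magazine"]
print(rank_candidates(occurrences))
# ['Plaintiff', 'Magazine', 'How']
```

Reviewing only the first few entries of such a list covers the words that matter most to recognition accuracy.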

 

Avoid the temptation to select every word in the list. You can always go back and add new words later. Every word that you select will then be added to the Train Words dialog, and Vocabulary Builder will prompt you to speak each individual word once. This step is optional but recommended. Dragon NaturallySpeaking will guess at a pronunciation for every new word that it sees. However, in some cases the pronunciation which Dragon NaturallySpeaking guesses is not appropriate or accurate. By speaking the word once to the Train Words dialog, you can help Dragon NaturallySpeaking select a better pronunciation and therefore improve the chances of recognizing that new word properly.

 

Once you have completed training the new words, Dragon NaturallySpeaking will go ahead and build the language model based on the documents it scanned and the new words you selected.

 

The Vocabulary Builder actually does two things at this point. First, it builds a set of all of the words which Dragon NaturallySpeaking found in your documents, whether they were in the active vocabulary, the total dictionary, or in the list of new words which you selected. Let's say for example that it found 2,000 words which were already active, 5,000 words which were in the backup dictionary, and you added 10 new words. That is a total of 7,010 words. The Vocabulary Builder will then make sure that all 7,010 words are made active.
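
The activation step amounts to taking the union of three groups of words. A small Python sketch follows, with placeholder word lists and a hypothetical function name, using the counts from the example (2,000 + 5,000 + 10 = 7,010).

```python
def activate_words(already_active, from_backup, newly_added):
    """Return the set of words that should be active after the build."""
    return set(already_active) | set(from_backup) | set(newly_added)


# Placeholder words standing in for the three groups in the example.
already_active = {f"active{i}" for i in range(2000)}
from_backup = {f"backup{i}" for i in range(5000)}
newly_added = {f"new{i}" for i in range(10)}

active_vocabulary = activate_words(already_active, from_backup, newly_added)
print(len(active_vocabulary))  # 7010
```

Because the three groups do not overlap, the size of the union is the plain sum of their sizes.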

 

Once the Vocabulary Builder has made sure that all of the words in your documents are active, it then builds a statistical language model from those words. The statistical language model includes information about how the words were used in your writing.

 

The purpose of the statistical language model is to predict the words which you will be saying based on the other words which you have dictated before and after. For this reason, running the Vocabulary Builder improves your accuracy more than simply adding to your active vocabulary all of the words which the Vocabulary Builder would add.

 

If you run the Vocabulary Builder a second time, then the entire procedure is repeated from scratch. Because you have run the Vocabulary Builder before, any new words which are found in your documents will also be compared against the words you previously added. Therefore, running the Vocabulary Builder twice on the same set of documents should produce a list of new words which is smaller by the words which you previously added to your vocabulary.

 

For this reason, you cannot incrementally improve the language model by running the Vocabulary Builder on one or two additional documents. If you have more text which you want Dragon NaturallySpeaking to consider in your language model, you will have to rerun the Vocabulary Builder with all the previous text you used before plus the new text that you want to consider, in one session.

 

To let people run the Vocabulary Builder on different sets of documents, the Professional Edition supports multiple topics. Each topic in the Professional Edition is a separate vocabulary with its own set of 30,000 active words and its own statistical language model information, produced by running the Vocabulary Builder on that topic.

 

 

How Does Dragon NaturallySpeaking Learn?

 

People who used DragonDictate for Windows (Dragon's earlier discrete recognition product) will remember that DragonDictate for Windows constantly adapted its acoustic models to their voice after everything that they spoke. This made DragonDictate for Windows continually improve in its recognition accuracy, at the expense of requiring the user to correct every mistake as soon as it occurred. Dragon NaturallySpeaking no longer continuously adapts to everything the user says, because continuous adaptation would imply that the user corrects every mistake. In Dragon NaturallySpeaking, users are not required to correct every mistake, and the system will not get any worse in its recognition performance if the user decides not to correct any mistakes.

 

However, Dragon NaturallySpeaking will adjust its acoustic models of the user's voice (improving recognition accuracy) under the following circumstances:

 

•     Dragon NaturallySpeaking adapts whenever the user runs the General Training program. General Training is specifically designed to teach Dragon NaturallySpeaking how the user speaks. At the end of every General Training session, Dragon NaturallySpeaking will adapt and modify the acoustic models which represent how the user speaks, based on what was said during General Training. This means that recognition accuracy can be improved further each time General Training is run. Even after the initial 18 minutes of training, users can go back and run General Training and read any amount of text, and Dragon NaturallySpeaking will learn. For example, if training is done in a quiet office environment but the user then moves to a noisy environment, they can go back and train for a few minutes and Dragon NaturallySpeaking will adapt to the acoustics of the new, noisy environment.

 

•     Dragon NaturallySpeaking will also learn about how you speak every time you use the Train Words Dialog. This is the dialog which prompts you to read one word or phrase at a time. When the user speaks a word into the Train Words Dialog, Dragon NaturallySpeaking will adapt its acoustic models not only for that word (or phrase) but also for other words in the vocabulary which share similar sounds. Using the Train Words Dialog is the best way to get Dragon NaturallySpeaking to start recognizing a word which it seems to be having problems with. In the Train Words Dialog, it is possible to train a word so that it is recognized when the user says something other than what Dragon NaturallySpeaking would normally expect.

 

•           The dictionary which is supplied with Dragon NaturallySpeaking is very comprehensive in its coverage of various pronunciations. However, let's say that there is a word that a user pronounces completely differently from how Dragon NaturallySpeaking expects. A classic example is the word "IT". Dragon NaturallySpeaking expects the user to pronounce this "it", which rhymes with "bit". But most people actually say "eye-tee". If the user trains the word "IT" in the Train Words Dialog and pronounces it "eye-tee", then Dragon NaturallySpeaking will adjust the pronunciation of this word so that it will recognize it in future.

 

•           Dragon NaturallySpeaking will adapt whenever the user uses the Correction Dialog. The Correction Dialog is displayed when "Correct That" or "Correct <text>" is said, or the Correction Dialog hot key is used. If Dragon NaturallySpeaking recognizes something other than what was said, and the Correction Dialog is used to teach Dragon NaturallySpeaking what was actually said, then not only will Dragon NaturallySpeaking correct the text in the document, but it will use that information to learn about how the user speaks so that it will be less likely to make similar mistakes in future.

 

•           Dragon NaturallySpeaking will adapt the acoustic models for the user's voice based on what was actually said (as the user told it in the Correction Dialog). This is why it is important to use the Correction Dialog only to correct a misrecognition, and not when the user changes their mind.

 

In summary, Dragon NaturallySpeaking will adapt its model of the user's voice whenever General Training, the Correction Dialog, or the Train Words Dialog is used. The more Dragon NaturallySpeaking learns about the user's voice, the better the recognition performance will be. However, users are not required to correct errors as they dictate, and recognition will not get worse if they choose not to.

 

Copyright 1999 -  Dragon Systems, Inc. Newton, Ma. (617) 965-5200

(Portions excerpted from Dragon NaturallySpeaking Product Reviewer's Guide)

(7/99 - updated 8/31/01)