Abstract:
A transcription system (100) includes a computer (102), a monitor (104), and a microphone (110). Via the microphone, a user of the system provides input speech that is received and transcribed (204) by the system. The system monitors (205) the accuracy of the transcribed speech during transcription. The system also determines (210) whether the accuracy of the transcribed speech is sufficient and, if not, automatically activates (214) a speech recognition improvement tool and alerts (212) the user that the tool has been activated. The user can also activate the tool manually (206). The type of recognition problem is identified (216) either by the user or automatically by the system, and the system provides (218) possible solution steps that enable the user to adjust (219) system parameters or modify user behavior to alleviate the recognition problem. The system also lets the user test (222) the transcription process to determine whether the solution has improved recognition accuracy.
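The monitor-and-activate loop (steps 205, 210, 212, 214 and 206) can be sketched as follows. This is a minimal illustration, not the patented system. It assumes that accuracy is measured as the fraction of recent words the user did not correct, over a sliding window. The class name, window size, threshold and minimum word count are all hypothetical, because the abstract does not say how accuracy is computed.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical sketch: tracks accuracy over a sliding window of recent words.

    Assumption: a word counts as an error when the user corrects it.
    """

    def __init__(self, window: int = 50, threshold: float = 0.85, min_words: int = 10):
        self.results = deque(maxlen=window)  # True = word accepted, False = word corrected
        self.threshold = threshold
        self.min_words = min_words
        self.tool_active = False
        self.alerts = []

    def record(self, word_correct: bool) -> None:
        # Monitor accuracy during transcription (205).
        self.results.append(word_correct)
        self._check()

    def accuracy(self) -> float:
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def _check(self) -> None:
        # Decide whether accuracy is sufficient (210), but only after enough words
        # have been seen, and only activate the tool once.
        if self.tool_active or len(self.results) < self.min_words:
            return
        if self.accuracy() < self.threshold:
            self.tool_active = True  # automatic activation (214)
            self.alerts.append(  # alert the user (212)
                f"Accuracy {self.accuracy():.0%} is low; improvement tool activated."
            )

    def activate_manually(self) -> None:
        self.tool_active = True  # manual activation (206)
```

A fixed threshold over a sliding window is only one possible design. It keeps the check cheap, and the `min_words` guard stops the tool from triggering on the first few words of a session.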
Abstract:
A method and apparatus for transcribing text from multiple speakers in a computer system having a speech recognition application. The system receives speech from one of a plurality of speakers through a single channel, assigns a speaker ID to the speaker, transcribes the speech into text, and associates the speaker ID with the speech and text. To detect speaker changes, the system monitors the speech input received through the channel.
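One way to picture speaker-ID assignment and speaker-change detection on a single channel is sketched below. It is a sketch only. It assumes that each speech segment has already been reduced to a numeric feature vector, such as a speaker embedding, and that a change of speaker shows up as a large cosine distance from the current speaker's reference vector. The abstract does not name a detection technique, so the threshold, the vector representation and the class name are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def cosine_distance(a, b):
    # Assumes non-zero vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)


@dataclass
class SpeakerTracker:
    threshold: float = 0.3                        # hypothetical change-detection threshold
    profiles: dict = field(default_factory=dict)  # speaker ID -> reference vector
    current_id: Optional[int] = None
    transcript: list = field(default_factory=list)  # (speaker ID, text) pairs

    def _match_or_enroll(self, vec):
        # Reuse the closest known speaker if close enough; otherwise assign a new ID.
        best_id, best_d = None, float("inf")
        for sid, ref in self.profiles.items():
            d = cosine_distance(vec, ref)
            if d < best_d:
                best_id, best_d = sid, d
        if best_id is not None and best_d <= self.threshold:
            return best_id
        new_id = len(self.profiles) + 1
        self.profiles[new_id] = vec
        return new_id

    def process(self, vec, text):
        # Speaker-change check: compare the segment with the current speaker's profile.
        if self.current_id is None or cosine_distance(vec, self.profiles[self.current_id]) > self.threshold:
            self.current_id = self._match_or_enroll(vec)
        # Associate the speaker ID with the transcribed text.
        self.transcript.append((self.current_id, text))
        return self.current_id
```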
Abstract:
An efficient method and system, particularly well-suited for correcting natural language understanding (NLU) commands, corrects spoken commands misinterpreted by a speech recognition system. The method involves a series of steps, including: receiving the spoken command from a user; parsing the command to identify a paraphrased command; displaying the paraphrased command; and accepting corrections of the paraphrased command from the user. The paraphrased command is segmented according to command language categories, which include a command action category, an action object category, and an action and/or object modifying category. The paraphrased command is displayed in a user interface window segmented into these command language categories. The user interface window also contains alternative commands for each segment of the paraphrased command.
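The segmented correction interface can be modelled as a small data structure. The sketch below starts from a command that has already been parsed, because NLU parsing itself is out of scope here. It holds one segment per command language category, each with its own list of alternatives, and it replaces one segment when the user picks an alternative. The class, method and category key names are hypothetical labels for the action, object and modifier categories described above.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the command action, action object, and modifier categories.
CATEGORIES = ("action", "object", "modifier")


@dataclass
class ParaphrasedCommand:
    segments: dict                                    # category -> current paraphrased text
    alternatives: dict = field(default_factory=dict)  # category -> list of alternatives

    def display(self) -> str:
        # One labelled segment per category, as in the segmented UI window.
        return " | ".join(f"{c}: {self.segments.get(c, '')}" for c in CATEGORIES)

    def correct(self, category: str, choice: int) -> None:
        # Replace one segment with the alternative the user selected.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.segments[category] = self.alternatives[category][choice]

    def as_text(self) -> str:
        return " ".join(self.segments[c] for c in CATEGORIES if self.segments.get(c))
```

In this example, a spoken "move the paragraph up" is misrecognized with the action "remove". Selecting `"move"` from the action alternatives corrects only that segment and leaves the object and modifier untouched.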
Abstract:
A method and system efficiently identifies voice commands for a user of a speech recognition system. The method involves a series of steps including: receiving input from a user; monitoring the computer system to log system events and ascertain a current system state; predicting a probable next event according to the current system state and logged events; and identifying acceptable voice commands to perform the next event. The system events include commands, system control activities, timed activities, and application activation. These events are statistically analyzed in light of the current system state to determine the probable next event. The voice commands for performing the probable next event are displayed to the user.
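A very simple form of the "statistical analysis" step is sketched below. It counts which event followed the previous event in each system state, then suggests the voice commands for the most frequent successor. This conditional-frequency model is an assumption made for illustration, since the abstract does not specify the statistics used. The event names and the event-to-command mapping are hypothetical.

```python
from collections import Counter, defaultdict


class NextEventPredictor:
    def __init__(self, commands_for_event):
        # Hypothetical mapping from an event name to the voice commands that perform it.
        self.commands_for_event = commands_for_event
        self.counts = defaultdict(Counter)  # (state, previous event) -> Counter of next events
        self.last_event = None

    def log(self, state, event):
        # Log a system event together with the state in which it occurred.
        self.counts[(state, self.last_event)][event] += 1
        self.last_event = event

    def predict(self, state):
        # Most frequent successor of the last event in the current state, if any.
        counter = self.counts.get((state, self.last_event))
        if not counter:
            return None
        return counter.most_common(1)[0][0]

    def suggested_commands(self, state):
        # Voice commands to display for the probable next event.
        event = self.predict(state)
        return self.commands_for_event.get(event, []) if event else []
```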
Abstract:
A method and system for improving the speech command recognition accuracy of a computer speech recognition system uses event-based constraints to recognize a spoken command. The constraints are system states and events, which include system activities, active applications, prior commands and an event queue. The method and system operate by monitoring events and states of the computer system and receiving a processed command corresponding to the spoken command. The processed command is statistically analyzed in light of the system events and states as well as according to an acoustic model. The system then identifies a recognized command corresponding to the spoken command.
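The idea of combining an acoustic model with event-based constraints can be illustrated with a log-linear score. The score is the acoustic log-likelihood plus a weighted log prior, and the prior is estimated from how often each command occurred in the current system state. The scoring formula, the add-one smoothing and the weight are assumptions chosen for this sketch, not details taken from the abstract.

```python
import math


def recognize(acoustic_scores, context_counts, weight=1.0, smoothing=1.0):
    """Pick the command maximizing log P(audio | cmd) + weight * log P(cmd | context).

    acoustic_scores: command -> acoustic log-likelihood (assumed to come from the recognizer).
    context_counts: command -> how often it occurred in the current system state/events.
    """
    total = sum(context_counts.get(c, 0) for c in acoustic_scores) + smoothing * len(acoustic_scores)
    best, best_score = None, -math.inf
    for cmd, ac in acoustic_scores.items():
        # Smoothed prior so unseen commands are unlikely but still possible.
        prior = (context_counts.get(cmd, 0) + smoothing) / total
        score = ac + weight * math.log(prior)
        if score > best_score:
            best, best_score = cmd, score
    return best
```

When two commands sound almost identical, such as "close window" and "clone window", the context prior decides between them. With no context history, the priors are equal and the acoustic score alone wins.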
Abstract:
A method for enrolling a user in a speech recognition system, without requiring reading, comprises the steps of: generating an audio user interface having an audible output and an audio input; audibly prompting the user not to speak; audibly playing a text phrase; audibly prompting the user to speak the played phrase; repeating the steps of audibly prompting the user not to speak, audibly playing the phrase and audibly prompting the user to speak, for a plurality of further phrases; and processing enrollment of the user based on the audibly prompted and subsequently spoken phrases. A graphical user interface can also be generated for: displaying text corresponding to the phrases and to the audible prompts; displaying a plurality of icons for user activation; and selectively distinguishing different icons at different times by at least one of color, shape and animation.
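The audio-only enrollment loop could be structured as shown below. In this sketch, audio output, audio capture and model training are passed in as callbacks. `play_prompt`, `play_phrase`, `record` and `train` are hypothetical stand-ins for whatever audio and recognizer interfaces a real system provides, and the prompt wording is invented for illustration.

```python
from typing import Callable, List, Tuple


def enroll_without_reading(
    phrases: List[str],
    play_prompt: Callable[[str], None],  # hypothetical: speaks an instruction to the user
    play_phrase: Callable[[str], None],  # hypothetical: plays the enrollment phrase aloud
    record: Callable[[], bytes],         # hypothetical: captures the user's utterance
    train: Callable[[List[Tuple[str, bytes]]], object],  # hypothetical: builds the user model
):
    samples = []
    for phrase in phrases:
        play_prompt("Please listen, and do not speak.")  # prompt the user not to speak
        play_phrase(phrase)                              # audibly play the phrase
        play_prompt("Now please repeat the phrase.")     # prompt the user to speak
        samples.append((phrase, record()))
    # Process enrollment from the prompted, spoken phrases.
    return train(samples)
```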
Abstract:
A novel apparatus and method for correcting speech-recognized text in a predominantly speech-only environment, for use with a device that has only a limited display or no display at all. The method is preferably implemented as a computer program stored on a machine-readable storage mechanism and comprises the following steps. First, audio speech input can be received and converted to speech-recognized text. Second, the speech-recognized text can be checked for a first speech correction command that performs a correction operation on speech-recognized text stored in a text buffer. Third, if no speech correction command is detected in the speech-recognized text, the speech-recognized text can be added to the text buffer. Fourth, if a speech correction command is detected, that command can be performed on the speech-recognized text stored in the text buffer.
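The third and fourth steps reduce to a simple dispatch: either append dictation to the buffer or run a correction command against it. The sketch below treats an utterance as a command only when the whole utterance matches a known phrase. That is a simplification of "detected in the speech-recognized text", and the two command phrases, "scratch that" and "delete last word", are hypothetical examples.

```python
def _scratch_that(buffer):
    # Remove the most recently added utterance.
    if buffer:
        buffer.pop()


def _delete_last_word(buffer):
    # Remove the last word of the most recent utterance.
    if buffer:
        words = buffer[-1].split()[:-1]
        if words:
            buffer[-1] = " ".join(words)
        else:
            buffer.pop()


# Hypothetical correction commands; the abstract does not list the real ones.
CORRECTION_COMMANDS = {
    "scratch that": _scratch_that,
    "delete last word": _delete_last_word,
}


def process_utterance(buffer, recognized):
    """Return True if a correction command was performed, False if text was added."""
    command = CORRECTION_COMMANDS.get(recognized.strip().lower())
    if command is None:
        buffer.append(recognized.strip())  # no command detected: add text to the buffer
        return False
    command(buffer)  # command detected: perform it on the buffered text
    return True
```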
Abstract:
A method for guiding text-to-speech (TTS) output timing with speech recognition markers can include the following steps. First, tokens can be retrieved in a TTS system. The tokens can include words, phrase markers, punctuation marks and meta-tags. Second, phrase markers can be identified among the retrieved tokens. Third, words can be identified among the retrieved tokens. Fourth, the TTS system can play back the identified words. Finally, during TTS playback of the words, the TTS system can pause in response to the identified phrase markers.
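The token-handling steps can be sketched as a function that converts a token stream into a playback plan. Words between phrase markers are grouped into one spoken chunk, and each phrase marker becomes a pause. The `Token` type, the kind labels, the 300 ms pause length, and the choice to skip punctuation and meta-tags are all assumptions for illustration. A real TTS engine would consume such a plan through its own API.

```python
from dataclasses import dataclass
from typing import List, Tuple

PAUSE_MS = 300  # hypothetical pause length at a phrase marker


@dataclass
class Token:
    kind: str  # "word", "phrase_marker", "punctuation", or "meta" (assumed labels)
    text: str


def plan_playback(tokens: List[Token]) -> List[Tuple[str, object]]:
    actions, pending = [], []
    for tok in tokens:
        if tok.kind == "word":
            pending.append(tok.text)  # identified word, queued for playback
        elif tok.kind == "phrase_marker":
            if pending:
                actions.append(("speak", " ".join(pending)))
                pending = []
            actions.append(("pause", PAUSE_MS))  # pause at the phrase marker
        # Punctuation and meta-tags are ignored in this sketch.
    if pending:
        actions.append(("speak", " ".join(pending)))
    return actions
```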