Abstract:
In accordance with the present invention, a speech recognition method is disclosed (10). It uses a microphone to receive audible sounds input by a user into a first computing device (28) having a program with a database (16) comprising (i) digital representations of known audible sounds and associated alphanumeric representations of those sounds and, for the first time, (ii) digital representations of known audible sounds corresponding to mispronunciations resulting from known classes of mispronounced words and phrases. The method is performed by receiving the audible sounds in the form of the electrical output of the microphone (28). A particular audible sound to be recognized is converted into a digital representation of the audible sound (30). The digital representation of the particular audible sound is then compared to the digital representations of the known audible sounds in the database to determine which of those known audible sounds is most likely to be the particular audible sound (30).
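The matching step described above can be sketched as a nearest-match lookup against a database that also contains known mispronunciation variants. The feature vectors and the Euclidean distance are illustrative assumptions, not details from the abstract.

```python
# Hypothetical sketch of the comparison step (30): the input's digital
# representation is compared against stored representations, including
# known mispronunciation variants, and the closest match is returned.

def recognize(input_features, database):
    """Return the alphanumeric label of the closest known sound."""
    best_label, best_distance = None, float("inf")
    for entry in database:
        # Euclidean distance as a stand-in for a real acoustic comparison.
        distance = sum((a - b) ** 2 for a, b in
                       zip(input_features, entry["features"])) ** 0.5
        if distance < best_distance:
            best_label, best_distance = entry["label"], distance
    return best_label

database = [
    {"label": "nuclear", "features": [0.9, 0.1, 0.4]},  # standard form
    {"label": "nuclear", "features": [0.7, 0.5, 0.2]},  # known mispronunciation
    {"label": "library", "features": [0.1, 0.8, 0.9]},
]
print(recognize([0.72, 0.48, 0.25], database))  # matches a "nuclear" entry
```

Because the mispronounced variant carries the same label as the standard form, a user who mispronounces the word still gets the correct alphanumeric output.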
Abstract:
A preferred embodiment of a method for converting text to speech using a computing device having a memory is disclosed. The inventive method comprises examining a text to be spoken to an audience for a specific communications purpose, followed by marking up the text according to a phonetic markup system, such as the Lessac System pronunciation rules notations. A set of rules is defined to control a text-to-speech generator based on speech principles, such as Lessac principles. Such rules are of the type normally implemented on prior art text-to-speech engines, and control the operation of the software and the characteristics of the speech generated by a computer using the software. A computer is used to speak the marked-up text expressively. The step of using a computer to speak the marked-up text expressively is repeated using alternative pronunciations of the selected style of expression, in which each of the tonal, structural, and consonant energies has a different balance in the speech. The spoken speech generated by the computer is played to trained speech practitioners, who listen to it and evaluate it for consistency with style criteria and/or expressiveness. An audience is then assembled and the spoken speech generated by the computer is played back to the audience. Audience comprehension of the spoken speech generated by the computer is evaluated and correlated to a particular implemented rule or rules, and those rules which resulted in relatively high audience comprehension are selected.
Abstract:
In accordance with the present invention, speech recognition (10) and training (110) methods and systems are disclosed. A microphone receives audible sounds input (28) from a user into a first computing device having a program with a database (16). The database consists of digital representations of known audible sounds, associated alphanumeric representations of those sounds, and mispronunciations. The program compares the digital representation of an input sound to the digital representations of known audible sounds in the database (30) to determine the likely desired output. If an error in recognition occurs (32), the user can indicate the proper alphanumeric representation of the particular audible sound (34). This allows the system to determine whether the error is the result of a known type or instance of mispronunciation (36). In response to a determination of the error's nature, the system presents an interactive training program from the computer to the user to enable the user to correct such mispronunciation (45). The present invention has the advantage of improving both voice recognition and the speech patterns of the user by focusing on the user during error correction, thus improving the user's oral communication skills.
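The error-correction loop can be sketched as follows: when recognition fails, the user supplies the intended word, and the system checks whether the error matches a known class of mispronunciation before offering training. The mispronunciation table and helper names are illustrative assumptions.

```python
# Hypothetical sketch of steps (32)-(45): classify a recognition error and,
# if it matches a known mispronunciation, offer interactive training.

KNOWN_MISPRONUNCIATIONS = {
    # intended word -> set of commonly recognized (wrong) outputs
    "ask": {"aks"},
    "espresso": {"expresso"},
}

def is_known_mispronunciation(recognized, intended):
    """Step (36): does the error match a known mispronunciation class?"""
    return recognized in KNOWN_MISPRONUNCIATIONS.get(intended, set())

def handle_recognition_error(recognized, intended):
    if is_known_mispronunciation(recognized, intended):
        # Step (45): present an interactive training program.
        return f"training: practice pronouncing '{intended}'"
    return "no known mispronunciation; unrecognized error type"

print(handle_recognition_error("aks", "ask"))
```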
Abstract:
A dictation command voice multitasking interface is illustrated by the GUI computer training template to be displayed and implemented by the creation of a question and multiple answer database where, for example, a first box (10) labeled "print question" receives text for question A. The "RecQ" box (11) is selected by a mouse, whereupon the trainer records the voice equivalent of the question. The system is thereby made responsive to recognized spoken words, such as for the alternative questions illustrated by box (10) and box (18) and corresponding stored voice equivalents illustrated by box (11) and box (19). A voice equivalent of the printed answer in box (12) is stored as "RecA" in box (13). Corresponding descriptive text is stored in box (14). The process is interactive in storing the voice equivalents, as shown by decision box (34), which queries the trainer for more questions to be stored in the database, wherein the computer interrupt handler (25) waits for further input from voice (24). A practical application of the system would enable a doctor, with hands and eyes occupied in performing a clinical procedure, to input voiced queries to the computer in order to create a report during the clinical procedure.
Abstract:
A speech reference enrollment method involves the following steps: (a) requesting a user speak a vocabulary word; (b) detecting a first utterance (354); (c) requesting the user speak the vocabulary word; (d) detecting a second utterance (358); (e) determining a first similarity between the first utterance and the second utterance (362); (f) when the first similarity is less than a predetermined similarity, requesting the user speak the vocabulary word; (g) detecting a third utterance (366); (h) determining a second similarity between the first utterance and the third utterance (370); and (i) when the second similarity is greater than or equal to the predetermined similarity, creating a reference (364).
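Steps (a) through (i) above can be sketched directly as a control flow. The `similarity()` measure, the threshold value, and the averaging used to build the reference are illustrative assumptions; the abstract specifies only that a similarity is computed and compared against a predetermined value.

```python
# A minimal sketch of the enrollment flow in steps (a)-(i), assuming an
# illustrative similarity measure and a capture() callback that prompts
# the user and returns one utterance as a feature list.

def similarity(u1, u2):
    # Placeholder metric: fraction of positions that nearly match.
    matches = sum(1 for a, b in zip(u1, u2) if abs(a - b) < 0.1)
    return matches / max(len(u1), len(u2))

def make_reference(u1, u2):
    # Average the two consistent utterances into a reference template.
    return [(a + b) / 2 for a, b in zip(u1, u2)]

def enroll(capture, word, threshold=0.8):
    first = capture(word)                          # steps (a)-(b)
    second = capture(word)                         # steps (c)-(d)
    if similarity(first, second) >= threshold:     # step (e)
        return make_reference(first, second)       # reference created (364)
    third = capture(word)                          # steps (f)-(g)
    if similarity(first, third) >= threshold:      # steps (h)-(i)
        return make_reference(first, third)
    return None  # no consistent pair after three utterances
```

Note the asymmetry the abstract describes: the third utterance is compared against the *first*, not the second, so one bad utterance in the middle does not force a restart.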
Abstract:
A request to execute an interaction site associated with a custom grammars file is received from a user device and by a communications system. An interaction flow document to execute the interaction site is accessed by the communications system. The custom grammars file is accessed by the communications system, the custom grammars file being configured to enable the communications system to identify executable commands corresponding to utterances spoken by users of user devices. An utterance spoken by a user of the user device is received from the user device and by the communications system. The utterance is stored by the communications system. The custom grammars file is updated by a grammar generation system to include a representation of the stored utterance for processing utterances in subsequent communications with users.
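The flow above can be sketched as a communications system that stores each utterance and a grammar-generation step that folds stored utterances back into the custom grammars file for later requests. The data structures and class names are illustrative assumptions.

```python
# Hypothetical sketch: utterances that fail to resolve are stored, and a
# grammar-generation step later adds representations of them to the
# custom grammars so subsequent communications can resolve them.

class CommunicationsSystem:
    def __init__(self, grammars):
        self.grammars = grammars          # phrase -> executable command
        self.stored_utterances = []

    def handle_utterance(self, utterance):
        self.stored_utterances.append(utterance)  # store for later learning
        return self.grammars.get(utterance)       # known command, or None

    def update_grammars(self, utterance, command):
        # Grammar generation: include a representation of a stored
        # utterance for processing in subsequent communications.
        if utterance in self.stored_utterances:
            self.grammars[utterance] = command

system = CommunicationsSystem({"check balance": "CMD_BALANCE"})
print(system.handle_utterance("show my balance"))  # None: not yet in grammar
system.update_grammars("show my balance", "CMD_BALANCE")
print(system.handle_utterance("show my balance"))  # now resolves
```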
Abstract:
The method is performed at an electronic device with one or more processors and memory storing one or more programs for execution by the one or more processors. A first speech input including at least one word is received. A first phonetic representation of the at least one word is determined, the first phonetic representation comprising a first set of phonemes selected from a speech recognition phonetic alphabet. The first set of phonemes is mapped to a second set of phonemes to generate a second phonetic representation, where the second set of phonemes is selected from a speech synthesis phonetic alphabet. The second phonetic representation is stored in association with a text string corresponding to the at least one word.
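The mapping step can be sketched as a table lookup from a recognition phonetic alphabet to a synthesis phonetic alphabet. The ARPAbet-like and IPA-like symbols in the table are illustrative assumptions; the abstract does not name the two alphabets.

```python
# Hypothetical sketch: map a first phonetic representation (recognition
# alphabet) to a second one (synthesis alphabet) and store the result in
# association with the corresponding text string.

RECOGNITION_TO_SYNTHESIS = {
    "P": "p", "IY": "i", "AH": "ʌ", "L": "l",
}

def map_phonemes(recognition_phonemes):
    """Map a recognition-alphabet transcription to the synthesis alphabet."""
    return [RECOGNITION_TO_SYNTHESIS[p] for p in recognition_phonemes]

pronunciations = {}  # text string -> stored synthesis-alphabet representation

first = ["P", "IY"]              # first phonetic representation of "pea"
second = map_phonemes(first)     # second phonetic representation
pronunciations["pea"] = second   # stored in association with the text string
print(second)  # ['p', 'i']
```

Storing the synthesis-alphabet form means a later text-to-speech pass can pronounce the word exactly as the user said it, without re-deriving the pronunciation.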
Abstract:
A system and method includes a language processing module converting an electrical signal corresponding to an audible signal into a textual signal. The system further includes a command generation module converting the textual signal into a user receiving device control signal. A controller controls a function of a user receiving device in response to the user receiving device control signal.
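The three modules form a simple pipeline: audio to text, text to control signal, control signal to device action. The stand-in implementations below are illustrative assumptions; only the module boundaries come from the abstract.

```python
# Hypothetical sketch of the three-module pipeline: language processing,
# command generation, and a controller acting on the user receiving device.

def language_processing(audio_signal):
    # Stand-in for speech-to-text conversion of the electrical signal.
    return audio_signal["transcript"]

def command_generation(text):
    # Convert the textual signal into a device control signal.
    commands = {"volume up": "VOL_UP", "channel up": "CH_UP"}
    return commands.get(text.lower())

def controller(device_state, control_signal):
    # Control a function of the receiving device per the control signal.
    if control_signal == "VOL_UP":
        device_state["volume"] += 1
    elif control_signal == "CH_UP":
        device_state["channel"] += 1
    return device_state

state = {"volume": 5, "channel": 2}
signal = command_generation(language_processing({"transcript": "volume up"}))
print(controller(state, signal))  # {'volume': 6, 'channel': 2}
```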
Abstract:
An approach to improving the performance of a wordspotting system includes providing an interface for interactive improvement of a phonetic representation of a query based on an operator identifying true detections and false alarms in a data set.