摘要:
Techniques for error correction using a history list comprising at least one misrecognition and correction information associated with each of the at least one misrecognitions indicating how a user corrected the associated misrecognition. The techniques include converting data input from a user to generate a text segment, determining whether at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, if the at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, obtaining the correction information associated with the at least one misrecognition, and correcting the at least a portion of the text segment based, at least in part, on the correction information.
摘要:
Some embodiments relate to a method of performing a search for content on the Internet, in which a user may speak a search query and speech recognition may be performed on the spoken query to generate a text search query to be provided to a plurality of search engines. This enables a user to speak the search query rather than having to type it, and also allows the user to provide the search query only once, rather than having to provide it separately to multiple different search engines.
摘要:
In one aspect, a method for determining a validity of an identity asserted by a speaker using a voice print is provided. The method comprises acts of performing a first verification stage comprising comparing a first voice signal from the speaker uttering at least one first challenge utterance-with at least a portion of the voice print and performing a second verification stage if it is concluded in the first verification stage that the first voice signal was obtained from an utterance by the user. The second verification stage comprises adapting at least one parameter of the voice print based, at least in part, on the first voice signal to obtain an adapted voice print, and comparing a second voice signal from the speaker uttering at least one second challenge utterance with at least a portion of the adapted voice print.
摘要:
Some embodiments relate to a method of performing a search for content on the Internet, in which a user may speak a search query and speech recognition may be performed on the spoken query to generate a text search query to be provided to a plurality of search engines. This enables a user to speak the search query rather than having to type it, and also allows the user to provide the search query only once, rather than having to provide it separately to multiple different search engines.
摘要:
A modeless large vocabulary continuous speech recognition system is provided that represents an input utterance as a sequence of input vectors. The system includes a common library of acoustic model states for arrangement in sequences that form acoustic models. Each acoustic model is composed of a sequence of segment models and each segment model is composed of a sequence of model states. An input processor compares each vector in a sequence of input vectors to a set of model states in the common library to produce a match score for each model state in the set, reflecting the likelihood that a state is represented by a vector. The system also includes a plurality of recognition modules and associated recognition grammars. The recognition modules operate in parallel and use the match scores with the acoustic models to determine at least one recognition result in each of the recognition modules. The recognition modules includes a dictation module for producing at least one probable dictation recognition result, a select module for recognizing a portion of visually displayed text for processing with a command, and a command module for producing at least one probable command recognition result. An arbitrator uses an arbitration algorithm and a score ordered queue of recognition results, together with their associated recognition modules, to compare the recognition results of the recognition modules to select at least one system recognition result.
摘要:
In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.
摘要:
In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated using one or more sets of words and/or phrases, such as pairs of words/phrases that may include words/phrases that are acoustically similar to one another and/or that, when included in a result, would change a meaning of the result in a manner that would be significant for a domain. The recognition results may be evaluated using the set(s) of words/phrases to determine, when the top result includes a word/phrase from a set of words/phrases, whether any of the alternative recognition results includes any of the other, corresponding words/phrases from the set.
摘要:
A voicemail computer system transcribes a voicemail message into text that is presented to a calling party for approval. A calling party is able to approve, disapprove or edit a voicemail message prior to delivery to one or more called parties. The voicemail computer system may analyze a voicemail message to detect errors, omissions, or potentially offensive words. The voicemail computer may analyze a voicemail message to make suggestions as to tone, content or information contained within the voicemail message. The calling party can edit the voicemail message or approve it prior to providing a notification to one or more called parties that they have received the voicemail message.
摘要:
Systems, methods, and apparatus for using different interfaces to receive from different devices representations of at least one audio signal. In some embodiments, each representation may be generated using at least one microphone of the respective device during a meeting attended by a plurality of participants. In some further embodiments, a first representation may be received from a first device via a telephone network, while a second representation may be received from a second device via a data network. In yet some further embodiments, the first and second representations may be processed to obtain a processed representation of the at least one audio signal.
摘要:
Techniques are described for performing actions for users based at least in part on spoken information, such as spoken voice-based information received from the users during telephone calls. The described techniques include categorizing spoken information obtained from a user in one or more ways, and performing actions on behalf of the user related to the categorized information. For example, in some situations, spoken information obtained from a user is analyzed to identify one or more spoken information items (e.g., words, phrases, sentences, etc.) supplied by the user, and to generate corresponding textual representations (e.g., via automated speech-to-text techniques). One or more actions may then be taken regarding the identified information items, including to categorize the items by adding textual representations of the spoken information items to one or more of multiple predefined lists or other collections of information that are specific to or otherwise available to the user.