Abstract:
A computer system comprises an input configured to receive voice input from a user, the voice input having speech intervals separated by non-speech intervals; an ASR system configured to identify individual words in the voice input during speech intervals thereof, and store the identified words in memory; a response generation module configured to generate, based on the words stored in the memory, an audio response for outputting to the user; and a response delivery module configured to begin outputting the audio response to the user during a non-speech interval of the voice input, wherein the outputting of the audio response is terminated before it has completed in response to a subsequent speech interval of the voice input commencing whilst the audio response is still being outputted.
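A minimal sketch of the barge-in behaviour described above, assuming a voice-activity detector that posts `speech_start`/`speech_end` events onto a queue and a simple player object; all class and event names are illustrative, not taken from the abstract:

```python
# Sketch of barge-in response delivery: playback of a generated audio
# response starts during a non-speech interval and is cut off as soon
# as the user starts speaking again. The event queue and AudioPlayer
# interface are assumptions made for illustration.
import queue

class AudioPlayer:
    """Placeholder playback device; a real system would stream to a speaker."""
    def __init__(self):
        self.playing = False
    def start(self, audio_response):
        self.playing = True
        print("playing:", audio_response)
    def stop(self):
        if self.playing:
            self.playing = False
            print("playback interrupted")

def deliver_response(vad_events: "queue.Queue[str]", audio_response: str) -> None:
    player = AudioPlayer()
    while True:
        event = vad_events.get()
        if event == "speech_end" and not player.playing:
            # A non-speech interval has begun: start outputting the response.
            player.start(audio_response)
        elif event == "speech_start" and player.playing:
            # The user resumed speaking before playback completed: terminate it.
            player.stop()
            return
        elif event == "done":
            return

events = queue.Queue()
for e in ["speech_end", "speech_start"]:
    events.put(e)
deliver_response(events, "Here is your answer.")
```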
Abstract:
There is provided a first apparatus including a communication unit configured to transmit information permitting a second apparatus to modify stored voice recognition information based on a relationship between the first apparatus and the second apparatus.
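One possible reading of this arrangement, sketched below with invented names: the first apparatus shares data that lets the second apparatus update its stored voice recognition information only when the two devices stand in an approved relationship (here, a shared owner, which is purely an assumed example of such a relationship):

```python
# Hypothetical sketch: voice recognition information is shared between
# two apparatuses only if their relationship (same owner id in this
# example) permits it. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Apparatus:
    owner_id: str
    voice_recognition_info: dict = field(default_factory=dict)

    def related_to(self, other: "Apparatus") -> bool:
        # Example relationship check: both devices belong to the same user.
        return self.owner_id == other.owner_id

    def share_voice_info(self, other: "Apparatus") -> bool:
        """Transmit information permitting `other` to modify its stored data."""
        if not self.related_to(other):
            return False
        other.voice_recognition_info.update(self.voice_recognition_info)
        return True

phone = Apparatus("user-1", {"acoustic_profile": [0.2, 0.7]})
speaker = Apparatus("user-1")
assert phone.share_voice_info(speaker)
```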
Abstract:
Methods and systems for correcting transcribed text. One method includes receiving audio data from one or more audio data sources and transcribing the audio data based on a voice model to generate text data. The method also includes making the text data available to a plurality of users over at least one computer network and receiving corrected text data over the at least one computer network from the plurality of users. In addition, the method can include modifying the voice model based on the corrected text data.
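A rough sketch of that correction loop, with the transcription, user-correction, and model-adaptation steps represented as injected callables (all of them assumptions, since the abstract does not specify interfaces):

```python
# Sketch of the correction cycle: transcribe audio with a voice model,
# make the text available, collect corrected text from users, and fold
# the corrections back into the model.
from typing import Callable, List, Tuple

def correction_cycle(
    audio_items: List[bytes],
    transcribe: Callable[[bytes], str],                     # voice model -> text data
    collect_corrections: Callable[[str], str],              # users return edited text
    adapt_model: Callable[[List[Tuple[str, str]]], None],   # update the voice model
) -> None:
    pairs = []
    for audio in audio_items:
        hypothesis = transcribe(audio)                # text data made available to users
        corrected = collect_corrections(hypothesis)   # corrected text received back
        if corrected != hypothesis:
            pairs.append((hypothesis, corrected))
    if pairs:
        adapt_model(pairs)                            # modify the voice model
```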
Abstract:
The present invention relates to a device provided with a voice processing system for processing digital voice signals, a first memory system that delivers to the voice processing system the information needed to process the digital voice signals, and a second memory system in which the result of the digital voice signal processing can be stored. The present invention also relates to a device wherein the voice processing system is designed so that, for an input digital voice signal, if a corresponding digital voice signal is found in the first memory system, at least one control signal for operating at least one apparatus system is produced.
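An illustrative sketch of that lookup-and-control flow, using a simple distance match against reference signals; the distance metric, threshold, and data structures are assumptions, not details from the abstract:

```python
# Sketch: if the input digital voice signal matches one registered in the
# first memory system, a control signal for operating an apparatus system
# is produced; the processing result is stored in a second memory.
from typing import Dict, List, Optional

def euclidean(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def process_voice_signal(
    input_signal: List[float],
    first_memory: Dict[str, List[float]],   # command name -> reference signal
    second_memory: List[str],               # processing results are stored here
    threshold: float = 0.5,
) -> Optional[str]:
    name, reference = min(first_memory.items(),
                          key=lambda kv: euclidean(input_signal, kv[1]))
    if euclidean(input_signal, reference) <= threshold:
        control_signal = f"CTRL:{name}"     # control signal for an apparatus system
        second_memory.append(control_signal)
        return control_signal
    second_memory.append("no-match")
    return None
```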
Abstract:
A telephone-based interactive speech recognition system is retrained using variable weighting and incremental retraining. Variable weighting involves changing the relative influence of particular measurement data to be reflected in a statistical model. Statistical model data is determined based upon an initial set of measurement data determined from an initial set of speech utterances. When new statistical model data is to be generated to reflect new measurement data determined from new speech utterances, a weighting factor is applied to the new measurement data to generate weighted new measurement data. The new statistical model data is then determined based upon the initial set of measurement data and the weighted new measurement data. Incremental retraining involves generating new statistical model data using prior statistical model data to reduce the amount of prior measurement data that must be maintained and processed. When prior statistical model data needs to be updated to reflect characteristics and attributes of new speech utterances, statistical model data is generated for the new speech utterances. Then the prior statistical model data and the statistical model data for the new measurement data are processed to generate the new statistical model data.
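The weighted, incremental update can be illustrated on a single statistic. The sketch below retrains a mean from sufficient statistics (count and sum) rather than from the prior raw measurements, and scales new measurements by a weighting factor before combining them; the single-statistic Gaussian view is a deliberate simplification of the abstract's statistical model:

```python
# Sketch of variable weighting and incremental retraining for one statistic.
# Prior measurement data is summarised by (weighted count, weighted sum), so
# it need not be maintained and reprocessed; new measurements are scaled by
# a weighting factor before being folded in.
from typing import List, Tuple

def retrain_mean(
    prior_stats: Tuple[float, float],   # (weighted count, weighted sum) of prior data
    new_measurements: List[float],
    weight: float,                      # weighting factor applied to the new data
) -> Tuple[Tuple[float, float], float]:
    prior_count, prior_sum = prior_stats
    new_count = prior_count + weight * len(new_measurements)
    new_sum = prior_sum + weight * sum(new_measurements)
    return (new_count, new_sum), new_sum / new_count

stats = (100.0, 250.0)                  # statistics summarising prior utterances
stats, mean = retrain_mean(stats, [2.4, 2.6, 2.5], weight=2.0)  # emphasise new data
```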
Abstract:
A method for initiating an operation using voice is provided. The method includes extracting one or more voice features based on first audio data detected in a use stage; determining a similarity between the first audio data and a preset first voice model according to the one or more voice features, wherein the first voice model is associated with second audio data of a user, and the second audio data is associated with one or more preselected voice contents; and executing an operation corresponding to the first voice model based on the similarity.
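A small sketch of the use-stage decision, assuming the voice model and the extracted features are both represented as vectors and compared by cosine similarity; the metric, threshold, and function names are illustrative choices rather than the method's specifics:

```python
# Sketch of voice-triggered operation: score the features of the detected
# first audio data against a preset voice model (built from the user's
# second audio data) and execute the bound operation if similar enough.
from typing import Callable, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def maybe_execute(
    first_audio_features: List[float],
    voice_model: List[float],              # derived from preselected voice contents
    operation: Callable[[], None],         # operation corresponding to the voice model
    threshold: float = 0.8,
) -> bool:
    if cosine(first_audio_features, voice_model) >= threshold:
        operation()
        return True
    return False

maybe_execute([0.1, 0.9, 0.3], [0.12, 0.88, 0.28], lambda: print("unlocking"))
```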
Abstract:
A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
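The filtering step can be sketched as follows: drop any typed item whose feature vector lies within a similarity threshold of some spoken-corpus vector, and keep the rest as the unspeakable corpus that, together with the spoken corpus, supplies the two classes for the classifier. Cosine similarity and the toy vectors are assumptions made for illustration:

```python
# Sketch of building the "unspeakable" corpus by similarity-threshold
# filtering of the typed corpus against the spoken corpus.
from typing import List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def build_unspeakable(
    typed_vectors: List[List[float]],
    spoken_vectors: List[List[float]],
    threshold: float = 0.9,
) -> List[List[float]]:
    # Keep only typed items that are NOT within the threshold of any spoken vector.
    return [
        t for t in typed_vectors
        if all(cosine(t, s) < threshold for s in spoken_vectors)
    ]

spoken = [[0.9, 0.1], [0.8, 0.2]]
typed = [[0.88, 0.12], [0.1, 0.95]]
unspeakable = build_unspeakable(typed, spoken)   # -> [[0.1, 0.95]]
# `spoken` (positive) and `unspeakable` (negative) would then label the two
# classes used to train the discriminative data-selection classifier.
```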