摘要:
A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or “on-the-fly” based on the currently displayed text (e.g. the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
摘要:
A method and apparatus are provided for determining uncertainty in noise reduction based on a parametric model of speech distortion. The method is first used to reduce noise in a noisy signal. In particular, noise is reduced from a representation of a portion of a noisy signal to produce a representation of a cleaned signal by utilizing an acoustic environment model. The uncertainty associated with the noise reduction process is then computed. In one embodiment, the uncertainty of the noise reduction process is used, in conjunction with the noise-reduced signal, to decode a pattern state.
摘要:
A method and apparatus are provided for reducing noise in a training signal and/or test signal. The noise reduction technique uses a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors from these channel signals, a collection of noise correction and scaling vectors is determined. When a feature vector of a noisy pattern signal is later received, it is multiplied by the best scaling vector for that feature vector and the best correction vector is added to the product to produce a noise reduced feature vector. Under one embodiment, the best scaling and correction vectors are identified by choosing an optimal mixture component for the noisy feature vector. The optimal mixture component being selected based on a distribution of noisy channel feature vectors associated with each mixture component.
摘要:
A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.
摘要:
A method and apparatus are provided for reducing noise in a training signal and/or test signal. The noise reduction technique uses a stereo signal formed of two channel signals, each channel containing the same pattern signal. One of the channel signals is “clean” and the other includes additive noise. Using feature vectors from these channel signals, a collection of noise correction and scaling vectors is determined. When a feature vector of a noisy pattern signal is later received, it is multiplied by the best scaling vector for that feature vector and the best correction vector is added to the product to produce a noise reduced feature vector. Under one embodiment, the best scaling and correction vectors are identified by choosing an optimal mixture component for the noisy feature vector. The optimal mixture component being selected based on a distribution of noisy channel feature vectors associated with each mixture component.
摘要:
A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or “on-the-fly” based on the currently displayed text (e.g. the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
摘要:
Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal, as well as the associated BLEU score in the translation, as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using minimum classification error (MCE) criterion. The measurable BLEU scores are used to facilitate the implementation of the MCE training procedure in a step that defines the class-specific discriminant function.
摘要:
Architecture that provides the integration of automatic speech recognition (ASR) and machine translation (MT) components of a full speech translation system. The architecture is an integrative and discriminative approach that employs an end-to-end objective function (the conditional probability of the translated sentence (target) given the source language's acoustic signal, as well as the associated BLEU score in the translation, as a goal in the integrated system. This goal defines the theoretically correct variables to determine the speech translation system output using a Bayesian decision rule. These theoretically correct variables are modified in practical use due to known imperfections of the various models used in building the full speech translation system. The disclosed approach also employs automatic training of these variables using minimum classification error (MCE) criterion. The measurable BLEU scores are used to facilitate the implementation of the MCE training procedure in a step that defines the class-specific discriminant function.
摘要:
A method of tracking formants defines a formant search space comprising sets of formants to be searched. Formants are identified for a first frame in the speech utterance by searching the entirety of the formant search space using the codebook, and for the remaining frames by searching the same space using both the codebook and the continuity constraint across adjacent frames. Under one embodiment, the formants are identified by mapping sets of formants into feature vectors and applying the feature vectors to a model. Formants are also identified by applying dynamic programming to search for the best sequence that optimally satisfies the continuity constraint required by the model.
摘要:
A system and method are provided that accurately estimate noise and that reduce noise in pattern recognition signals. The method and system define a mapping random variable as a function of at least a clean signal random variable and a noise random variable. A model parameter that describes at least one aspect of a distribution of values for the mapping random variable is then determined. Based on the model parameter, an estimate for the clean signal random variable is determined. Under many aspects of the present invention, the mapping random variable is a signal-to-noise ratio variable and the method and system estimate a value for the signal-to-noise ratio variable from the model parameter.