Abstract:
A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.
Abstract:
In accordance with one embodiment of the present invention, unanticipated semantic intents are discovered in audio data in an unsupervised manner. For instance, the audio acoustics are clustered based on semantic intent and representative acoustics are chosen for each cluster. A human reviewer then need only listen to a small number of representative acoustics for each cluster (and possibly only one per cluster) in order to identify the unforeseen semantic intents.
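The abstract does not say how the representative acoustics are chosen. As a minimal sketch, assuming each utterance has already been reduced to a numeric feature vector and assigned a cluster label, one plausible choice is the medoid: the member with the smallest total squared distance to the other members of its cluster. The function name `pick_representatives` and the feature representation are hypothetical.

```python
def pick_representatives(features, labels):
    """Return {cluster label: index of that cluster's medoid}.

    Assumption: `features` is a list of equal-length float lists and
    `labels` gives one cluster label per feature vector.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Collect the member indices of every cluster.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    # The medoid minimizes the summed squared distance to its cluster mates.
    representatives = {}
    for label, members in clusters.items():
        representatives[label] = min(
            members,
            key=lambda i: sum(sq_dist(features[i], features[j]) for j in members),
        )
    return representatives
```

A reviewer would then listen only to the utterances whose indices appear in the returned mapping.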
Abstract:
A method of producing at least one possible sequence of vocal tract resonance (VTR) for a fixed sequence of phonetic units, and producing the acoustic observation probability by integrating over such distributions is provided. The method includes identifying a sequence of target distributions for a VTR sequence corresponding to a phone sequence with a given segmentation. The sequence of target distributions is applied to a finite impulse response filter to produce distributions for possible VTR trajectories. Then these distributions are applied to a linearized nonlinear function to produce the acoustic observation probability for the given sequence of phonetic units. This acoustic observation probability is used for phonetic recognition.
Abstract:
The present invention employs user modeling to model a user's behavior patterns. The user's behavior patterns are then used to influence named entity (NE) recognition.
Abstract:
A structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonance (VTR) are generated using prior information of resonance targets in the phone sequence. Bi-directional temporal filtering with finite impulse response (FIR) is applied to the segmental target sequence as the FIR filter's input. At the second stage, the dynamics of speech cepstra are predicted analytically based on the FIR-filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics, where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the acoustic observation probability given a phone sequence. Using this probability, different phone sequences can be compared and ranked in terms of their respective probability values. This then permits the use of the model for phonetic recognition.
Abstract:
A method of aiding a speech recognition program developer by grouping calls passing through an identified question-answer (QA) state or transition into clusters based on causes of problems associated with the calls is provided. The method includes determining a number of clusters into which a plurality of calls will be grouped. Then, the plurality of calls is at least partially randomly assigned to the different clusters. Model parameters are estimated using clustering information based upon the assignment of the plurality of calls to the different clusters. Individual probabilities are calculated for each of the plurality of calls using the estimated model parameters. The individual probabilities are indicative of a likelihood that the corresponding call belongs to a particular cluster. The plurality of calls is then re-assigned to the different clusters based upon the calculated probabilities. These steps are then repeated until the grouping of the plurality of calls achieves a desired stability.
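The loop described above (random assignment, parameter estimation, per-call probabilities, re-assignment, repeat until stable) can be sketched as follows. This is an illustration under stated assumptions, not the method claimed: each call is assumed to be a numeric feature vector, each cluster is modeled as a unit-variance Gaussian with equal weight, and the name `cluster_calls` and its parameters are hypothetical.

```python
import math
import random


def cluster_calls(calls, n_clusters, max_iters=50, seed=0):
    """Group calls (lists of floats) into n_clusters; return one label per call."""
    rng = random.Random(seed)
    dim = len(calls[0])
    # Random initial assignment of calls to clusters.
    labels = [rng.randrange(n_clusters) for _ in calls]
    for _ in range(max_iters):
        # Estimate model parameters (assumed here: one mean per cluster).
        means = []
        for c in range(n_clusters):
            members = [x for x, lab in zip(calls, labels) if lab == c]
            if not members:  # re-seed an empty cluster from a random call
                members = [rng.choice(calls)]
            means.append(
                [sum(x[d] for x in members) / len(members) for d in range(dim)]
            )
        new_labels = []
        for x in calls:
            # Probability that this call belongs to each cluster.
            log_scores = [
                -0.5 * sum((x[d] - m[d]) ** 2 for d in range(dim)) for m in means
            ]
            top = max(log_scores)
            weights = [math.exp(s - top) for s in log_scores]
            total = sum(weights)
            probs = [w / total for w in weights]
            # Re-assign the call to its most probable cluster.
            new_labels.append(max(range(n_clusters), key=probs.__getitem__))
        # Stop once the grouping no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

Stability is tested here as "no label changed in a full pass"; a looser criterion, such as a small fraction of calls changing cluster, would fit the abstract's wording equally well.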
Abstract:
The method and apparatus utilize a filter to remove a variety of non-dictated words from data based on probability and improve the effectiveness of creating a language model.
Abstract:
A method of identifying a sequence of formant trajectory values is provided in which a sequence of target values is identified for a formant as step functions. The target values and the duration for each segment target for the formant are applied to a finite impulse response filter to form a sequence of formant trajectory values. The parameters of this filter, as well as the duration of the targets for each phone, can be modified to produce many kinds of target undershooting effects in a contextually assimilated manner. The procedure for producing the formant trajectory values does not require any acoustic data from speech.
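To make the step-function-plus-FIR idea concrete, the sketch below expands per-segment targets into a frame-level step sequence and smooths it with a symmetric, normalized kernel. The kernel shape (weights proportional to `gamma ** abs(k)`), the parameter names `gamma` and `half_width`, and the edge handling are assumptions chosen for illustration; the abstract does not specify them.

```python
def step_targets(targets, durations):
    """Expand per-segment targets into a frame-level step function."""
    frames = []
    for value, n_frames in zip(targets, durations):
        frames.extend([value] * n_frames)
    return frames


def fir_kernel(gamma, half_width):
    """Symmetric kernel, weights proportional to gamma**|k|, summing to 1."""
    weights = [gamma ** abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def formant_trajectory(targets, durations, gamma=0.6, half_width=3):
    """Filter the step sequence to obtain a smooth formant trajectory (Hz)."""
    steps = step_targets(targets, durations)
    kernel = fir_kernel(gamma, half_width)
    last = len(steps) - 1
    trajectory = []
    for t in range(len(steps)):
        value = 0.0
        for k, w in zip(range(-half_width, half_width + 1), kernel):
            idx = min(max(t + k, 0), last)  # hold the edge value at the ends
            value += w * steps[idx]
        trajectory.append(value)
    return trajectory
```

With targets `[500.0, 1500.0, 500.0]` and durations `[10, 2, 10]`, the short middle segment never reaches 1500 Hz, which is the undershoot effect the abstract refers to. Lengthening that segment or narrowing the kernel lets the trajectory reach its target. No acoustic data enters the computation.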
Abstract:
Parameters for distributions of a hidden trajectory model, including means and variances, are estimated using an acoustic likelihood function for observation vectors as the objective function for optimization. The estimation uses only acoustic data and no intermediate estimate of the hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
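As a toy illustration of gradient ascent on a likelihood over means and variances, the sketch below fits a single scalar Gaussian to observations. This stands in for the hidden trajectory model's likelihood, which is considerably more involved; the function names, the log-variance parameterization, and the step size are all assumptions.

```python
import math


def log_likelihood(obs, mean, log_var):
    """Gaussian log-likelihood of scalar observations."""
    var = math.exp(log_var)
    return sum(
        -0.5 * (math.log(2.0 * math.pi) + log_var + (o - mean) ** 2 / var)
        for o in obs
    )


def fit_gaussian(obs, mean=0.0, log_var=0.0, step=0.05, n_steps=2000):
    """Gradient ascent on the log-likelihood; returns (mean, variance)."""
    n = len(obs)
    for _ in range(n_steps):
        var = math.exp(log_var)
        # Partial derivatives of the log-likelihood.
        grad_mean = sum(o - mean for o in obs) / var
        grad_log_var = 0.5 * sum((o - mean) ** 2 / var - 1.0 for o in obs)
        # Ascend; dividing by n keeps the step size independent of data size.
        mean += step * grad_mean / n
        log_var += step * grad_log_var / n
    return mean, math.exp(log_var)
```

Optimizing the log of the variance keeps the variance positive without an explicit constraint. Only the observations are used; no intermediate estimate of a hidden variable appears anywhere in the update.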
Abstract:
An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.