摘要:
A method and apparatus map a set of vocal tract resonant frequencies, together with their corresponding bandwidths, into a simulated acoustic feature vector in the form of LPC cepstrum by calculating a separate function for each individual vocal tract resonant frequency/bandwidth and summing the result to form an element of the simulated feature vector. The simulated feature vector is applied to a model along with an input feature vector to determine a probability that the set of vocal tract resonant frequencies is present in a speech signal. Under one embodiment, the model includes a target-guided transition model that provides a probability of a vocal tract resonant frequency based on a past vocal tract resonant frequency and a target for the vocal tract resonant frequency. Under another embodiment, the phone segmentation is provided by an HMM system and is used to precisely determine which target value to use at each frame.
摘要:
A method and apparatus map a set of vocal tract resonant frequencies, together with their corresponding bandwidths, into a simulated acoustic feature vector in the form of LPC cepstrum by calculating a separate function for each individual vocal tract resonant frequency/bandwidth and summing the result to form an element of the simulated feature vector. The simulated feature vector is applied to a model along with an input feature vector to determine a probability that the set of vocal tract resonant frequencies is present in a speech signal. Under one embodiment, the model includes a target-guided transition model that provides a probability of a vocal tract resonant frequency based on a past vocal tract resonant frequency and a target for the vocal tract resonant frequency. Under another embodiment, the phone segmentation is provided by an HMM system and is used to precisely determine which target value to use at each frame.
摘要:
A method of tracking formants defines a formant search space comprising sets of formants to be searched. Formants are identified for a first frame in the speech utterance by searching the entirety of the formant search space using the codebook, and for the remaining frames by searching the same space using both the codebook and the continuity constraint across adjacent frames. Under one embodiment, the formants are identified by mapping sets of formants into feature vectors and applying the feature vectors to a model. Formants are also identified by applying dynamic programming to search for the best sequence that optimally satisfies the continuity constraint required by the model.
摘要:
A method and apparatus identify values for components of a vocal tract resonance vector by sequentially determining values for each component of the vocal tract resonance vector. To determine a value for a component, the other components are set to static values. A plurality of values for a function are then determined using a plurality of values for the component that is being determined while using the static values for all of the other components. One of the plurality of values for the component is then selected based on the plurality of values for the function.