摘要:
A method and apparatus identify values for components of a vocal tract resonance vector by sequentially determining values for each component of the vocal tract resonance vector. To determine a value for a component, the other components are set to static values. A plurality of values for a function are then determined using a plurality of values for the component that is being determined while using the static values for all of the other components. One of the plurality of values for the component is then selected based on the plurality of values for the function.
摘要:
A novel beamforming post-processor technique with enhanced noise suppression capability. The present beam forming post-processor technique is a non-linear post-processing technique for sensor arrays (e.g., microphone arrays) which improves the directivity and signal separation capabilities. The technique works in so-called instantaneous direction of arrival space, estimates the probability for sound coming from a given incident angle or look-up direction and applies a time-varying, gain based, spatio-temporal filter for suppressing sounds coming from directions other than the sound source direction resulting in minimal artifacts and musical noise.
摘要:
A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.
摘要:
A method and apparatus are provided for storing parameters of a deleted interpolation language model as parameters of a backoff language model. In particular, the parameters of the deleted interpolation language model are stored in the standard ARPA format. Under one embodiment, the deleted interpolation language model parameters are formed using fractional counts.
摘要:
Audio/video programming content is made available to a receiver from a content provider, and meta data is made available to the receiver from a meta data provider. The meta data corresponds to the programming content, and identifies, for each of multiple portions of the programming content, an indicator of a likelihood that the portion is an exciting portion of the content. In one implementation, the meta data includes probabilities that segments of a baseball program are exciting, and is generated by analyzing the audio data of the baseball program for both excited speech and baseball hits. The meta data can then be used to generate a summary for the baseball program.
摘要:
A noise suppressor for altering a speech signal is trained based on a speech recognition system. An objective function can be utilized to adjust parameters of the noise suppressor. The noise suppressor can be used to alter speech signals for the speech recognition system.
摘要:
A method and computer-readable medium are provided for identifying clean signal feature vectors from noisy signal feature vectors. Aspects of the invention use mixtures of distributions of noise feature vectors and/or channel distortion feature vectors when identifying the clean signal feature vectors.
摘要:
A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.
摘要:
A method and apparatus estimate additive noise in a noisy signal using incremental Bayes learning, where a time-varying noise prior distribution is assumed and hyperparameters (mean and variance) are updated recursively using an approximation for posterior computed at the preceding time step. The additive noise in time domain is represented in the log-spectrum or cepstrum domain before applying incremental Bayes learning. The results of both the mean and variance estimates for the noise for each of separate frames are used to perform speech feature enhancement in the same log-spectrum or cepstrum domain.
摘要:
A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.