Abstract:
A method of recognizing a speaker of an utterance (602) in a speech recognition system, comprising - comparing the utterance (602) to a plurality of speaker models (604) for different speakers; - determining a likelihood score (606) for each speaker model, the likelihood score (606) indicating how well the speaker model corresponds to the utterance; and - for each speaker model (604), determining a probability (609) that the utterance (602) originates from the speaker corresponding to the speaker model (604), wherein the determination of the probability (609) for a speaker model (604) is based on the likelihood scores (606) for the speaker models and takes prior knowledge (607) about the speaker model into account.
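One way to picture how likelihood scores and prior knowledge could be combined into a per-speaker probability is Bayes' rule. The abstract does not name the combination rule, so the sketch below is an assumption, as are the function name `speaker_posteriors`, the speaker labels, and all numeric values.

```python
import math

def speaker_posteriors(log_likelihoods, priors):
    """Combine per-model log-likelihood scores with priors via Bayes' rule.

    Assumption: scores are log-likelihoods and every prior is > 0.
    """
    # Log of (likelihood * prior) for each speaker model.
    log_joint = {s: ll + math.log(priors[s]) for s, ll in log_likelihoods.items()}
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_joint.values())
    unnormalized = {s: math.exp(v - peak) for s, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {s: u / total for s, u in unnormalized.items()}

# Hypothetical scores and priors for three speaker models.
scores = {"alice": -120.0, "bob": -123.0, "unknown": -125.0}
priors = {"alice": 0.5, "bob": 0.3, "unknown": 0.2}
posteriors = speaker_posteriors(scores, priors)
best_speaker = max(posteriors, key=posteriors.get)
```

The resulting probabilities sum to one, so they can be compared across speaker models or checked against a confidence threshold.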
Abstract:
The invention is directed to a method for automatic speaker recognition based on a received speech input, wherein a speaker model set comprising at least a speaker-independent speaker model is provided, comprising the steps of detecting whether the received speech input matches a speaker model of the speaker model set according to a predetermined criterion; and, if no match is detected, creating a speaker model for the speaker model set based on the received speech input.
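The detect-or-create logic can be sketched with toy one-dimensional models. Everything here is illustrative: the abstract does not specify the criterion, so the sketch assumes a match requires that the best-scoring model be speaker-specific and score above a fixed threshold. The names `score` and `recognize_or_enroll` are hypothetical.

```python
def score(model_mean, frames):
    """Toy score: negative mean squared distance of the frames to the model mean."""
    return -sum((x - model_mean) ** 2 for x in frames) / len(frames)

def recognize_or_enroll(models, frames, threshold=-1.0):
    """Return (speaker_name, created_flag); enroll a new model when nothing matches."""
    best_name = max(models, key=lambda name: score(models[name], frames))
    matched = (best_name != "speaker_independent"
               and score(models[best_name], frames) >= threshold)
    if matched:
        return best_name, False
    # No match: create a new speaker model from the received input.
    new_name = f"speaker_{len(models)}"
    models[new_name] = sum(frames) / len(frames)
    return new_name, True

# The model set starts with only the speaker-independent model.
models = {"speaker_independent": 0.0}
first = recognize_or_enroll(models, [5.0, 5.2, 4.8])   # enrolls a new speaker
second = recognize_or_enroll(models, [5.1, 4.9, 5.0])  # matches that speaker
```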
Abstract:
The invention is directed to a method for determining barge-in in a speech dialog system comprising determining whether a speech prompt is output by the speech dialog system, detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold and/or based on speaker information, wherein the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined.
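The time-varying threshold can be illustrated with a simple per-frame update. This is a minimal sketch under assumptions the abstract does not state: frame energies normalized to [0, 1], a fixed step size, and clamping bounds. The function names are hypothetical.

```python
def update_threshold(threshold, prompt_active, step=0.1, low=0.2, high=0.9):
    """Raise the threshold while a prompt is playing, lower it otherwise."""
    if prompt_active:
        return min(high, threshold + step)
    return max(low, threshold - step)

def detect_speech_activity(energies, prompt_flags, threshold=0.5):
    """Per-frame speech activity decisions against the adaptive threshold."""
    decisions = []
    for energy, prompt_active in zip(energies, prompt_flags):
        threshold = update_threshold(threshold, prompt_active)
        decisions.append(energy > threshold)
    return decisions

# The same input energy is rejected during the prompt and accepted after it.
energies = [0.55, 0.55, 0.55, 0.55]
prompt_flags = [True, True, False, False]
decisions = detect_speech_activity(energies, prompt_flags)
```

Raising the threshold during prompt output makes the detector less likely to trigger on echo of the system's own prompt, at the cost of requiring louder user speech to barge in.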
Abstract:
This invention provides a method for determining, in a speech dialogue system issuing speech prompts, a score value as an indicator for the presence of a wanted signal component in an input signal stemming from a microphone, comprising the steps of: using a first likelihood function to determine a first likelihood value for the presence of the wanted signal component in the input signal, using a second likelihood function to determine a second likelihood value for the presence of a noise signal component in the input signal, and determining a score value based on the first and the second likelihood values, wherein the first likelihood function is based on a predetermined reference wanted signal, and the second likelihood function is based on a predetermined reference noise signal.
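A common way to turn two likelihood values into a single score is a log-likelihood ratio. The abstract does not say how the two values are combined, so the ratio, the Gaussian form of both likelihood functions, and the reference parameters below are all assumptions for illustration.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def wanted_signal_score(x, wanted_ref=(1.0, 0.25), noise_ref=(0.0, 0.25)):
    """Log-likelihood ratio: positive values favor the wanted signal.

    wanted_ref and noise_ref are hypothetical (mean, variance) pairs standing
    in for models derived from the reference wanted and noise signals.
    """
    ll_wanted = gaussian_log_likelihood(x, *wanted_ref)
    ll_noise = gaussian_log_likelihood(x, *noise_ref)
    return ll_wanted - ll_noise

score_speech_like = wanted_signal_score(0.9)
score_noise_like = wanted_signal_score(0.1)
```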
Abstract:
The present invention relates to a method for enhancing the quality of a digital speech signal containing noise, comprising identifying the speaker whose utterance corresponds to the digital speech signal, determining a signal-to-noise ratio of the digital speech signal and synthesizing at least one part of the digital speech signal for which the determined signal-to-noise ratio is below a predetermined level based on the identification of the speaker.
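The selection step, deciding which parts of the signal fall below the predetermined signal-to-noise level, might look like the sketch below. It covers only that selection; the speaker-dependent synthesis is not shown. The per-frame power representation, the known noise power, and the 3 dB level are assumptions, and the function names are hypothetical.

```python
import math

def frame_snr_db(speech_power, noise_power):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(speech_power / noise_power)

def frames_to_synthesize(frame_powers, noise_power, min_snr_db=3.0):
    """Indices of frames whose estimated SNR is below min_snr_db."""
    flagged = []
    for index, power in enumerate(frame_powers):
        # Estimate speech power by subtracting the noise power; floor it
        # at a small positive value so the logarithm stays defined.
        speech_power = max(power - noise_power, 1e-12)
        if frame_snr_db(speech_power, noise_power) < min_snr_db:
            flagged.append(index)
    return flagged

flagged = frames_to_synthesize([10.0, 1.5, 4.0, 1.1], noise_power=1.0)
```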
Abstract:
The present invention relates to a method for enhancing the quality of a microphone signal, comprising providing at least one stochastic speaker model for a foreground speaker, providing at least one stochastic model for perturbations; and determining signal portions of the microphone signal that include speech of the foreground speaker based on the stochastic speaker model and the stochastic model for perturbations.
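As a rough illustration of deciding frame by frame between a speaker model and a perturbation model, the sketch below uses one diagonal-covariance Gaussian per model. The abstract only says the models are stochastic, so the single-Gaussian form, the two-dimensional features, and the maximum-likelihood decision rule are assumptions.

```python
import math

def diag_gaussian_log_pdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def foreground_mask(frames, speaker_model, perturbation_model):
    """True for frames that the speaker model explains better."""
    return [diag_gaussian_log_pdf(f, *speaker_model)
            > diag_gaussian_log_pdf(f, *perturbation_model)
            for f in frames]

# Hypothetical (mean, variance) pairs for the two models.
speaker_model = ((2.0, 1.0), (0.5, 0.5))
perturbation_model = ((0.0, 0.0), (0.5, 0.5))
frames = [(1.9, 1.1), (0.1, -0.2), (2.2, 0.8)]
mask = foreground_mask(frames, speaker_model, perturbation_model)
```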
Abstract:
The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.
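The effect of frequency weighting on codebook matching can be shown with a weighted distance. This sketch reads the abstract in one particular way, which is an assumption: each feature dimension corresponds to a frequency band, and bands below the cutoff get a larger weight. Codebook entries are reduced to their mean vectors, and the cutoff, weights, and function names are hypothetical.

```python
def band_weights(band_centers_hz, cutoff_hz=1000.0, below=1.0, above=0.25):
    """Higher weight for bands below the cutoff than for bands above it."""
    return [below if f < cutoff_hz else above for f in band_centers_hz]

def match_codebook(feature, codebook, weights):
    """Index of the codebook entry with the smallest weighted squared distance."""
    def weighted_distance(entry):
        return sum(w * (x - m) ** 2 for w, x, m in zip(weights, feature, entry))
    return min(range(len(codebook)), key=lambda i: weighted_distance(codebook[i]))

band_centers_hz = [250.0, 500.0, 2000.0, 4000.0]
codebook = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
feature = [0.9, 0.9, 1.0, 1.0]

weighted_match = match_codebook(feature, codebook, band_weights(band_centers_hz))
unweighted_match = match_codebook(feature, codebook, [1.0] * 4)
```

With these values the low-frequency dimensions dominate the weighted match, so the feature vector is assigned to the first entry, whereas an unweighted distance would pick the second.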