Abstract:
Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrates little variance over time or the samples are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
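The comparison logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosed implementation: each speech sample is assumed to be already reduced to a fixed-length feature array, and the function names, the mean-absolute-difference distance, and the two thresholds `low` and `high` are hypothetical choices.

```python
import numpy as np

def pairwise_distances(samples):
    """Mean absolute difference between every pair of equal-length feature arrays."""
    dists = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            dists.append(float(np.mean(np.abs(samples[i] - samples[j]))))
    return dists

def verify_by_variance(samples, low=0.05, high=0.2):
    """Return 'deny', 'verify', or 'inconclusive'.

    Assumption: near-identical samples (distance below `low`) suggest synthetic
    or replayed speech; distances at or above `high` count as natural variation.
    """
    if len(samples) < 2:
        return "inconclusive"  # nothing to compare yet; more samples are needed
    smallest = min(pairwise_distances(samples))
    if smallest < low:
        return "deny"
    if smallest >= high:
        return "verify"
    return "inconclusive"  # caller may request additional speech samples
```

Raising `low` and `high` would correspond to demanding more variation when greater authentication certainty is needed.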
Abstract:
Characterizing an acoustic signal includes extracting a vector from the acoustic signal, where the vector contains information about the nuisance characteristics present in the acoustic signal, and computing a set of likelihoods of the vector for a plurality of classes that model a plurality of nuisance characteristics. Training a system to characterize an acoustic signal includes obtaining training data, the training data comprising a plurality of acoustic signals, where each of the plurality of acoustic signals is associated with one of a plurality of classes that indicates a presence of a specific type of nuisance characteristic, transforming each of the plurality of acoustic signals into a vector that summarizes information about the acoustic characteristics of the signal, to produce a plurality of vectors, and labeling each of the plurality of vectors with one of the plurality of classes.
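As a rough illustration of the training and scoring steps, the sketch below fits one model per nuisance class from labeled vectors and then computes a likelihood of a new vector under each class. The diagonal Gaussian class model, the function names, and the class labels in the usage note are assumptions made for the example; the abstract does not specify the model family.

```python
import numpy as np

def train_class_models(vectors, labels):
    """Fit a diagonal Gaussian (mean, variance) per nuisance class (assumed model)."""
    models = {}
    for label in set(labels):
        rows = np.array([v for v, l in zip(vectors, labels) if l == label])
        # A small floor keeps the variance strictly positive.
        models[label] = (rows.mean(axis=0), rows.var(axis=0) + 1e-6)
    return models

def class_log_likelihoods(vector, models):
    """Log-likelihood of one vector under each class model."""
    scores = {}
    for label, (mean, var) in models.items():
        scores[label] = float(
            -0.5 * np.sum(np.log(2 * np.pi * var) + (vector - mean) ** 2 / var)
        )
    return scores
```

With hypothetical labels such as `"clean"` and `"babble"`, `class_log_likelihoods(v, models)` returns one score per label, and the set of scores characterizes which nuisance conditions are likely present in the signal.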
Abstract:
A method of processing a speech signal comprises converting the speech signal to a digital speech signal, converting the digital speech signal into short-time frames, applying a Fast Fourier Transform to each of the short-time frames to obtain an original spectrum, deriving a varied spectrum based on the original spectrum, applying a discrete cosine transform to compute original cepstrum coefficients for the original spectrum and varied cepstrum coefficients for the varied spectrum, and generating a set of frontend feature vectors for each of the short-time frames.
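The pipeline above might be sketched as follows. Several details are assumptions, since the abstract leaves them open: the Hamming window, the frame and hop sizes, taking the logarithm of the magnitude spectrum before the DCT (conventional for cepstra), and in particular the way the varied spectrum is derived, which here is a hypothetical 5-bin moving average of the original spectrum.

```python
import numpy as np

def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D digital signal into overlapping short-time frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def dct2(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def frontend_features(signal, frame_len=256, hop=128, n_coeffs=13):
    """Return one feature vector per frame: original plus varied cepstra."""
    frames = frame_signal(signal, frame_len, hop) * np.hamming(frame_len)
    original = np.abs(np.fft.rfft(frames, axis=1))  # original spectrum
    kernel = np.ones(5) / 5  # hypothetical variation: 5-bin smoothing
    varied = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, original
    )
    eps = 1e-10  # avoids log(0)
    c_orig = dct2(np.log(original + eps), n_coeffs)
    c_var = dct2(np.log(varied + eps), n_coeffs)
    return np.hstack([c_orig, c_var])
```

For a 1024-sample signal with these defaults, the result has 7 rows (frames) and 26 columns (13 original and 13 varied cepstrum coefficients).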
Abstract:
In one embodiment, one or more users may be participating in a conversation. In one example, a first user may be speaking into a speaker end device and a second user may be listening at a listener end device. The second user may be in an environment where noise may be present. Particular embodiments determine characteristics of the noise at the listener end device. Characteristics of a voice signature for a user speaking with the speaker end device are also determined. Comprehension enhancement of voice signals received from the speaker end device is then performed based on characteristics of the noise at the listener end device and characteristics of the voice signature. For example, the signature of the voice signals may be altered to lessen the overlap with the noise.
Abstract:
A speech processing apparatus includes a plurality of microphones which receive speech produced by a first sound source to obtain first speech signals for a plurality of channels having one-to-one correspondence with the plurality of microphones, a calculation unit configured to calculate a first characteristic amount indicative of an inter-channel correlation of the first speech signals, a storage unit configured to store in advance a second characteristic amount indicative of an inter-channel correlation of second speech signals for the plurality of channels obtained by receiving speech produced by a second sound source by the plurality of microphones, and a collation unit configured to collate the first characteristic amount with the second characteristic amount to determine whether the first sound source matches with the second sound source.
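One way to picture the calculation and collation units is the sketch below. It is an assumption-laden illustration, not the claimed apparatus: the characteristic amount is taken to be the normalized cross-correlation between each pair of channels over a small range of lags, and collation is a cosine similarity against the stored amount with a hypothetical threshold.

```python
import numpy as np

def interchannel_correlation(signals, max_lag=16):
    """Characteristic amount for signals of shape (channels, samples).

    Assumed form: normalized cross-correlation of every channel pair
    for lags from -max_lag to +max_lag, concatenated into one vector.
    """
    signals = signals - signals.mean(axis=1, keepdims=True)
    n_ch, n = signals.shape
    feats = []
    for a in range(n_ch):
        for b in range(a + 1, n_ch):
            norm = np.linalg.norm(signals[a]) * np.linalg.norm(signals[b]) + 1e-12
            full = np.correlate(signals[a], signals[b], mode="full")
            mid = n - 1  # index of zero lag in the full correlation
            feats.append(full[mid - max_lag : mid + max_lag + 1] / norm)
    return np.concatenate(feats)

def same_source(first_amount, stored_amount, threshold=0.8):
    """Collate two characteristic amounts by cosine similarity (assumed metric)."""
    denom = np.linalg.norm(first_amount) * np.linalg.norm(stored_amount) + 1e-12
    return float(first_amount @ stored_amount) / denom >= threshold
```

Because the inter-channel delays depend on where the source sits relative to the microphones, two recordings from the same position tend to produce correlation peaks at the same lags, which is what the similarity check relies on in this sketch.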
Abstract:
A speaker verification method is proposed that first builds a general model of user utterances using a set of general training speech data. The user also trains the system by providing a training utterance, such as a passphrase or other spoken utterance. Then in a test phase, the user provides a test utterance which includes some background noise as well as a test voice sample. The background noise is used to bring the condition of the training data closer to that of the test voice sample by modifying the training data and a reduced set of the general data, before creating adapted training and general models. Match scores are generated based on the comparison between the adapted models and the test voice sample, with a final match score calculated based on the difference between the match scores. This final match score gives a measure of the degree of matching between the test voice sample and the training utterance and is based on the degree of matching between the speech characteristics from extracted feature vectors that make up the respective speech signals, and is not a direct comparison of the raw signals themselves. Thus, the method can be used to verify a speaker without necessarily requiring the speaker to provide an identical test phrase to the phrase provided in the training sample.
Abstract:
Example embodiments provide a speaker authentication technology that compensates for mismatches between enrollment process conditions and test process conditions using correction parameters or correction models, which allow for correcting one of the test voice characterizing parameter set and the enrollment voice characterizing parameter set according to a mismatch between the test process conditions and the enrollment process conditions, thereby obtaining values for the test voice characterizing parameter set and the enrollment voice characterizing parameter set that are based on the same or at least similar process conditions. Alternatively, each of the enrollment and test voice characterizing parameter sets may be normalized to predetermined standard process conditions by using the correction parameters or correction models. This abstract is provided to comply with rules requiring an abstract, and it is submitted with the intention that it will not be used to interpret or limit the scope or meaning of the claims.
Abstract:
A system and method for identifying an individual includes collecting biometric information for an individual attempting to gain access to a system. The biometric information for the individual is scored against pre-trained imposter models. If a score is greater than a threshold, the individual is identified as an imposter. Other systems and methods are also disclosed.
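A minimal sketch of the scoring step follows. It assumes, purely for illustration, that the biometric information and each pre-trained imposter model are fixed-length vectors and that the score is a cosine similarity; the function names and the threshold value are hypothetical.

```python
import numpy as np

def max_imposter_score(sample, imposter_models):
    """Highest cosine similarity between the sample and any imposter model."""
    unit = sample / np.linalg.norm(sample)
    return max(float(unit @ (m / np.linalg.norm(m))) for m in imposter_models)

def is_imposter(sample, imposter_models, threshold=0.9):
    """Flag the individual as an imposter if any score exceeds the threshold."""
    return max_imposter_score(sample, imposter_models) > threshold
```

Note the direction of the test: a high score here means a close match to a known imposter, so exceeding the threshold leads to rejection rather than acceptance.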
Abstract:
A method and apparatus for speaker recognition is provided that matches the noise in training data to noise in testing data using spectral addition. Under spectral addition, the mean and variance for a plurality of frequency components are adjusted in the training data and the test data so that each mean and variance is matched in a resulting matched training signal and matched test signal. The adjustments made to the training data and test data add to the mean and variance of the training data and test data instead of subtracting from the mean and variance.
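The matching idea can be sketched as below, with the caveat that the abstract does not say how the adjustments are applied. In this hypothetical version, the per-frequency target mean and variance are the larger of the training and test values, so each signal is only ever raised toward the target. The variance is raised by scaling deviations about the mean with a gain of at least 1, which is an assumed mechanism, and the function names are illustrative.

```python
import numpy as np

def raise_to(spectra, target_mean, target_var):
    """Raise per-frequency mean and variance of (frames, bins) spectra to targets.

    Assumed mechanism: deviations are scaled by a gain >= 1, then the
    target mean is added. Values are not clipped in this sketch.
    """
    mean = spectra.mean(axis=0)
    var = spectra.var(axis=0)
    gain = np.sqrt(target_var / np.maximum(var, 1e-12))
    return (spectra - mean) * gain + target_mean

def spectral_addition(train, test):
    """Return matched training and test spectra with equal per-bin statistics."""
    target_mean = np.maximum(train.mean(axis=0), test.mean(axis=0))
    target_var = np.maximum(train.var(axis=0), test.var(axis=0))
    return (
        raise_to(train, target_mean, target_var),
        raise_to(test, target_mean, target_var),
    )
```

Choosing the maximum as the target is what keeps every adjustment additive: neither signal ever has its mean or variance reduced.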
Abstract:
Speaker recognition (identification and/or verification) methods and systems, in which speech models for enrolled speakers consist of sets of feature vectors representing the smoothed frequency spectrum of each of a plurality of frames and a clustering algorithm is applied to the feature vectors of the frames to obtain a reduced data set representing the original speech sample, and wherein the adjacent frames are overlapped by at least 80%. Speech models of this type model the static components of the speech sample and exhibit temporal independence. An identifier strategy is employed in which modelling and classification processes are selected to give a false rejection rate substantially equal to zero. Each enrolled speaker is associated with a cohort of a predetermined number of other enrolled speakers and a test sample is always matched with either the claimed identity or one of its associated cohort. This makes the overall error rate of the system dependent only on the false acceptance rate, which is determined by the cohort size. The false error rate is further reduced by use of multiple parallel modelling and/or classification processes. Speech models are normalised prior to classification using a normalisation model derived from either the test speech sample or one of the enrolled speaker samples (most preferably from the claimed identity enrolment sample).
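The enrolment model described above (heavily overlapped frames, smoothed spectra, and a clustering step that reduces the data) might look roughly like this. The frame length, the 5-bin moving-average smoothing, the plain k-means clustering with evenly spaced initial centroids, and all function names are assumptions for illustration; the abstract does not name a specific clustering algorithm.

```python
import numpy as np

def overlapped_frames(signal, frame_len=200, overlap=0.8):
    """Frame a 1-D signal so adjacent frames overlap by the given fraction."""
    hop = max(1, int(round(frame_len * (1 - overlap))))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def smoothed_spectra(frames):
    """Magnitude spectrum per frame, smoothed across frequency (assumed smoothing)."""
    mags = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))
    kernel = np.ones(5) / 5
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 1, mags)

def kmeans(vectors, k, n_iter=20):
    """Plain k-means with deterministic, evenly spaced initial centroids."""
    idx = np.linspace(0, len(vectors) - 1, k).astype(int)
    centroids = vectors[idx].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids

def speaker_model(signal, k=8):
    """Reduced data set (k centroids) representing the speech sample."""
    return kmeans(smoothed_spectra(overlapped_frames(signal)), k)
```

Because the centroids discard frame order, a model built this way captures only static spectral content, which is consistent with the temporal independence the abstract mentions.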