Abstract:
Embodiments of the present invention perform speaker identification and verification by first prompting a user to speak a phrase that includes a common phrase component and a personal identifier. Then, the embodiments decompose the spoken phrase to locate the personal identifier. Finally, the embodiments identify and verify the user based on the results of the decomposing.
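The decomposing step can be illustrated on a text transcript. This is a minimal sketch, assuming the spoken phrase has already been transcribed and that the common phrase component is known in advance; `COMMON_PHRASE` and `locate_identifier` are hypothetical names, not taken from the abstract.

```python
# Hypothetical common phrase component shared by all users.
COMMON_PHRASE = "my voice is my password"

def locate_identifier(transcript, common_phrase=COMMON_PHRASE):
    """Return the words left after removing the common phrase, or None.

    Assumes the transcript is plain text; a real system would work on
    audio and use alignment rather than exact word matching.
    """
    words = transcript.lower().split()
    common = common_phrase.lower().split()
    n = len(common)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == common:
            # Everything outside the common phrase is treated as the identifier.
            return " ".join(words[:i] + words[i + n:])
    return None
```

The located identifier would then select the claimed identity, and the full utterance would be passed to a verifier.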
Abstract:
A voice processing device includes: an acquirer which acquires feature quantities of vowel sections included in voice data; a classifier which classifies, among the acquired feature quantities, feature quantities corresponding to a plurality of same vowels into a plurality of clusters for respective vowels with unsupervised classification; and a determiner which determines a combination of clusters corresponding to the same speaker from clusters classified for the plurality of vowels.
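The classifier and determiner can be sketched as follows. This is an illustration under stated assumptions, not the device's method: each feature quantity is reduced to a single number, each vowel is split into exactly two clusters with a tiny k-means, and clusters from different vowels are combined when they share the most utterance IDs. The co-occurrence criterion and all names are hypothetical.

```python
from collections import defaultdict

def two_means(values, iters=20):
    """Split 1-D values into two clusters (unsupervised, k-means with k=2)."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def match_clusters(samples):
    """samples: list of (vowel, utterance_id, feature).

    Returns one cluster combination per cluster of the first vowel.
    """
    by_vowel = defaultdict(list)
    for vowel, utt, feat in samples:
        by_vowel[vowel].append((utt, feat))
    clusters = {}  # (vowel, label) -> set of utterance ids
    for vowel, items in by_vowel.items():
        labels = two_means([f for _, f in items])
        for (utt, _), lab in zip(items, labels):
            clusters.setdefault((vowel, lab), set()).add(utt)
    vowels = sorted(by_vowel)
    ref = vowels[0]
    combos = []
    for lab in (0, 1):
        combo = [(ref, lab)]
        ref_utts = clusters.get((ref, lab), set())
        for v in vowels[1:]:
            # Assumption: clusters sharing utterances belong to one speaker.
            best = max((0, 1),
                       key=lambda l: len(ref_utts & clusters.get((v, l), set())))
            combo.append((v, best))
        combos.append(combo)
    return combos
```

Each returned combination lists one cluster per vowel that is presumed to come from the same speaker.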
Abstract:
Systems and methods for bio-phonetic multi-phrase speaker identity verification are disclosed. Generally, a speaker identity verification engine generates a dynamic phrase including at least one dynamically-generated word. The speaker identity verification engine prompts a user to speak the dynamic phrase and receives a dynamic phrase utterance. The speaker identity verification engine extracts at least one voice characteristic from the dynamic phrase utterance and compares the at least one voice characteristic with a voice profile to generate a score. The speaker identity verification engine then determines whether to accept a speaker identity claim based on the score.
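The generate, compare, and decide flow can be sketched as below. This is a minimal sketch, assuming voice characteristics are already available as numeric vectors and that the score is a cosine similarity; the word list, the threshold, and the function names are hypothetical and not specified by the abstract.

```python
import math
import random

# Hypothetical vocabulary for dynamically generated words.
WORDS = ["amber", "river", "copper", "meadow", "lantern", "orbit"]

def generate_dynamic_phrase(rng, n_words=3):
    """Pick n_words distinct words at random to form the prompt."""
    return " ".join(rng.sample(WORDS, n_words))

def cosine_score(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def accept_claim(utterance_features, voice_profile, threshold=0.8):
    """Accept the identity claim when the score reaches the threshold."""
    return cosine_score(utterance_features, voice_profile) >= threshold
```

Because the phrase changes on every attempt, a replayed recording of an earlier session would not contain the prompted words.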
Abstract:
A content-aware speaker recognition system includes technologies to, among other things, analyze phonetic content of a speech sample, incorporate phonetic content of the speech sample into a speaker model, and use the phonetically-aware speaker model for speaker recognition.
Abstract:
A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors is decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units, which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
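The train-then-threshold logic can be sketched as follows. This is an illustration only: it assumes the decomposition into speaker-specific units has already produced fixed-length numeric vectors, and it takes the "distribution values" to be a per-dimension mean and standard deviation. Both choices, along with the function names, are assumptions rather than details from the abstract.

```python
from statistics import mean, pstdev

def train(enrollment_vectors):
    """Compute (mean, std) per dimension from the enrollment vectors."""
    dims = list(zip(*enrollment_vectors))
    return [(mean(d), pstdev(d)) for d in dims]

def authenticate(vector, model, threshold=2.0):
    """Accept if every dimension lies within threshold * std of its mean."""
    for x, (mu, sigma) in zip(vector, model):
        # Guard against a zero spread in the enrollment data.
        if abs(x - mu) > threshold * max(sigma, 1e-9):
            return False
    return True
```

A tighter `threshold` lowers false acceptances at the cost of rejecting more genuine attempts.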
Abstract:
A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the first phonogram and the second phonogram. Referential utterances of speech signals are selected from the first phonogram and the second phonogram, where the referential utterances include formant paths of at least three formant frequencies, and where the first two formants are within typical variability limits for one vowel phoneme type. The selected referential utterances that include at least two identical formant frequencies are compared with each other. Similarity of the compared referential utterances is evaluated from matching of the other formant frequencies, and similarity of the phonograms is determined from the evaluation of similarity of all the compared referential utterances.
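The pairing and comparison logic can be sketched as below. This is a simplified illustration, assuming each referential utterance is reduced to a tuple of average formant frequencies in Hz, that "identical" means equal within a relative tolerance, and that the two matched formants are F1 and F2. The tolerances and the function name are hypothetical.

```python
def compare_phonograms(first, second, match_tol=0.05, sim_tol=0.05):
    """Return the fraction of comparable utterance pairs judged similar.

    first, second: lists of formant tuples (F1, F2, F3, ...).
    Returns None when no pair of utterances is comparable.
    """
    def close(a, b, tol):
        return abs(a - b) <= tol * max(a, b)

    results = []
    for u in first:
        for v in second:
            # Only utterances with at least three formants qualify.
            if len(u) < 3 or len(v) < 3:
                continue
            # Pairs are comparable when F1 and F2 match.
            if close(u[0], v[0], match_tol) and close(u[1], v[1], match_tol):
                others = zip(u[2:], v[2:])
                results.append(all(close(a, b, sim_tol) for a, b in others))
    if not results:
        return None
    return sum(results) / len(results)
```

The intuition is that F1 and F2 mostly identify the vowel, so the remaining formants carry the speaker-dependent information.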
Abstract:
Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker, resulting in collected speech, and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model that represents each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes. The pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method includes recognizing, via a processor, additional speech from the target speaker using the custom speech model.
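The weighted-sum restructuring can be written as a small worked example. This is a sketch under assumptions the abstract does not state: each native phoneme model is a one-dimensional Gaussian, and the weights are normalized counts of how often each plausible phoneme appeared in the lattice for a given dictionary phoneme. All names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian, standing in for a native phoneme model."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_weights(lattice_counts):
    """Normalize lattice counts of plausible phonemes into weights."""
    total = sum(lattice_counts.values())
    return {phoneme: count / total for phoneme, count in lattice_counts.items()}

def custom_likelihood(x, weights, native_models):
    """Likelihood of x under the weighted sum of native phoneme models."""
    return sum(w * gaussian_pdf(x, *native_models[p]) for p, w in weights.items())
```

For instance, if a speaker's intended "ih" was decoded as native "ih" three times and as native "iy" once, the custom model for "ih" becomes 0.75 times the "ih" model plus 0.25 times the "iy" model, while the dictionary entry itself stays unchanged.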
Abstract:
In one embodiment, a computer system stores speech data for a plurality of speakers, where the speech data includes a plurality of feature vectors and, for each feature vector, an associated sub-phonetic class. The computer system then builds, based on the speech data, an artificial neural network (ANN) for modeling speech of a target speaker in the plurality of speakers, where the ANN is configured to discriminate between instances of sub-phonetic classes uttered by the target speaker and instances of sub-phonetic classes uttered by other speakers in the plurality of speakers.
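One way to set up such a discriminative network is through its training targets. The sketch below assumes an output layer with two nodes per sub-phonetic class, one for the target speaker and one for all other speakers; this layout is an assumption for illustration, and `build_targets` is a hypothetical helper rather than anything named in the abstract.

```python
def build_targets(samples, target_speaker, num_classes):
    """Build one-hot training targets for a speaker-discriminating network.

    samples: list of (speaker, sub_phonetic_class_index), one per feature vector.
    Indices 0..K-1 mean "class k spoken by the target speaker";
    indices K..2K-1 mean "class k spoken by someone else".
    """
    targets = []
    for speaker, cls in samples:
        index = cls if speaker == target_speaker else num_classes + cls
        one_hot = [0] * (2 * num_classes)
        one_hot[index] = 1
        targets.append(one_hot)
    return targets
```

A network trained on these targets is pushed to separate the same sub-phonetic class by speaker, rather than merely recognizing the class.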
Abstract:
An apparatus, system, and method for calculating passphrase variability are disclosed. The passphrase variability value can then be used for generating phonetically rich passwords in text-dependent speaker recognition systems, or for estimating the variability of the input passphrase in a text-independent system during the enrollment process in a speech recognition security system.
Abstract:
A method is implemented in a computer infrastructure having computer-executable code with programming instructions tangibly embodied on a computer-readable storage medium. The programming instructions are operable to receive an audio stream of a communication between a plurality of participants. Additionally, the programming instructions are operable to filter the audio stream of the communication into separate audio streams, one for each of the plurality of participants, wherein each of the separate audio streams contains portions of the communication attributable to a respective participant of the plurality of participants. Furthermore, the programming instructions are operable to output the separate audio streams to a storage system.
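The filtering step can be sketched as follows. This is a minimal illustration, assuming the attribution of audio to participants is already available as labeled sample ranges; how those labels are obtained is not covered here, and the function name and segment format are hypothetical.

```python
from collections import defaultdict

def split_by_participant(samples, segments):
    """Split one audio stream into a separate stream per participant.

    samples: sequence of audio samples for the whole communication.
    segments: list of (participant, start_index, end_index), end exclusive.
    """
    streams = defaultdict(list)
    for participant, start, end in segments:
        # Append only the portion attributed to this participant.
        streams[participant].extend(samples[start:end])
    return dict(streams)
```

Each resulting stream could then be written to storage under the corresponding participant's identifier.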