Robust spoofing detection system using deep residual neural networks

    Publication No.: US11862177B2

    Publication date: 2024-01-02

    Application No.: US17155851

    Filing date: 2021-01-22

    Abstract: Embodiments described herein provide for systems and methods for implementing a neural network architecture for spoof detection in audio signals. The neural network architecture contains layers defining embedding extractors that extract embeddings from input audio signals. Spoofprint embeddings are generated for particular system enrollees to detect attempts to spoof the enrollee's voice. Optionally, voiceprint embeddings are generated for the system enrollees to recognize the enrollee's voice. The voiceprints are extracted using features related to the enrollee's voice. The spoofprints are extracted using features related to how the enrollee speaks and other artifacts. The spoofprints facilitate detection of efforts to fool voice biometrics using synthesized speech (e.g., deepfakes) that spoofs and emulates the enrollee's voice.

    A DEEP NEURAL NETWORK TRAINING METHOD AND APPARATUS FOR SPEAKER VERIFICATION

    Publication No.: US20230206926A1

    Publication date: 2023-06-29

    Application No.: US17926605

    Filing date: 2020-09-21

    Abstract: A feature extraction deep neural network (DNN) may be trained based on the minimization of a loss function. A similarity function may be specified to calculate a similarity score for two representations of verbal utterances. A training data set comprising pairs of representations of utterances is received, wherein each one of the pairs of representations of utterances is associated with a corresponding ground-truth label confirming whether the pair of represented utterances come from the same speaker or not. A respective similarity score may then be calculated for each one of the pairs of representations of utterances. Parameters associated with the DNN may then be updated based on minimizing a loss function associated with an area under a section of a receiver-operating-characteristic (ROC) curve for the similarity scores, wherein the ROC curve section is delimited between a low false positive rate (FPR) value and a high FPR value.
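
    The abstract does not say how the partial-AUC objective is computed. The following is a minimal sketch under stated assumptions, not the patented method: it restricts the different-speaker (negative) pairs to those whose score rank falls inside the FPR window, then applies a squared hinge surrogate over every same-speaker/different-speaker combination. The function name `partial_auc_loss`, the `margin` parameter, and the squared hinge are all assumptions made for illustration.

```python
import math


def partial_auc_loss(scores, labels, fpr_low=0.0, fpr_high=0.1, margin=0.5):
    """Surrogate loss for the ROC area between fpr_low and fpr_high (illustrative).

    scores: similarity score for each utterance pair.
    labels: truthy if the pair comes from the same speaker, falsy otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    # Highest-scoring negatives first: these are the first to become false positives.
    neg = sorted((s for s, y in zip(scores, labels) if not y), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one same-speaker and one different-speaker pair")

    # A threshold with FPR = k / len(neg) admits the top-k negatives, so the
    # negatives ranked inside [fpr_low, fpr_high] shape that section of the ROC curve.
    start = min(math.floor(fpr_low * len(neg)), len(neg) - 1)
    stop = max(math.ceil(fpr_high * len(neg)), start + 1)
    hard_neg = neg[start:stop]

    # Squared hinge (an assumption): penalize any positive that does not exceed
    # a selected negative by at least `margin`.
    total = sum(max(0.0, margin - (p - n)) ** 2 for p in pos for n in hard_neg)
    return total / (len(pos) * len(hard_neg))
```

    In an actual training loop the same computation would be written in an automatic-differentiation framework so that the loss can be back-propagated into the DNN parameters; the plain-Python version here only shows which pairs contribute and how.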

    AUTOMATIC GENERATION AND/OR USE OF TEXT-DEPENDENT SPEAKER VERIFICATION FEATURES

    Publication No.: US20220215845A1

    Publication date: 2022-07-07

    Application No.: US17700135

    Filing date: 2022-03-21

    Applicant: GOOGLE LLC

    IPC classification: G10L17/08 G10L17/22

    Abstract: Implementations relate to automatic generation of speaker features for each of one or more particular text-dependent speaker verifications (TD-SVs) for a user. Implementations can generate speaker features for a particular TD-SV using instances of audio data that each capture a corresponding spoken utterance of the user during normal non-enrollment interactions with an automated assistant via one or more respective assistant devices. For example, a portion of an instance of audio data can be used in response to: (a) determining that recognized term(s) for the spoken utterance captured by that portion correspond to the particular TD-SV; and (b) determining that an authentication measure, for the user and for the spoken utterance, satisfies a threshold. Implementations additionally or alternatively relate to utilization of speaker features, for each of one or more particular TD-SVs for a user, in determining whether to authenticate a spoken utterance for the user.
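
    The two conditions (a) and (b) can be read as a simple gate on whether a portion of audio is used to build speaker features for a given TD-SV. The sketch below is only an illustration of that gate: the function names, the whitespace tokenization, the exact-phrase matching rule, and the default threshold of 0.8 are assumptions, not details from the abstract.

```python
def contains_phrase(tokens, phrase_tokens):
    """Return True if phrase_tokens occurs as a contiguous run inside tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))


def should_use_for_td_sv(recognized_text, td_sv_phrase, auth_measure, threshold=0.8):
    """Decide whether an audio portion may contribute speaker features for a TD-SV.

    recognized_text: speech-recognition output for the utterance (hypothetical input).
    td_sv_phrase:    the term(s) the TD-SV is tied to, e.g. a hotword.
    auth_measure:    confidence that the registered user spoke the utterance.
    threshold:       hypothetical minimum authentication measure.
    """
    tokens = recognized_text.lower().split()
    phrase_tokens = td_sv_phrase.lower().split()
    # (a) the recognized terms correspond to the TD-SV phrase
    terms_match = contains_phrase(tokens, phrase_tokens)
    # (b) the authentication measure satisfies the threshold
    authenticated = auth_measure >= threshold
    return terms_match and authenticated
```

    For example, `should_use_for_td_sv("turn on the lights", "turn on", 0.93)` passes both checks, while the same utterance with an authentication measure of 0.4 is rejected.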

    VOICE AND SPEECH RECOGNITION FOR CALL CENTER FEEDBACK AND QUALITY ASSURANCE

    Publication No.: US20220201122A1

    Publication date: 2022-06-23

    Application No.: US17690099

    Filing date: 2022-03-09

    Inventor: Sylvia Hernandez

    Abstract: A computer-implemented method for providing an objective evaluation to a customer service representative regarding his performance during an interaction with a customer may include receiving a digitized data stream corresponding to a spoken conversation between a customer and a representative; converting the data stream to a text stream; generating a representative transcript that includes the words from the text stream that are spoken by the representative; comparing the representative transcript with a plurality of positive words and a plurality of negative words; and generating a score that varies according to the occurrence of each word spoken by the representative that matches one of the positive words, and/or the occurrence of each word spoken by the representative that matches one of the negative words. Tone of voice, as well as response time, during the interaction may also be monitored and analyzed to adjust the score, or generate a separate score.
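
    The word-matching step can be sketched as follows. This is a minimal illustration, assuming the representative transcript is already available as text; the word lists, the function name `score_transcript`, and the +1/-1 weighting are hypothetical, since the abstract says only that the score varies with the occurrence of matching words. Tone of voice and response time are not modeled here.

```python
import re

# Hypothetical word lists; a real deployment would supply its own.
POSITIVE_WORDS = {"thanks", "please", "happy", "glad"}
NEGATIVE_WORDS = {"no", "cannot", "unfortunately"}


def score_transcript(transcript, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Score a representative's transcript by counting positive and negative words.

    Each occurrence of a positive word adds 1 and each occurrence of a
    negative word subtracts 1 (an assumed weighting).
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    score = 0
    for word in words:
        if word in positive:
            score += 1
        elif word in negative:
            score -= 1
    return score
```

    With these lists, "I am glad to help, please hold" contains two positive words and no negative ones, so it scores 2.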