Identification and utilization of misrecognitions in automatic speech recognition

    Publication Number: US12165628B2

    Publication Date: 2024-12-10

    Application Number: US17251284

    Filing Date: 2020-07-08

    Applicant: Google LLC

    Abstract: Techniques are disclosed that enable determining and/or utilizing a misrecognition of a spoken utterance, where the misrecognition is generated using an automatic speech recognition (ASR) model. Various implementations include determining a misrecognition based on the spoken utterance and a previous utterance spoken prior to the spoken utterance. Additionally or alternatively, implementations include personalizing an ASR engine for a user based on the spoken utterance and the previous utterance spoken prior to the spoken utterance (e.g., based on audio data capturing the previous utterance and a text representation of the spoken utterance).
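    One way to read the abstract above is as a pairing heuristic: when a user quickly repeats a near-identical utterance, the earlier hypothesis was likely a misrecognition, and the (previous audio, corrected text) pair can personalize the ASR engine. The sketch below is an illustrative interpretation only, not the patented method; the function name, thresholds, and similarity measure are all hypothetical.

```python
import difflib

def likely_misrecognition(prev_text, curr_text, gap_seconds,
                          max_gap=10.0, min_similarity=0.6):
    """Heuristic (hypothetical): a quick, near-identical rephrasing
    suggests the previous ASR hypothesis was a misrecognition."""
    if gap_seconds > max_gap:
        return False
    ratio = difflib.SequenceMatcher(None, prev_text.lower(),
                                    curr_text.lower()).ratio()
    # Similar but not identical: the user appears to be correcting the
    # transcript rather than issuing a new request.
    return min_similarity <= ratio < 1.0

# A detected pair (audio of the previous utterance, text of the
# corrected utterance) could then be stored for personalization.
print(likely_misrecognition("call jon", "call john", 3.0))  # True
```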

    Neural networks for speaker verification

    Publication Number: US11961525B2

    Publication Date: 2024-04-16

    Application Number: US17444384

    Filing Date: 2021-08-03

    Applicant: Google LLC

    CPC classification number: G10L17/18 G10L17/02 G10L17/04

    Abstract: This document generally describes systems, methods, devices, and other techniques related to speaker verification, including (i) training a neural network for a speaker verification model, (ii) enrolling users at a client device, and (iii) verifying identities of users based on characteristics of the users' voices. Some implementations include a computer-implemented method. The method can include receiving, at a computing device, data that characterizes an utterance of a user of the computing device. A speaker representation can be generated, at the computing device, for the utterance using a neural network on the computing device. The neural network can be trained based on a plurality of training samples that each: (i) include data that characterizes a first utterance and data that characterizes one or more second utterances, and (ii) are labeled as a matching speakers sample or a non-matching speakers sample.
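    The verification step in the abstract above reduces to comparing an on-device speaker representation for a new utterance against the representation stored at enrollment. A minimal scoring sketch, assuming cosine similarity and a fixed acceptance threshold (both hypothetical; the patent's neural network that produces the embeddings is not modeled here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(enrollment_embedding, utterance_embedding, threshold=0.8):
    """Accept the speaker if the embeddings are close enough.
    The threshold value is illustrative only."""
    return cosine_similarity(enrollment_embedding, utterance_embedding) >= threshold
```

    In training, each sample pairs a first utterance with one or more second utterances and carries a matching/non-matching label, so a loss can push matching pairs toward high similarity and non-matching pairs toward low similarity.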

    ATTENTIVE SCORING FUNCTION FOR SPEAKER IDENTIFICATION

    Publication Number: US20240029742A1

    Publication Date: 2024-01-25

    Application Number: US18479615

    Filing Date: 2023-10-02

    Applicant: Google LLC

    CPC classification number: G10L17/06 G06F16/245 G06N3/08 G10L17/04 G10L17/18

    Abstract: A speaker verification method includes receiving audio data corresponding to an utterance and processing the audio data to generate an evaluation attentive d-vector (ad-vector) representing voice characteristics of the utterance, where the evaluation ad-vector includes ne style classes, each including a respective value vector concatenated with a corresponding routing vector. The method also includes generating, using a self-attention mechanism, at least one multi-condition attention score that indicates a likelihood that the evaluation ad-vector matches a respective reference ad-vector associated with a respective user. The method also includes identifying the speaker of the utterance as the respective user associated with the respective reference ad-vector based on the multi-condition attention score.
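    A loose sketch of how such a scoring function might be shaped: each ad-vector is a list of style classes, each a (value vector, routing vector) pair; routing vectors produce attention weights over style classes, and the weights combine per-class value similarities into one score. This is an assumed reading of the abstract, not the patented attentive scoring function, and every name below is hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_score(eval_advector, ref_advector):
    """eval/ref ad-vectors: lists of (value_vector, routing_vector),
    one pair per style class. Routing vectors attend over the style
    classes; value vectors carry the voice characteristics compared."""
    routing_logits = [dot(er, rr)
                      for (_, er), (_, rr) in zip(eval_advector, ref_advector)]
    weights = softmax(routing_logits)
    value_sims = [dot(ev, rv)
                  for (ev, _), (rv, _) in zip(eval_advector, ref_advector)]
    return sum(w * s for w, s in zip(weights, value_sims))
```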

    TEXT INDEPENDENT SPEAKER RECOGNITION

    Publication Number: US20230113617A1

    Publication Date: 2023-04-13

    Application Number: US18078476

    Filing Date: 2022-12-09

    Applicant: GOOGLE LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.
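    The dual-model verification described above can be sketched as a simple score fusion: outputs of a text-dependent (TD) model (e.g., on the hotword) and a text-independent (TI) model (on the full utterance) are combined before thresholding. The weights and threshold below are illustrative assumptions, not values from the patent.

```python
def combined_verification(td_score, ti_score, td_weight=0.6, threshold=0.7):
    """Fuse text-dependent and text-independent speaker scores.
    td_weight and threshold are hypothetical tuning parameters."""
    fused = td_weight * td_score + (1.0 - td_weight) * ti_score
    return fused >= threshold
```

    Prefetching then follows naturally: content for every candidate user can be requested while the fused score is still being computed, and only the identified user's content is served.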

    Text independent speaker recognition

    Publication Number: US11527235B2

    Publication Date: 2022-12-13

    Application Number: US17046994

    Filing Date: 2019-12-02

    Applicant: Google LLC

    Abstract: Text independent speaker recognition models can be utilized by an automated assistant to verify a particular user spoke a spoken utterance and/or to identify the user who spoke a spoken utterance. Implementations can include automatically updating a speaker embedding for a particular user based on previous utterances by the particular user. Additionally or alternatively, implementations can include verifying a particular user spoke a spoken utterance using output generated by both a text independent speaker recognition model as well as a text dependent speaker recognition model. Furthermore, implementations can additionally or alternatively include prefetching content for several users associated with a spoken utterance prior to determining which user spoke the spoken utterance.

    NOISY STUDENT TEACHER TRAINING FOR ROBUST KEYWORD SPOTTING

    Publication Number: US20220284891A1

    Publication Date: 2022-09-08

    Application Number: US17190779

    Filing Date: 2021-03-03

    Applicant: GOOGLE LLC

    Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
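    The two ingredients above (aggressive spectral augmentation that masks time frames and frequency bins, and a student trained against the teacher's soft labels on the augmented audio) can be sketched as follows. This is a minimal illustration under assumed details; mask counts, widths, and the loss form are hypothetical.

```python
import math
import random

def spec_augment(spectrogram, n_time_masks=1, n_freq_masks=1,
                 max_width=2, rng=None):
    """Zero out random time frames and frequency bins.
    spectrogram: list of frames, each a list of frequency-bin values."""
    rng = rng or random.Random(0)
    frames = [row[:] for row in spectrogram]
    n_frames, n_bins = len(frames), len(frames[0])
    for _ in range(n_time_masks):          # mask whole time frames
        t0 = rng.randrange(n_frames)
        for t in range(t0, min(t0 + max_width, n_frames)):
            frames[t] = [0.0] * n_bins
    for _ in range(n_freq_masks):          # mask whole frequency bins
        f0 = rng.randrange(n_bins)
        for f in range(f0, min(f0 + max_width, n_bins)):
            for t in range(n_frames):
                frames[t][f] = 0.0
    return frames

def distillation_loss(teacher_probs, student_probs, eps=1e-9):
    """Cross-entropy of student predictions against teacher soft labels;
    minimizing this updates the student toward the teacher."""
    return -sum(t * math.log(s + eps)
                for t, s in zip(teacher_probs, student_probs))
```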

    TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING

    Publication Number: US20220122611A1

    Publication Date: 2022-04-21

    Application Number: US17567590

    Filing Date: 2022-01-03

    Applicant: GOOGLE LLC

    Abstract: Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolate one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation) using a mask generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
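    The core masking step in the pipeline above (frequency transform → speaker-conditioned mask → masked spectrogram → inverse transform) reduces to an element-wise product. A minimal sketch, assuming a magnitude spectrogram and a mask with values in [0, 1]; the trained voice filter model that produces the mask from the spectrogram and speaker embedding is not modeled here.

```python
def apply_voice_filter(spectrogram, mask):
    """Element-wise masking of a magnitude spectrogram. In the described
    system, the mask would come from a trained voice filter model
    conditioned on the target speaker's embedding; here it is an input."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrogram, mask)]
```

    The masked spectrogram is then passed through the inverse of the original frequency transformation to recover the refined audio isolating the target speaker.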
