Optimizing Personal VAD for On-Device Speech Recognition

    公开(公告)号:US20230298591A1

    公开(公告)日:2023-09-21

    申请号:US18123060

    申请日:2023-03-17

    Applicant: Google LLC

    CPC classification number: G10L17/06 G10L17/22

    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.

    Generalized Automatic Speech Recognition for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation

    公开(公告)号:US20230298609A1

    公开(公告)日:2023-09-21

    申请号:US18171368

    申请日:2023-02-19

    Applicant: Google LLC

    CPC classification number: G10L21/0208 G10L15/063 G10L2021/02082

    Abstract: A method for training a generalized automatic speech recognition model for joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving a plurality of training utterances paired with corresponding training contextual signals. The training contextual signals include a training contextual noise signal including noise prior to the corresponding training utterance, a training reference audio signal, and a training speaker vector including voice characteristics of a target speaker that spoke the corresponding training utterance. The operations also include training, using a contextual signal dropout strategy, a contextual frontend processing model on the training utterances to learn how to predict enhanced speech features. Here, the contextual signal dropout strategy uses a predetermined probability to drop out each of the training contextual signals during training of the contextual frontend processing model.

Patent Agency Ranking