CONTEXTUAL BIASING FOR SPEECH RECOGNITION

    公开(公告)号:US20220366897A1

    公开(公告)日:2022-11-17

    申请号:US17815049

    申请日:2022-07-26

    Applicant: Google LLC

    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias encoder, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.

    COMPRESSED RECURRENT NEURAL NETWORK MODELS

    公开(公告)号:US20210089916A1

    公开(公告)日:2021-03-25

    申请号:US17112966

    申请日:2020-12-04

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.

    CONTEXTUAL BIASING FOR SPEECH RECOGNITION

    公开(公告)号:US20240379095A1

    公开(公告)日:2024-11-14

    申请号:US18782001

    申请日:2024-07-23

    Applicant: Google LLC

    Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias encoder, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.

    CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS

    公开(公告)号:US20240304181A1

    公开(公告)日:2024-09-12

    申请号:US18598523

    申请日:2024-03-07

    Applicant: Google LLC

    CPC classification number: G10L15/063

    Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the different multiple different domains.

Patent Agency Ranking