Contextual Biasing With Text Injection
    Invention publication (published patent application)

    Publication No.: US20240153498A1

    Publication date: 2024-05-09

    Application No.: US18490861

    Filing date: 2023-10-20

    Applicant: Google LLC

    CPC classification number: G10L15/16 G10L15/063 G10L15/183

    Abstract: A method includes receiving context biasing data that includes a set of unspoken textual utterances corresponding to a particular context. The method also includes obtaining a list of carrier phrases associated with the particular context. For each respective unspoken textual utterance, the method includes generating a corresponding training data pair that includes the respective unspoken textual utterance and a carrier phrase. For each respective training data pair, the method includes tokenizing the respective training data pair into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit, receiving the first higher order textual feature representation, and generating a first probability distribution over possible text units. The method also includes training a speech recognition model based on the first probability distribution over possible text units.
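
The data-preparation steps in this abstract (pairing each unspoken textual utterance with a carrier phrase, then tokenizing the pair into sub-word units) can be sketched as follows. This is a minimal illustration, not the patented implementation: the round-robin choice of carrier phrase, the greedy longest-match tokenizer, and the names `build_training_pairs`, `tokenize`, and `vocab` are all assumptions introduced here. The abstract does not say how a carrier phrase is selected or which sub-word model is used.

```python
def build_training_pairs(utterances, carrier_phrases):
    """Pair each unspoken textual utterance with one carrier phrase.

    Assumption: phrases are assigned round-robin; the abstract only says
    each pair contains the utterance and a carrier phrase.
    """
    pairs = []
    for k, utterance in enumerate(utterances):
        phrase = carrier_phrases[k % len(carrier_phrases)]
        pairs.append((phrase, utterance))
    return pairs


def tokenize(text, vocab):
    """Greedy longest-match split into sub-word units.

    A toy stand-in for a trained wordpiece model; characters not covered
    by `vocab` are emitted as single-character units.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens


# Hypothetical contact-name context with two carrier phrases.
pairs = build_training_pairs(["bob", "alice"], ["call", "text"])
vocab = {"call", "text", " ", "bo", "al", "ice"}
sequences = [tokenize(phrase + " " + utterance, vocab) for phrase, utterance in pairs]
```

Each resulting sub-word sequence would then be fed to the text encoder and decoder described in the abstract; those model components are outside the scope of this sketch.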

    EXPORTING MODULAR ENCODER FEATURES FOR STREAMING AND DELIBERATION ASR

    Publication No.: US20240144917A1

    Publication date: 2024-05-02

    Application No.: US18494763

    Filing date: 2023-10-25

    Applicant: Google LLC

    CPC classification number: G10L15/16

    Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription. The method also includes training the exporter network based on the exporter decoder losses while parameters of the base encoder are frozen.
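
The training arrangement in this abstract (an exporter trained on a decoder loss while the base encoder stays frozen) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the base encoder is a single fixed scalar weight, the exporter is a per-class linear map, the exporter decoder is a plain softmax, and the loss is cross-entropy against per-frame labels. The patent does not specify any of these choices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_exporter(frames, labels, base_w, exporter_w, lr=0.5, steps=50):
    """Gradient descent on the exporter weights only.

    base_w is read but never updated, which is the toy analogue of
    freezing the base encoder's parameters.
    """
    exporter_w = list(exporter_w)
    losses = []
    for _ in range(steps):
        grads = [0.0] * len(exporter_w)
        loss = 0.0
        for x, y in zip(frames, labels):
            h = base_w * x                        # first encoded representation
            logits = [w * h for w in exporter_w]  # second encoded representation
            probs = softmax(logits)               # toy exporter decoder
            loss -= math.log(probs[y])            # cross-entropy with the label
            for k in range(len(exporter_w)):
                # d(loss)/d(w_k) = (p_k - 1[k == y]) * h
                grads[k] += (probs[k] - (1.0 if k == y else 0.0)) * h
        n = len(frames)
        exporter_w = [w - lr * g / n for w, g in zip(exporter_w, grads)]
        losses.append(loss / n)
    return exporter_w, losses


# Hypothetical one-dimensional "acoustic frames" with two output classes.
trained_w, history = train_exporter(
    frames=[1.0, -1.0, 2.0, -2.0], labels=[0, 1, 0, 1],
    base_w=0.5, exporter_w=[0.0, 0.0],
)
```

In a real system the same idea is usually expressed by excluding the base encoder's parameters from the optimizer, so that only the exporter network receives gradient updates.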

    Optimizing Inference Performance for Conformer

    Publication No.: US20230130634A1

    Publication date: 2023-04-27

    Application No.: US17936547

    Filing date: 2022-09-29

    Applicant: Google LLC

    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
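
The abstract's key idea, attention that is linear in sequence length and can be evaluated recurrently over a causal stream, can be sketched as below. This is a generic causal linear-attention recurrence, not the patent's RNN Attention-Performer module: the feature map `elu(x) + 1` is an assumption chosen for simplicity (Performer-style models typically use random feature maps instead), and the function names are hypothetical.

```python
import math


def feature_map(x):
    """elu(x) + 1, a strictly positive feature map (an assumed choice)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def causal_linear_attention(queries, keys, values):
    """Causal attention with O(1) state per step.

    Keeps two running sums, S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j),
    so step i only attends to positions j <= i.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for a in range(d_k):
            z[a] += phi_k[a]
            for b in range(d_v):
                S[a][b] += phi_k[a] * v[b]
        phi_q = feature_map(q)
        denom = sum(phi_q[a] * z[a] for a in range(d_k))
        outputs.append(
            [sum(phi_q[a] * S[a][b] for a in range(d_k)) / denom for b in range(d_v)]
        )
    return outputs


# Hypothetical three-frame input with 2-dimensional queries, keys, and values.
q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
k = [[0.2, 0.1], [-0.3, 0.5], [0.7, -0.1]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = causal_linear_attention(q, k, v)
```

Because the state `(S, z)` has a fixed size, the per-frame cost does not grow with the number of frames already processed, which is the property that makes this form attractive for streaming inference.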
