EXPORTING MODULAR ENCODER FEATURES FOR STREAMING AND DELIBERATION ASR

    Publication number: US20240144917A1

    Publication date: 2024-05-02

    Application number: US18494763

    Filing date: 2023-10-25

    Applicant: Google LLC

    CPC classification number: G10L15/16

    Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription. The method also includes training the exporter network based on the exporter decoder losses while parameters of the base encoder are frozen.
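The training loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the base encoder, exporter network, and exporter decoder are reduced to single linear maps, the loss is a per-frame cross-entropy against one label per frame, and the names `train_exporter`, `W_BASE`, `FRAMES`, and `LABELS` are hypothetical. The point it shows is only the parameter flow: the exporter and its decoder are updated from the exporter decoder loss, while the base encoder weights are read but never modified.

```python
import math
import random


def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n x m)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_exporter(frames, labels, w_base, d_exp=4, vocab=3, lr=0.1, epochs=200, seed=0):
    """Train exporter and exporter-decoder weights; w_base stays frozen.

    Assumption: linear layers and frame-level cross-entropy stand in for the
    real networks and the real sequence-level loss.
    """
    rng = random.Random(seed)
    d_enc = len(w_base[0])
    w_exp = [[rng.uniform(-0.5, 0.5) for _ in range(d_exp)] for _ in range(d_enc)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(vocab)] for _ in range(d_exp)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(frames, labels):
            h1 = matvec(x, w_base)           # first encoded representation (frozen encoder)
            h2 = matvec(h1, w_exp)           # second encoded representation (exporter)
            p = softmax(matvec(h2, w_dec))   # exporter decoder output distribution
            total += -math.log(p[y])         # exporter decoder loss for this frame
            # Gradient of cross-entropy with respect to the decoder scores.
            dz = [p[k] - (1.0 if k == y else 0.0) for k in range(vocab)]
            # Backpropagate to the exporter output before touching w_dec.
            dh2 = [sum(w_dec[j][k] * dz[k] for k in range(vocab)) for j in range(d_exp)]
            for j in range(d_exp):
                for k in range(vocab):
                    w_dec[j][k] -= lr * h2[j] * dz[k]
            for i in range(d_enc):
                for j in range(d_exp):
                    w_exp[i][j] -= lr * h1[i] * dh2[j]
            # No update is ever applied to w_base: its parameters are frozen.
        history.append(total / len(frames))
    return w_exp, w_dec, history


# Hypothetical toy data: three one-hot "acoustic frames", one label each.
W_BASE = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5], [0.5, 0.0, 1.0, 0.0]]
FRAMES = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
LABELS = [0, 1, 2]

base_before = [row[:] for row in W_BASE]
w_exp, w_dec, history = train_exporter(FRAMES, LABELS, W_BASE)
print("first epoch loss:", history[0])
print("last epoch loss:", history[-1])
print("base encoder unchanged:", W_BASE == base_before)
```

In a real system the same effect would typically be obtained by excluding the base encoder parameters from the optimizer or by stopping gradients at the encoder output; the manual updates above only make the frozen and trained parameter sets explicit.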
