Publication number: US20240144917A1
Publication date: 2024-05-02
Application number: US18494763
Filing date: 2023-10-25
Applicant: Google LLC
Inventor: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays
IPC: G10L15/16
CPC classification number: G10L15/16
Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription. The method also includes training the exporter network based on the exporter decoder losses while parameters of the base encoder are frozen.
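The training scheme in the abstract can be illustrated with a toy sketch in plain Python. This is not the patented implementation; it only shows the pattern of updating an exporter network while the base encoder stays frozen. Everything here is an assumption made for illustration: the names `train_exporter`, `base_W`, and `exp_W` are hypothetical, the base encoder and the exporter network are each reduced to a single linear map, the exporter decoder is reduced to a softmax, and each acoustic frame is assumed to carry its own label so that a per-frame cross-entropy can stand in for the exporter decoder loss.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_exporter(frames, labels, base_W, exp_W, lr=0.5, epochs=200):
    """Update exp_W in place with SGD; base_W is only read, never written.

    Assumption: one integer label per frame (toy stand-in for a transcription).
    Returns the mean loss per epoch.
    """
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(frames, labels):
            h1 = matvec(base_W, x)   # first encoded representation (frozen base encoder)
            h2 = matvec(exp_W, h1)   # second encoded representation (exporter network)
            p = softmax(h2)          # toy exporter decoder: distribution over labels
            total += -math.log(p[y]) # per-frame cross-entropy loss
            # Gradient of cross-entropy w.r.t. h2 is (p - onehot(y));
            # it is applied to the exporter weights only.
            for k in range(len(exp_W)):
                g = p[k] - (1.0 if k == y else 0.0)
                for j in range(len(h1)):
                    exp_W[k][j] -= lr * g * h1[j]
        losses.append(total / len(frames))
    return losses

# Toy usage: two 2-dimensional "frames", two output labels.
frames = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
base_W = [[1.0, 0.5], [-0.5, 1.0]]  # stands in for the pre-trained, frozen encoder
exp_W = [[0.0, 0.0], [0.0, 0.0]]    # trainable exporter parameters
losses = train_exporter(frames, labels, base_W, exp_W)
```

In this sketch, freezing is enforced simply by never writing to `base_W`. In an automatic-differentiation framework the same effect would typically come from excluding the base encoder's parameters from the optimizer or stopping gradients at its output. A real speech system would also need a sequence-level loss, since transcriptions are not aligned to frames.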