Publication number: US20240296840A1
Publication date: 2024-09-05
Application number: US18592590
Filing date: 2024-03-01
Applicant: Google LLC
Inventor: Shaan Jagdeep Patrick Bijwadia, Shuo-yiin Chang, Tara N. Sainath, Weiran Wang, Zhong Meng
IPC: G10L15/197, G10L15/02, G10L15/06
CPC classification number: G10L15/197, G10L15/02, G10L15/063
Abstract: A joint auxiliary task and automatic speech recognition (ASR) model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher-order feature representation for a corresponding acoustic frame. The model also includes a multi-output hybrid autoregressive transducer (HAT) decoder to generate, at each of the plurality of output steps, a probability distribution over possible speech recognition hypotheses and an indication of whether the output step corresponds to an auxiliary token associated with a particular auxiliary task. The model is trained by a joint end-to-end model and internal language model training (JEIT) process based on: a paired training data set including paired audio data and transcriptions, the transcriptions annotated with ground-truth auxiliary tokens associated with the particular auxiliary task; and an unpaired training data set including textual utterances not paired with any corresponding audio data, the textual utterances annotated with the ground-truth auxiliary tokens associated with the particular auxiliary task.
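The two decoder outputs described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class `ToyMultiOutputHead`, the function `combined_loss`, the parameter `text_weight`, the layer shapes, and the random weights are all hypothetical, and the weighted-sum form of the training objective is an assumption about how a paired-data loss and an unpaired-text loss might be combined. The sketch omits the transducer lattice, blank modelling, and the label history that a real HAT decoder would use.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyMultiOutputHead:
    """Hypothetical stand-in for a multi-output decoder head.

    For one output step it maps an encoder feature vector to
    (a) a distribution over ASR tokens and
    (b) a probability that the step corresponds to an auxiliary token.
    """

    def __init__(self, feature_dim, vocab_size, seed=0):
        rng = random.Random(seed)
        # Assumed: one linear layer per output, randomly initialised.
        self.w_asr = [[rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
                      for _ in range(vocab_size)]
        self.w_aux = [rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]

    def step(self, features):
        asr_logits = [sum(w * f for w, f in zip(row, features))
                      for row in self.w_asr]
        aux_logit = sum(w * f for w, f in zip(self.w_aux, features))
        return softmax(asr_logits), sigmoid(aux_logit)


def combined_loss(paired_loss, unpaired_text_loss, text_weight=0.5):
    """Assumed objective: paired-data loss plus a weighted text-only loss."""
    return paired_loss + text_weight * unpaired_text_loss


if __name__ == "__main__":
    head = ToyMultiOutputHead(feature_dim=4, vocab_size=5)
    asr_probs, aux_prob = head.step([0.1, 0.2, 0.3, 0.4])
    print(len(asr_probs), round(sum(asr_probs), 6), 0.0 < aux_prob < 1.0)
    print(combined_loss(paired_loss=2.0, unpaired_text_loss=1.0))
```

Keeping the auxiliary indication as a separate output, rather than as an extra vocabulary entry, mirrors the abstract's distinction between the hypothesis distribution and the auxiliary-token indication; how the actual decoder shares parameters between the two is not stated in this record.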