Multi-Output Decoders for Multi-Task Learning of ASR and Auxiliary Tasks

Invention Publication

US20240153495A1 Multi-Output Decoders for Multi-Task Learning of ASR and Auxiliary Tasks (Under examination, published)


Patent Title: Multi-Output Decoders for Multi-Task Learning of ASR and Auxiliary Tasks
Application No.: US18494984

Application Date: 2023-10-26
Publication No.: US20240153495A1

Publication Date: 2024-05-09
Inventor: Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang, Shuo-yiin Chang, David Johannes Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Main IPC: G10L15/06
IPC: G10L15/06 ; G06F40/284 ; G10L15/26

Abstract:

A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset is paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for the corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes training the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.
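
The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of either loss or how the two are combined, so everything here is an assumption made for illustration: per-token cross-entropy stands in for both the speech recognition loss and the auxiliary task loss, the two are combined as a weighted sum, and the names `cross_entropy`, `joint_loss`, and `aux_weight` are hypothetical.

```python
import math


def cross_entropy(step_probs, target_ids):
    """Mean negative log-likelihood of the targets under per-step distributions.

    step_probs: list of probability lists, one per output step.
    target_ids: list of target token indices, one per output step.
    """
    total = -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))
    return total / len(target_ids)


def joint_loss(asr_probs, transcript_ids, aux_probs, aux_target_ids, aux_weight=1.0):
    """Combine the two per-utterance losses into one training objective.

    Assumption: a weighted sum with a hypothetical aux_weight; the source only
    says the model is trained jointly on both losses.
    """
    asr_loss = cross_entropy(asr_probs, transcript_ids)   # hypothesis vs. transcription
    aux_loss = cross_entropy(aux_probs, aux_target_ids)   # predicted vs. target auxiliary tokens
    return asr_loss + aux_weight * aux_loss, asr_loss, aux_loss


# Toy utterance: two transcript tokens, two auxiliary tokens (made-up numbers).
asr_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
aux_probs = [[0.9, 0.1], [0.4, 0.6]]
total, asr, aux = joint_loss(asr_probs, [0, 1], aux_probs, [0, 1], aux_weight=0.5)
print(f"asr={asr:.4f} aux={aux:.4f} total={total:.4f}")
```

In an actual system, the summed loss would be backpropagated through the shared model parameters so that both tasks shape the same network; the sketch only shows how the per-utterance losses might be combined.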


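IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	.Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)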