Transformer transducer: one model unifying streaming and non-streaming speech recognition

Invention Grant

US11741947B2 Transformer transducer: one model unifying streaming and non-streaming speech recognition (Active)

Patent Title: Transformer transducer: one model unifying streaming and non-streaming speech recognition
Application No.: US17210465

Application Date: 2021-03-23
Publication No.: US11741947B2

Publication Date: 2023-08-29
Inventor: Anshuman Tripathi, Hasim Sak, Han Lu, Qian Zhang, Jaeyoung Kim
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant J. Griffith
Main IPC: G10L15/16
IPC: G10L15/16; G06N3/04; G06N3/088; G10L15/06; G10L15/197; G10L15/22; G10L15/30

Abstract:

A transformer-transducer model for unifying streaming and non-streaming speech recognition includes an audio encoder, a label encoder, and a joint network. The audio encoder receives a sequence of acoustic frames and generates, at each of a plurality of time steps, a higher order feature representation for a corresponding acoustic frame. The label encoder receives a sequence of non-blank symbols output by a final softmax layer and generates, at each of the plurality of time steps, a dense representation. The joint network receives the higher order feature representation and the dense representation at each of the plurality of time steps and generates a probability distribution over possible speech recognition hypotheses. The audio encoder of the model further includes a neural network having an initial stack of transformer layers trained with zero look-ahead audio context, and a final stack of transformer layers trained with a variable look-ahead audio context.
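
The split between the initial and final transformer stacks can be pictured as a set of per-layer self-attention masks over the acoustic frames. The following is a minimal sketch under stated assumptions, not the patented implementation: the function names, the layer counts, the unlimited left context, and the idea of sampling a look-ahead from a fixed list of choices during training are all hypothetical and chosen only for illustration.

```python
import random


def attention_mask(num_frames, left_context, right_context):
    """Return mask[t][s] == True when frame t may attend to frame s.

    A context of None means "unlimited" in that direction (an assumption
    of this sketch, not something the abstract specifies).
    """
    mask = []
    for t in range(num_frames):
        row = []
        for s in range(num_frames):
            within_left = left_context is None or t - s <= left_context
            within_right = right_context is None or s - t <= right_context
            row.append(within_left and within_right)
        mask.append(row)
    return mask


def encoder_masks(num_frames, num_initial, num_final, final_right_context):
    """Build one mask per audio-encoder layer.

    Initial stack: zero look-ahead (no future frames visible).
    Final stack: `final_right_context` future frames visible.
    """
    initial = [attention_mask(num_frames, None, 0) for _ in range(num_initial)]
    final = [
        attention_mask(num_frames, None, final_right_context)
        for _ in range(num_final)
    ]
    return initial + final


def sample_right_context(choices, rng):
    """Hypothetical training-time step: pick a look-ahead for this batch."""
    return rng.choice(choices)


# Example: 5 frames, 2 zero-look-ahead layers, 1 layer with 2 frames of look-ahead.
masks = encoder_masks(num_frames=5, num_initial=2, num_final=1, final_right_context=2)
```

In this sketch, setting `final_right_context` to 0 at inference would make every layer causal, which corresponds to streaming use, while a larger value (or None) lets the final stack see future audio, which corresponds to non-streaming use.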

Public/Granted literature

US20220108689A1 Transformer Transducer: One Model Unifying Streaming And Non-Streaming Speech Recognition (Public/Granted day: 2022-04-07)

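IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/16	..Using artificial neural networks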