Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization

Invention Application

US20220122586A1 Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization (Granted)

Patent Title: Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization
Application No.: US17447285

Application Date: 2021-09-09
Publication No.: US20220122586A1

Publication Date: 2022-04-21
Inventor: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Main IPC: G10L15/06
IPC: G10L15/06; G10L15/22; G10L15/30; G10L15/16

Abstract:

A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
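
The abstract gives no formulas, so the following is only a minimal sketch of one plausible reading, assuming the standard transducer alignment lattice with forward and backward probabilities. The function names, the toy probabilities, and the choice to weight the label term by (1 + lam) are assumptions made for illustration and are not taken from the claims.

```python
import math

def forward_backward(blank, label):
    """Forward (alpha) and backward (beta) probabilities on a T x (U+1) lattice.

    blank[t][u]: probability of emitting the blank token at node (t, u).
    label[t][u]: probability of emitting the next label token at node (t, u).
    Both tables are hypothetical inputs; a real model would produce them.
    """
    T, U = len(blank), len(blank[0]) - 1
    alpha = [[0.0] * (U + 1) for _ in range(T)]
    beta = [[0.0] * (U + 1) for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # arrived by emitting blank (advance one frame)
                alpha[t][u] += alpha[t - 1][u] * blank[t - 1][u]
            if u > 0:  # arrived by emitting a label (advance one token)
                alpha[t][u] += alpha[t][u - 1] * label[t][u - 1]
    for t in reversed(range(T)):
        for u in reversed(range(U + 1)):
            if t == T - 1 and u == U:
                beta[t][u] = blank[t][u]  # final blank closes the alignment
                continue
            if t < T - 1:
                beta[t][u] += blank[t][u] * beta[t + 1][u]
            if u < U:
                beta[t][u] += label[t][u] * beta[t][u + 1]
    return alpha, beta

def node_mass(alpha, beta, blank, label, t, u, lam=0.0):
    """Probability mass of alignments through node (t, u).

    The label-emission part is scaled by (1 + lam); lam is the assumed
    tuning parameter, and lam = 0 recovers the unregularized mass.
    """
    T, U = len(blank), len(blank[0]) - 1
    if t < T - 1:
        after_blank = beta[t + 1][u]
    else:
        after_blank = 1.0 if u == U else 0.0
    blank_part = alpha[t][u] * blank[t][u] * after_blank
    label_part = alpha[t][u] * label[t][u] * beta[t][u + 1] if u < U else 0.0
    return blank_part + (1.0 + lam) * label_part

# Toy lattice: T = 2 acoustic frames, U = 1 label token.
blank = [[0.6, 0.9], [0.5, 0.8]]
label = [[0.4], [0.5]]
alpha, beta = forward_backward(blank, label)

seq_prob = beta[0][0]  # sequence-level alignment probability
plain_loss = -math.log(node_mass(alpha, beta, blank, label, 0, 0, lam=0.0))
reg_loss = -math.log(node_mass(alpha, beta, blank, label, 0, 0, lam=0.01))
```

With lam = 0 the mass at the start node equals the sequence-level probability. A positive lam raises the weight of the label branch relative to the blank branch, which under this reading is how training would be pushed toward emitting label tokens earlier.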

Public/Granted literature

US12094453B2 Fast emit low-latency streaming ASR with sequence-level emission regularization utilizing forward and backward probabilities between nodes of an alignment lattice. Public/Granted date: 2024-09-17

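IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	.Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)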