Mixture model attention for flexible streaming and non-streaming automatic speech recognition

Invention Grant

US12014729B2 Mixture model attention for flexible streaming and non-streaming automatic speech recognition (in force)

Patent Title: Mixture model attention for flexible streaming and non-streaming automatic speech recognition
Application No.: US17644344

Application Date: 2021-12-15
Publication No.: US12014729B2

Publication Date: 2024-06-18
Inventor: Kartik Audhkhasi, Bhuvana Ramabhadran, Tongzhou Chen, Pedro J. Moreno Mengibar
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant Griffith
Main IPC: G10L15/16
IPC: G10L15/16; G06F1/03; G06N3/04; G06N3/0455; G10L19/16

Abstract:

A method for unifying streaming and non-streaming speech recognition with an automatic speech recognition (ASR) model includes receiving a sequence of acoustic frames. The method includes generating, using an audio encoder of the ASR model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method further includes generating, using a joint network of the ASR model, a probability distribution over possible speech recognition hypotheses at the corresponding time step based on the higher order feature representation generated by the audio encoder at the corresponding time step. The audio encoder comprises a neural network that applies mixture model (MiMo) attention to compute an attention probability distribution function (PDF) using a set of mixture components of softmaxes over a context window.
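The abstract's last sentence describes an attention PDF built from mixture components of softmaxes over a context window. The sketch below shows one possible reading of that idea in plain Python. It is an illustration under stated assumptions, not the patented implementation. The function names (`softmax`, `mixture_attention_pdf`), the split of the window into a "past" and a "future" component, the example scores, and the mixture weights are all hypothetical. Dropping the future component to imitate streaming is likewise an assumption made for illustration, not something the abstract states.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_attention_pdf(scores, components, mixture_weights):
    """Combine per-component softmaxes into one attention PDF.

    scores: attention logits for one query, one per frame in the context window.
    components: list of index lists; each selects the frames of one component
        (assumed here to be non-overlapping).
    mixture_weights: one non-negative weight per component, summing to 1.
    """
    pdf = [0.0] * len(scores)
    for weight, indices in zip(mixture_weights, components):
        # Softmax is taken only over the frames inside this component.
        local = softmax([scores[i] for i in indices])
        for i, p in zip(indices, local):
            pdf[i] += weight * p
    return pdf


# Hypothetical context window of six frames: four past, two future.
scores = [0.5, 1.0, -0.3, 2.0, 0.1, 0.7]
past = [0, 1, 2, 3]
future = [4, 5]

# Non-streaming use: both components contribute (weights are assumed values).
full_pdf = mixture_attention_pdf(scores, [past, future], [0.6, 0.4])

# Streaming-style use (assumption): keep only the past component.
streaming_pdf = mixture_attention_pdf(scores, [past], [1.0])
```

Because each component's softmax sums to 1 and the mixture weights sum to 1, the combined result remains a valid probability distribution over the window. In the streaming-style call, frames outside the retained component receive zero attention mass.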

Public/Granted literature

US20220310074A1 Mixture Model Attention for Flexible Streaming and Non-Streaming Automatic Speech Recognition Public/Granted day: 2022-09-29

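IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/16	..using artificial neural networks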