Multi-Task Learning for End-To-End Automated Speech Recognition Confidence and Deletion Estimation

Invention Application

US20220310080A1 Multi-Task Learning for End-To-End Automated Speech Recognition Confidence and Deletion Estimation (in force)

Patent Title: Multi-Task Learning for End-To-End Automated Speech Recognition Confidence and Deletion Estimation
Application No.: US17643826

Application Date: 2021-12-11
Publication No.: US20220310080A1

Publication Date: 2022-09-29
Inventor: David Qiu, Yanzhang He, Yu Zhang, Qiujia Li, Liangliang Cao, Ian McGraw
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Main IPC: G10L15/197
IPC: G10L15/197; G10L15/06; G10L15/22; G10L15/02; G10L15/16; G10L15/30; G10L15/32; G10L15/04; G06N3/08

Abstract:

A method includes receiving a speech recognition result corresponding to a transcription of an utterance spoken by a user. For each sub-word unit in a sequence of hypothesized sub-word units of the speech recognition result, the method uses a confidence estimation module to: obtain a respective confidence embedding associated with the corresponding output step when the corresponding sub-word unit was output from a first speech recognizer; generate a confidence feature vector; generate an acoustic context vector; and generate a respective confidence output score for the corresponding sub-word unit based on the confidence feature vector and the acoustic context vector received as input by the output layer of the confidence estimation module. The method also includes determining, based on the respective confidence output score generated for each sub-word unit in the sequence of hypothesized sub-word units, an utterance-level confidence score for the transcription of the utterance.
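The last two steps of the abstract (scoring each sub-word unit from its two input vectors, then deriving an utterance-level score) can be illustrated with a minimal Python sketch. The abstract does not specify the form of the output layer or how the per-unit scores are combined, so everything here is an assumption made for illustration: a single linear layer with a sigmoid stands in for the output layer, a geometric mean stands in for the aggregation, and the names `subword_confidence` and `utterance_confidence` are hypothetical.

```python
import math
from typing import Sequence


def subword_confidence(
    confidence_features: Sequence[float],
    acoustic_context: Sequence[float],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Score one sub-word unit from its two input vectors.

    ASSUMPTION: the output layer is modeled as one linear layer plus a
    sigmoid over the concatenated inputs; the abstract does not say this.
    """
    inputs = list(confidence_features) + list(acoustic_context)
    if len(inputs) != len(weights):
        raise ValueError("weights must match the concatenated input length")
    logit = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def utterance_confidence(subword_scores: Sequence[float]) -> float:
    """Combine per-unit scores into one utterance-level score.

    ASSUMPTION: a geometric mean is used purely as an example aggregator;
    the abstract only says the score is based on the per-unit scores.
    """
    if not subword_scores:
        raise ValueError("at least one sub-word score is required")
    # Work in log space and clamp to avoid log(0) for zero scores.
    log_sum = sum(math.log(max(s, 1e-12)) for s in subword_scores)
    return math.exp(log_sum / len(subword_scores))
```

With this aggregator a single low-confidence sub-word unit lowers the utterance score more than an arithmetic mean would, which is one plausible reason to prefer a multiplicative combination; other choices fit the abstract's wording equally well.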

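IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/08	.Speech classification or search
G10L15/18	..using natural language modelling
G10L15/183	...using context dependencies, e.g. language models
G10L15/19	....Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
G10L15/197	.....Probabilistic grammars, e.g. word n-grams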