Using corrections, of predicted textual segments of spoken utterances, for training of on-device speech recognition model

Granted invention patent

US11817080B2 Using corrections, of predicted textual segments of spoken utterances, for training of on-device speech recognition model (in force)

Patent title: Using corrections, of predicted textual segments of spoken utterances, for training of on-device speech recognition model
Application number: US17250165

Filing date: 2019-10-11
Publication number: US11817080B2

Publication date: 2023-11-14
Inventors: Françoise Beaufays, Johan Schalkwyk, Giovanni Motta
Applicant: Google LLC
Applicant address: US CA Mountain View
Patentee: GOOGLE LLC
Current patentee: GOOGLE LLC
Current patentee address: US CA Mountain View
Agency: Gray Ice Higdon
International application: PCT/US2019/055901 2019.10.11
International publication: WO2021/045793A 2021.03.11
National phase entry date: 2020-12-07
Main classification: G10L15/00
IPC classifications: G10L15/00 ; G06F3/04842 ; G06F3/04883 ; G10L25/51

Abstract:

Processor(s) of a client device can: receive audio data that captures a spoken utterance of a user of the client device; process, using an on-device speech recognition model, the audio data to generate a predicted textual segment that is a prediction of the spoken utterance; cause at least part of the predicted textual segment to be rendered (e.g., visually and/or audibly); receive further user interface input that is a correction of the predicted textual segment to an alternate textual segment; and generate a gradient based on comparing at least part of the predicted output to ground truth output that corresponds to the alternate textual segment. The gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model and/or is transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
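
The correction-to-gradient step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the toy linear model, the one-token-per-frame alignment, the cross-entropy loss, and the function names `correction_gradient` and `apply_gradient` are all hypothetical. A production speech recognizer would use a far larger model and a sequence-level loss.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(weights: Matrix, features: List[float]) -> List[float]:
    """Toy stand-in for the on-device model: one linear layer plus softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def loss(weights: Matrix, frames: Matrix, corrected_ids: List[int]) -> float:
    """Cross-entropy between predictions and the user-corrected token ids."""
    return -sum(
        math.log(predict_probs(weights, f)[t])
        for f, t in zip(frames, corrected_ids)
    )


def correction_gradient(
    weights: Matrix, frames: Matrix, corrected_ids: List[int]
) -> Matrix:
    """Gradient of the loss w.r.t. the weights, treating the corrected
    tokens as ground truth (assumes one token per audio frame)."""
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for features, target in zip(frames, corrected_ids):
        probs = predict_probs(weights, features)
        for k, p in enumerate(probs):
            err = p - (1.0 if k == target else 0.0)  # predicted minus ground truth
            for j, x in enumerate(features):
                grad[k][j] += err * x
    return grad


def apply_gradient(weights: Matrix, grad: Matrix, lr: float = 0.5) -> Matrix:
    """Local update; the same gradient could instead be sent to a remote system."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grad)
    ]


# Hypothetical data: 3-token vocabulary, 2 features per frame, 2 frames.
weights = [[0.0, 0.0] for _ in range(3)]
frames = [[1.0, 0.0], [0.0, 1.0]]
corrected_ids = [2, 0]  # token ids of the user's alternate textual segment

grad = correction_gradient(weights, frames, corrected_ids)
updated = apply_gradient(weights, grad)
```

In this sketch the same `grad` value serves both paths named in the abstract: it can be applied locally with `apply_gradient`, or serialized and transmitted so that a remote system can update global weights.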

Published/granted documents

US20210327410A1 USING CORRECTIONS, OF PREDICTED TEXTUAL SEGMENTS OF SPOKEN UTTERANCES, FOR TRAINING OF ON-DEVICE SPEECH RECOGNITION MODEL Publication/grant date: 2021-10-21

Information lookup

Espacenet

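IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)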