END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION

Invention Application

WO2021222678A1 END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION (Status: Pending, Published)

Patent Title: END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION
Application No.: PCT/US2021/030049

Application Date: 2021-04-30
Publication No.: WO2021222678A1

Publication Date: 2021-11-04
Inventors: TRIPATHI, Anshuman; LU, Han; SAK, Hasim
Applicant: GOOGLE LLC
Applicant Address: 1600 Amphitheatre Parkway
Assignee: GOOGLE LLC
Current Assignee: GOOGLE LLC
Current Assignee Address: 1600 Amphitheatre Parkway
Agent: KRUEGER, Brett A.
Priority: US16/865,075 2020-05-01
Main IPC: G10L15/06
IPC: G10L15/06; G10L15/16; G06N3/08; G10L15/20

Abstract:

A method (400) for training a speech recognition model (200) with a loss function (310) includes receiving an audio signal (202) including a first segment (304) corresponding to audio spoken by a first speaker (10), a second segment corresponding to audio spoken by a second speaker, and an overlapping region (306) where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding (254) for each of the first and second speakers. The method also includes applying a masking loss (312) after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
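
The abstract states where the masking loss (312) is applied relative to the overlap's known start and end times, but not how the loss itself is computed. The following is a minimal Python sketch under stated assumptions: the masked audio embedding (254) is taken to be a sequence of per-frame vectors with known timestamps, and the penalty is assumed to be the mean squared L2 norm of the frames where the speaker should be silent. The helper names `masking_region` and `masking_loss` and the squared-norm penalty are hypothetical illustrations, not the applicant's method.

```python
from typing import List, Sequence


def masking_region(frame_times: Sequence[float], start_time: float,
                   end_time: float, spoke_before_overlap: bool) -> List[bool]:
    """Flag the frames where this speaker's masked embedding should be silent.

    If the speaker was talking before the overlap's known start time, only the
    other speaker remains after the known end time, so frames after end_time
    are flagged. Otherwise the speaker started inside the overlap, so frames
    before start_time are flagged. Strict inequalities are an assumption.
    """
    if spoke_before_overlap:
        return [t > end_time for t in frame_times]
    return [t < start_time for t in frame_times]


def masking_loss(embeddings: Sequence[Sequence[float]],
                 frame_times: Sequence[float], start_time: float,
                 end_time: float, spoke_before_overlap: bool) -> float:
    """Hypothetical masking loss: mean squared L2 norm of the flagged frames.

    The squared-norm form is an illustrative assumption; it is zero only when
    the masked embedding carries no energy where the speaker is absent.
    """
    mask = masking_region(frame_times, start_time, end_time,
                          spoke_before_overlap)
    penalties = [sum(x * x for x in frame)
                 for frame, flagged in zip(embeddings, mask) if flagged]
    return sum(penalties) / len(penalties) if penalties else 0.0


if __name__ == "__main__":
    times = [0.0, 0.5, 1.0, 1.5, 2.0]  # frame timestamps in seconds
    emb = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.1], [1.0, 1.0], [0.0, 0.0]]
    # Overlap assumed to span 0.75 s to 1.25 s.
    print(masking_loss(emb, times, 0.75, 1.25, spoke_before_overlap=True))   # 1.0
    print(masking_loss(emb, times, 0.75, 1.25, spoke_before_overlap=False))  # 0.75
```

Because the overlap boundaries are known at training time, the mask is a fixed function of time and needs no learned component; the loss only pushes each speaker's masked embedding toward zero in the region where that speaker is known to be silent.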

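IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L15/00	Speech recognition (G10L17/00 takes precedence)
G10L15/06	. Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)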