Invention Application
- Patent Title: END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION
- Application No.: PCT/US2021/030049
- Application Date: 2021-04-30
- Publication No.: WO2021222678A1
- Publication Date: 2021-11-04
- Inventors: TRIPATHI, Anshuman; LU, Han; SAK, Hasim
- Applicant: GOOGLE LLC
- Applicant Address: 1600 Amphitheatre Parkway
- Assignee: GOOGLE LLC
- Current Assignee: GOOGLE LLC
- Current Assignee Address: 1600 Amphitheatre Parkway
- Agency: KRUEGER, Brett, A.
- Priority: US16/865,075 2020-05-01
- Main IPC: G10L15/06
- IPC: G10L15/06; G10L15/16; G06N3/08; G10L15/20
Abstract:
A method (400) for training a speech recognition model (200) with a loss function (310) includes receiving an audio signal (202) including a first segment (304) corresponding to audio spoken by a first speaker (10), a second segment corresponding to audio spoken by a second speaker, and an overlapping region (306) where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding (254) for each of the first and second speakers. The method also includes applying a masking loss (312) after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
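The abstract does not specify the form of the masking loss. The Python sketch below shows one way to apply its time-conditional rule to a single speaker's masked audio embedding. It rests on hypothetical assumptions: one embedding vector per fixed-length frame (`frame_shift_s`), and a loss equal to the mean squared magnitude of the embedding over the frames where that speaker is known to be silent. The function name `masking_loss` and its parameters are illustrative only and are not taken from the application.

```python
from typing import Sequence


def masking_loss(
    masked_embeddings: Sequence[Sequence[float]],
    overlap_start: float,
    overlap_end: float,
    speaker_active_before_overlap: bool,
    frame_shift_s: float = 0.01,
) -> float:
    """Hypothetical masking loss for one speaker's masked audio embeddings.

    Assumptions (not from the source): one embedding per frame at a fixed
    frame shift, and the loss is the mean squared magnitude of the embedding
    over the frames where this speaker is known to be silent.
    """
    if overlap_end < overlap_start:
        raise ValueError("overlap_end must not precede overlap_start")
    num_frames = len(masked_embeddings)
    # Round rather than truncate so that, e.g., 0.3 / 0.1 maps to frame 3.
    start_frame = int(round(overlap_start / frame_shift_s))
    end_frame = int(round(overlap_end / frame_shift_s))
    if speaker_active_before_overlap:
        # Speaker was talking before the known start time, so their segment
        # ends inside the overlap: penalize output after the known end time.
        silent = range(min(end_frame, num_frames), num_frames)
    else:
        # Speaker was talking after the known end time, so their segment
        # starts inside the overlap: penalize output before the known start.
        silent = range(0, min(start_frame, num_frames))
    if len(silent) == 0:
        return 0.0
    total = 0.0
    for t in silent:
        total += sum(x * x for x in masked_embeddings[t])
    return total / len(silent)


if __name__ == "__main__":
    # Ten frames of 2-dim embeddings at a hypothetical 100 ms frame shift;
    # the overlap runs from 0.3 s to 0.6 s (frames 3 to 6).
    emb = [[0.0, 0.0] for _ in range(10)]
    emb[7] = [1.0, 1.0]  # leakage after the overlap has ended
    print(masking_loss(emb, 0.3, 0.6, True, frame_shift_s=0.1))  # 0.5
```

Calling the same function with `speaker_active_before_overlap=False` covers the abstract's second case, a speaker whose audio continues after the known end time, by penalizing frames before the known start time instead. A squared-magnitude penalty is just one simple choice that pushes the embedding toward zero where the speaker is absent.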