Invention Grant
- Patent Title: End-to-end speech diarization via iterative speaker embedding
- Application No.: US17304514
- Application Date: 2021-06-22
- Publication No.: US11887623B2
- Publication Date: 2024-01-30
- Inventor: David Grangier, Neil Zeghidour, Oliver Teboul
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Agency: Honigman LLP
- Agent: Brett A. Krueger; Grant J. Griffith
- Main IPC: G10L25/78
- IPC: G10L25/78; G06N3/04; G10L15/06; G10L15/07; G10L17/18; G10L19/008

Abstract:
A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
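The flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the scoring function `toy_new_speaker_prob`, the choice of the highest-probability time step, the sigmoid dot-product voice activity rule, and all function names are assumptions introduced only for illustration, and the encoder that produces the T temporal embeddings is omitted.

```python
import numpy as np


def toy_new_speaker_prob(temporal, selected):
    """Hypothetical stand-in for the learned probability of a single new speaker.

    temporal: (T, D) array of temporal embeddings.
    selected: list of (D,) speaker embeddings chosen in earlier iterations.
    """
    norms = np.linalg.norm(temporal, axis=1)
    # Assumption: a larger embedding norm means more voice activity.
    prob = norms / (1.0 + norms)
    for s in selected:
        # Down-weight time steps that resemble an already selected speaker.
        cos = temporal @ s / (norms * np.linalg.norm(s) + 1e-9)
        prob = prob * (1.0 - np.clip(cos, 0.0, 1.0))
    return prob


def select_speaker_embeddings(temporal, num_speakers, prob_fn=toy_new_speaker_prob):
    """One iteration per speaker: score every time step, keep one embedding."""
    selected = []
    for _ in range(num_speakers):
        probs = prob_fn(temporal, selected)
        # Assumption: pick the time step with the highest probability.
        t_star = int(np.argmax(probs))
        selected.append(temporal[t_star])
    return np.stack(selected)


def predict_voice_activity(temporal, speakers, threshold=0.5):
    """Return a (T, N) boolean array: is speaker n active at time step t?"""
    logits = temporal @ speakers.T
    return 1.0 / (1.0 + np.exp(-logits)) > threshold


if __name__ == "__main__":
    # Five time steps, two-dimensional embeddings, two toy speakers.
    temporal = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
    speakers = select_speaker_embeddings(temporal, num_speakers=2)
    print(predict_voice_activity(temporal, speakers))
```

In this toy setup each iteration sees the embeddings selected so far, which mirrors the abstract's requirement that a new speaker be one for which no embedding was previously selected; a real system would replace both hand-written scoring rules with trained networks.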
Public/Granted literature
- US20220375492A1 End-To-End Speech Diarization Via Iterative Speaker Embedding, Public/Granted day: 2022-11-24