-
Publication No.: US11887623B2
Publication date: 2024-01-30
Application No.: US17304514
Application date: 2021-06-22
Applicant: Google LLC
Inventor: David Grangier, Neil Zeghidour, Oliver Teboul
CPC classification number: G10L25/78, G06N3/04, G10L15/063, G10L15/07, G10L17/18, G10L19/008
Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
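The flow described in the abstract (iteratively select one speaker embedding per speaker from the temporal embeddings, then predict per-speaker voice activity at each time step) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the encoder, the scoring models, or how a temporal embedding is chosen from its probability. The function names `new_speaker_prob`, `select_speakers`, and `predict_voice_activity` are hypothetical, the scoring rules are toy heuristics standing in for learned models, and picking the highest-probability time step is one plausible reading rather than the claimed method.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def new_speaker_prob(h, selected):
    """Toy stand-in (assumption) for the probability that temporal embedding h
    contains voice activity by a speaker not selected in a previous iteration."""
    energy = math.sqrt(sum(x * x for x in h))
    activity = 1.0 - math.exp(-energy)           # 0 for silence, grows with energy
    novelty = 1.0
    for s in selected:                           # penalize similarity to earlier picks
        novelty = min(novelty, 1.0 - max(0.0, cosine(h, s)))
    return activity * novelty

def select_speakers(embeddings, num_speakers):
    """One iteration per speaker: score every time step, then take the temporal
    embedding with the highest score as that speaker's embedding (assumption)."""
    selected = []
    for _ in range(num_speakers):
        probs = [new_speaker_prob(h, selected) for h in embeddings]
        best_t = max(range(len(embeddings)), key=probs.__getitem__)
        selected.append(embeddings[best_t])
    return selected

def predict_voice_activity(embeddings, speakers, threshold=0.5):
    """At each time step, emit a 0/1 indicator per speaker. A cosine threshold
    replaces the learned detector here (assumption)."""
    return [[1 if cosine(h, s) > threshold else 0 for s in speakers]
            for h in embeddings]

# Hypothetical sequence of T = 5 two-dimensional temporal embeddings:
# speaker A, speaker A, silence, speaker B, overlapping A and B.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
speakers = select_speakers(frames, num_speakers=2)
activity = predict_voice_activity(frames, speakers)
for t, row in enumerate(activity):
    print(t, row)
```

In this toy input the last frame is marked active for both speakers, which shows how per-speaker indicators can represent overlapping speech. A real system would replace both heuristics with trained networks operating on encoder outputs.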
-
Publication No.: US20240144957A1
Publication date: 2024-05-02
Application No.: US18544647
Application date: 2023-12-19
Applicant: Google LLC
Inventor: David Grangier, Neil Zeghidour, Oliver Teboul
CPC classification number: G10L25/78, G06N3/04, G10L15/063, G10L15/07, G10L17/18, G10L19/008
Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
-
Publication No.: US20220375492A1
Publication date: 2022-11-24
Application No.: US17304514
Application date: 2021-06-22
Applicant: Google LLC
Inventor: David Grangier, Neil Zeghidour, Oliver Teboul
Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
-