End-to-end speech diarization via iterative speaker embedding

Invention Grant

US11887623B2 End-to-end speech diarization via iterative speaker embedding (Active)

Patent Title: End-to-end speech diarization via iterative speaker embedding
Application No.: US17304514

Application Date: 2021-06-22
Publication No.: US11887623B2

Publication Date: 2024-01-30
Inventor: David Grangier, Neil Zeghidour, Oliver Teboul
Applicant: Google LLC
Applicant Address: US CA Mountain View
Assignee: Google LLC
Current Assignee: Google LLC
Current Assignee Address: US CA Mountain View
Agency: Honigman LLP
Agent: Brett A. Krueger; Grant J. Griffith
Main IPC: G10L25/78
IPC: G10L25/78; G06N3/04; G10L15/06; G10L15/07; G10L17/18; G10L19/008

Abstract:

A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.
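The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say how the probability is computed, so `new_speaker_score` is a hypothetical hand-written stand-in (embedding energy multiplied by dissimilarity to the speakers already selected), and the cosine threshold in `voice_activity` is likewise an assumption. All function names and the toy embeddings are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def new_speaker_score(h_t, selected):
    """Hypothetical stand-in for the probability that frame h_t contains a
    single new speaker: high when the frame has energy and is dissimilar
    from every speaker embedding selected in earlier iterations."""
    energy = math.sqrt(sum(x * x for x in h_t))
    activity = energy / (1.0 + energy)  # squash energy into [0, 1)
    overlap = max((max(cosine(h_t, s), 0.0) for s in selected), default=0.0)
    return activity * (1.0 - overlap)

def select_speakers(embeddings, num_speakers):
    """One iteration per speaker: score all T temporal embeddings and take
    the highest-scoring one as that speaker's embedding."""
    selected = []
    for _ in range(num_speakers):
        scores = [new_speaker_score(h, selected) for h in embeddings]
        best_t = max(range(len(embeddings)), key=scores.__getitem__)
        selected.append(embeddings[best_t])
    return selected

def voice_activity(embeddings, speakers, threshold=0.5):
    """Per time step, a 0/1 indicator for each speaker (assumed rule:
    cosine similarity to the speaker embedding above a threshold)."""
    return [[1 if cosine(h, s) > threshold else 0 for s in speakers]
            for h in embeddings]

# Toy sequence of T = 5 temporal embeddings: silence, speaker A twice,
# speaker B, then a frame where both overlap.
frames = [[0.0, 0.0], [2.0, 0.0], [1.9, 0.1], [0.0, 3.0], [1.0, 1.0]]
speakers = select_speakers(frames, num_speakers=2)
activity = voice_activity(frames, speakers)
```

In this toy run the overlapping frame is never chosen as a speaker embedding because its score is penalized once either speaker has been selected, which mirrors the abstract's requirement of a single new speaker.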

Public/Granted literature

US20220375492A1 End-To-End Speech Diarization Via Iterative Speaker Embedding (Public/Granted day: 2022-11-24)

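IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L25/00	Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 (muting semiconductor-based amplifiers when some special characteristics of a signal are sensed by a speech detector, e.g. sensing when no signal is present, H03G3/34)
G10L25/78	. Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems, H04M9/10)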