SPEAKER-TURN-BASED ONLINE SPEAKER DIARIZATION WITH CONSTRAINED SPECTRAL CLUSTERING

Invention Application

WO2023048746A1 SPEAKER-TURN-BASED ONLINE SPEAKER DIARIZATION WITH CONSTRAINED SPECTRAL CLUSTERING (status: pending, published)


Patent Title: SPEAKER-TURN-BASED ONLINE SPEAKER DIARIZATION WITH CONSTRAINED SPECTRAL CLUSTERING
Application No.: PCT/US2021/063343

Application Date: 2021-12-14
Publication No.: WO2023048746A1

Publication Date: 2023-03-30
Inventors: WANG, Quan; LU, Han; CLARK, Evan; MORENO, Ignacio Lopez; SAK, Hasim; XU, Wei; JOGLEKAR, Taral; TRIPATHI, Anshuman
Applicant: GOOGLE LLC
Applicant Address: 1600 Amphitheatre Parkway
Assignee: GOOGLE LLC
Current Assignee: GOOGLE LLC
Current Assignee Address: 1600 Amphitheatre Parkway
Agent: KRUEGER, Brett A.
Priority: US63/261,536 2021-09-23
Main IPC: G10L21/0272
IPC: G10L21/0272; G10L15/16; G10L25/30


Abstract:

A method (400) includes receiving an input audio signal (122) that corresponds to utterances (120) spoken by multiple speakers (10). The method also includes processing the input audio signal to generate a transcription (120) of the utterances and a sequence of speaker turn tokens (224), each indicating the location of a respective speaker turn. The method also includes segmenting the input audio signal into a plurality of speaker segments (225) based on the sequence of speaker turn tokens. The method also includes extracting a speaker-discriminative embedding (240) from each speaker segment and performing spectral clustering on the speaker-discriminative embeddings to cluster the plurality of speaker segments into k classes (262). For each respective class of the k classes, the method also includes assigning a respective speaker label (250) to each speaker segment clustered into that class, the respective speaker label being different from the speaker labels assigned to speaker segments clustered into each other class of the k classes.
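The pipeline summarized in the abstract (segment the signal at speaker turn tokens, extract one embedding per segment, spectrally cluster the embeddings into k classes, then give each class its own speaker label) can be sketched in Python as below. This is a minimal, hypothetical illustration, not the applicant's implementation. It assumes the speech recognizer's speaker turn tokens have already been aligned to timestamps in seconds, and it takes the speaker encoder as a caller-supplied `embed_segment` callback. The clustering step is plain normalized-Laplacian spectral clustering (clipped cosine affinity, eigen-gap estimate of k, k-means on the row-normalized eigenvectors); the constraints named in the title are not modeled. All function names are illustrative.

```python
# Hypothetical sketch: function names, the timestamp input format, and the
# clustering details are illustrative assumptions, not the patented method.
import numpy as np


def segment_by_turns(duration_s, turn_times_s):
    """Split [0, duration_s) into speaker segments at each speaker-turn time.

    Assumed input: each speaker turn token has already been aligned to a
    timestamp (in seconds) by the speech recognizer.
    """
    inner = sorted(t for t in turn_times_s if 0.0 < t < duration_s)
    bounds = [0.0] + inner + [float(duration_s)]
    return list(zip(bounds[:-1], bounds[1:]))


def _kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dists))])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_cluster(embeddings, k=None, max_k=8):
    """Plain (unconstrained) spectral clustering of per-segment embeddings.

    Returns one class index per segment. When k is None, it is estimated from
    the largest gap between consecutive eigenvalues of the normalized Laplacian.
    """
    x = np.asarray(embeddings, dtype=float)
    if len(x) == 1:
        return np.zeros(1, dtype=int)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    # Affinity: cosine similarity with negative values clipped to 0.
    affinity = np.maximum(x @ x.T, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    if k is None:
        limit = min(max_k, len(x) - 1)
        k = int(np.argmax(np.diff(eigvals[: limit + 1]))) + 1
    spec = eigvecs[:, :k]
    spec = spec / np.maximum(np.linalg.norm(spec, axis=1, keepdims=True), 1e-12)
    return _kmeans(spec, k)


def assign_speaker_labels(class_ids):
    """Give each class one label, distinct from every other class's label."""
    names = {}
    for c in class_ids:
        names.setdefault(int(c), f"speaker_{len(names) + 1}")
    return [names[int(c)] for c in class_ids]


def diarize(duration_s, turn_times_s, embed_segment, k=None):
    """Segment at speaker turns, embed each segment, cluster, then label."""
    segments = segment_by_turns(duration_s, turn_times_s)
    embeddings = [embed_segment(seg) for seg in segments]
    labels = assign_speaker_labels(spectral_cluster(embeddings, k))
    return list(zip(segments, labels))
```

Clipping negative cosine similarities to zero keeps the affinity matrix non-negative and pushes cross-speaker affinities toward zero, which is what makes the eigen-gap a usable estimate of k. Numbering labels by first appearance guarantees that each class receives exactly one label that no other class shares, mirroring the last step of the abstract.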


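IPC Classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
G10L21/00	Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence)
G10L21/02	.Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08)
G10L21/0272	..Voice signal separating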