Invention Application
- Patent Title: SPEAKER-TURN-BASED ONLINE SPEAKER DIARIZATION WITH CONSTRAINED SPECTRAL CLUSTERING
- Application No.: PCT/US2021/063343
- Application Date: 2021-12-14
- Publication No.: WO2023048746A1
- Publication Date: 2023-03-30
- Inventor: WANG, Quan ; LU, Han ; CLARK, Evan ; MORENO, Ignacio Lopez ; SAK, Hasim ; XU, Wei ; JOGLEKAR, Taral ; TRIPATHI, Anshuman
- Applicant: GOOGLE LLC
- Applicant Address: 1600 Amphitheatre Parkway
- Assignee: GOOGLE LLC
- Current Assignee: GOOGLE LLC
- Current Assignee Address: 1600 Amphitheatre Parkway
- Agency: KRUEGER, Brett, A.
- Priority: US63/261,536 2021-09-23
- Main IPC: G10L21/0272
- IPC: G10L21/0272 ; G10L15/16 ; G10L25/30
Abstract:
A method (400) includes receiving an input audio signal (122) that corresponds to utterances (120) spoken by multiple speakers (10). The method also includes processing the input audio signal to generate a transcription (220) of the utterances and a sequence of speaker turn tokens (224) each indicating a location of a respective speaker turn. The method also includes segmenting the input audio signal into a plurality of speaker segments (225) based on the sequence of speaker turn tokens. The method also includes extracting a speaker-discriminative embedding (240) from each speaker segment and performing spectral clustering on the speaker-discriminative embeddings to cluster the plurality of speaker segments into k classes (262). The method also includes assigning a respective speaker label (250) to each speaker segment clustered into the respective class that is different than the respective speaker label assigned to speaker segments clustered into each other class of the k classes.
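The segmentation and labeling steps of the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that each recognized token carries a timestamp and that a speaker turn is marked by a token spelled `<st>`, neither of which is stated in the abstract. The names `split_on_turns`, `assign_labels`, and `Segment` are hypothetical, and the embedding extraction and spectral clustering steps are replaced by a hand-written list of cluster indices.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

TURN_TOKEN = "<st>"  # assumed spelling of the speaker turn token


@dataclass
class Segment:
    start: float      # segment start, in seconds
    end: float        # segment end, in seconds
    words: List[str]  # transcribed words inside the segment


def split_on_turns(tokens: Sequence[Tuple[str, float]],
                   audio_end: float) -> List[Segment]:
    """Cut the audio at every speaker turn token.

    `tokens` is a time-ordered list of (token, timestamp) pairs; the
    timestamps are an assumption made for this illustration.
    """
    segments: List[Segment] = []
    start, words = 0.0, []
    for token, ts in tokens:
        if token == TURN_TOKEN:
            if words:  # skip empty segments between adjacent turn tokens
                segments.append(Segment(start, ts, words))
            start, words = ts, []
        else:
            words.append(token)
    if words:  # close the final segment at the end of the audio
        segments.append(Segment(start, audio_end, words))
    return segments


def assign_labels(cluster_ids: Sequence[int]) -> List[str]:
    """Give each cluster its own label, numbered by first appearance."""
    names: Dict[int, str] = {}
    labels: List[str] = []
    for cid in cluster_ids:
        if cid not in names:
            names[cid] = f"speaker_{len(names) + 1}"
        labels.append(names[cid])
    return labels


tokens = [("how", 0.2), ("are", 0.5), ("you", 0.8), (TURN_TOKEN, 1.1),
          ("fine", 1.4), ("thanks", 1.8), (TURN_TOKEN, 2.3), ("great", 2.6)]
segments = split_on_turns(tokens, audio_end=3.0)

# Stand-in for embedding extraction plus spectral clustering with k = 2:
# one hypothetical cluster index per segment.
cluster_ids = [0, 1, 0]
labels = assign_labels(cluster_ids)

for seg, label in zip(segments, labels):
    print(label, seg.start, seg.end, " ".join(seg.words))
```

In this sketch, segments that fall into the same cluster share a label and segments in different clusters never do, which mirrors the labeling condition stated in the last sentence of the abstract.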