Publication No.: US20230089308A1
Publication Date: 2023-03-23
Application No.: US17644261
Filing Date: 2021-12-14
Applicant: Google LLC
Inventors: Quan Wang, Han Lu, Evan Clark, Ignacio Lopez Moreno, Hasim Sak, Wei Xia, Taral Joglekar, Anshuman Tripathi
Abstract: A method includes receiving an input audio signal that corresponds to utterances spoken by multiple speakers. The method also includes processing the input audio signal to generate a transcription of the utterances and a sequence of speaker turn tokens, each indicating the location of a respective speaker turn. The method also includes segmenting the input audio signal into a plurality of speaker segments based on the sequence of speaker turn tokens. The method also includes extracting a speaker-discriminative embedding from each speaker segment and performing spectral clustering on the speaker-discriminative embeddings to cluster the plurality of speaker segments into k classes. The method also includes, for each respective class of the k classes, assigning a respective speaker label to each speaker segment clustered into the respective class that is different from the respective speaker label assigned to the speaker segments clustered into each other class of the k classes.
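The pipeline in the abstract (segment at turn tokens, embed each segment, spectrally cluster into k classes, give each class its own label) can be illustrated with a minimal pure-Python sketch. It assumes the ASR output is a list of `(token, start_s, end_s)` triples containing a hypothetical `<st>` turn token, that a speaker encoder has already produced one embedding per segment, and that k is known in advance. The cosine affinity, shifted normalized-affinity eigenvectors (via subspace iteration) and k-means step form a textbook spectral clustering recipe chosen for illustration, not necessarily the variant claimed in the patent; all function names are hypothetical.

```python
import math
import random

TURN_TOKEN = "<st>"  # hypothetical speaker-turn token in the ASR output


def segment_by_turns(tokens):
    """Split (token, start_s, end_s) triples into (start_s, end_s) segments at turn tokens."""
    segments, current = [], []
    for tok, start, end in tokens:
        if tok == TURN_TOKEN:
            if current:  # close the segment spoken before this turn
                segments.append((current[0][0], current[-1][1]))
                current = []
        else:
            current.append((start, end))
    if current:  # trailing segment after the last turn token
        segments.append((current[0][0], current[-1][1]))
    return segments


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def _sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _orthonormalize(cols):
    """Modified Gram-Schmidt over a list of column vectors."""
    basis = []
    for c in cols:
        w = list(c)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def _kmeans(points, k, iters=50):
    # Deterministic farthest-point initialisation, then Lloyd iterations.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_sqdist(p, c) for c in centers)))
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: _sqdist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return assign


def spectral_cluster(embeddings, k, iters=200, seed=0):
    n = len(embeddings)
    # Affinity: cosine similarity, negative values clipped to zero.
    aff = [[max(0.0, _cosine(embeddings[i], embeddings[j])) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in aff]
    # Normalized affinity D^-1/2 A D^-1/2, shifted by +I so all eigenvalues are >= 0.
    m = [[aff[i][j] / math.sqrt(deg[i] * deg[j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    for _ in range(iters):  # subspace iteration converges to the top-k eigenvectors
        cols = _orthonormalize([[sum(m[i][j] * c[j] for j in range(n)) for i in range(n)] for c in cols])
    # Row i of the n x k eigenvector matrix, unit-normalized, embeds segment i.
    rows = []
    for i in range(n):
        r = [c[i] for c in cols]
        norm = math.sqrt(sum(x * x for x in r))
        rows.append([x / norm for x in r])
    return _kmeans(rows, k)


def diarize(tokens, embeddings, k):
    """embeddings[i] is assumed to come from a speaker encoder applied to segment i."""
    segments = segment_by_turns(tokens)
    assert len(segments) == len(embeddings)
    names, labels = {}, []
    for cls in spectral_cluster(embeddings, k):
        names.setdefault(cls, f"spk{len(names)}")  # one distinct label per class
        labels.append(names[cls])
    return list(zip(segments, labels))
```

Here k is passed in directly because the abstract does not say how k is chosen; labels are numbered in order of first appearance so that segments in the same class share a label and different classes never do.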