Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering

    公开(公告)号:US20230089308A1

    公开(公告)日:2023-03-23

    申请号:US17644261

    申请日:2021-12-14

    Applicant: Google LLC

    Abstract: A method includes receiving an input audio signal that corresponds to utterances spoken by multiple speakers. The method also includes processing the input audio to generate a transcription of the utterances and a sequence of speaker turn tokens each indicating a location of a respective speaker turn. The method also includes segmenting the input audio signal into a plurality of speaker segments based on the sequence of speaker tokens. The method also includes extracting a speaker-discriminative embedding from each speaker segment and performing spectral clustering on the speaker-discriminative embeddings to cluster the plurality of speaker segments into k classes. The method also includes assigning a respective speaker label to each speaker segment clustered into the respective class that is different than the respective speaker label assigned to the speaker segments clustered into each other class of the k classes.

Patent Agency Ranking