SPEAKER-TURN-BASED ONLINE SPEAKER DIARIZATION WITH CONSTRAINED SPECTRAL CLUSTERING
Abstract:
A method (400) includes receiving an input audio signal (122) that corresponds to utterances (120) spoken by multiple speakers (10). The method also includes processing the input audio signal to generate a transcription (120) of the utterances and a sequence of speaker turn tokens (224), each indicating a location of a respective speaker turn. The method also includes segmenting the input audio signal into a plurality of speaker segments (225) based on the sequence of speaker turn tokens. The method also includes extracting a speaker-discriminative embedding (240) from each speaker segment and performing spectral clustering on the speaker-discriminative embeddings to cluster the plurality of speaker segments into k classes (262). For each respective class of the k classes, the method also includes assigning a respective speaker label (250) to each speaker segment clustered into the respective class, the label being different from the respective speaker label assigned to speaker segments clustered into each other class of the k classes.
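The four steps of the abstract (segment at speaker turn tokens, take one embedding per segment, spectrally cluster the embeddings into k classes, give each class a distinct label) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented method. The turn-token symbol `<st>`, all function names, and the affinity that shifts cosine similarity into [0, 1] are hypothetical choices. Speaker-discriminative embeddings are passed in precomputed because the abstract does not name an embedding model. The clustering step is plain, unconstrained normalized spectral clustering (Ng-Jordan-Weiss style, followed by k-means), so it omits the constraints the title refers to. The sketch requires NumPy and at least two segments.

```python
import numpy as np

TURN_TOKEN = "<st>"  # hypothetical symbol standing in for a speaker turn token


def segment_by_turns(tokens, turn_token=TURN_TOKEN):
    """Split a transcript token sequence into speaker segments at each turn token."""
    segments, current = [], []
    for tok in tokens:
        if tok == turn_token:
            if current:  # skip empty segments, e.g. from a leading turn token
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


def _kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.array([
            points[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def spectral_cluster(embeddings, k):
    """Unconstrained normalized spectral clustering of segment embeddings into k classes."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    affinity = (x @ x.T + 1.0) / 2.0  # assumption: cosine similarity shifted into [0, 1]
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    u = vecs[:, :k]  # eigenvectors of the k smallest eigenvalues
    u = u / np.maximum(np.linalg.norm(u, axis=1, keepdims=True), 1e-12)
    return _kmeans(u, k)


def assign_speaker_labels(cluster_ids):
    """Give each class its own label, numbered in order of first appearance."""
    mapping = {}
    return [mapping.setdefault(int(c), f"speaker_{len(mapping)}") for c in cluster_ids]


def diarize(tokens, embeddings, k):
    """Segment at turn tokens, cluster one embedding per segment, label each segment."""
    segments = segment_by_turns(tokens)
    if len(segments) != len(embeddings):
        raise ValueError("need exactly one embedding per speaker segment")
    labels = assign_speaker_labels(spectral_cluster(embeddings, k))
    return [(" ".join(seg), lab) for seg, lab in zip(segments, labels)]
```

For example, `diarize("hi there <st> hello <st> how are you <st> fine".split(), [[1, 0, 0], [0, 1, 0], [0.95, 0.05, 0], [0.05, 0.95, 0]], k=2)` places the first and third segments under `speaker_0` and the second and fourth under `speaker_1`. Because segment boundaries come from the transcript's turn tokens rather than fixed-length windows, each embedding summarizes exactly one speaker turn.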