ONLINE SPEAKER DIARIZATION USING LOCAL AND GLOBAL CLUSTERING

    Publication number: US20230419979A1

    Publication date: 2023-12-28

    Application number: US18046041

    Application date: 2022-10-12

    CPC classification number: G10L21/028 G10L17/06 G10L17/02

    Abstract: A method includes obtaining at least a portion of an audio stream containing speech activity. At least the portion of the audio stream includes multiple segments. The method also includes, for each of the multiple segments, generating an embedding vector that represents the segment. The method further includes, within each of multiple local windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Different clusters correspond to different speakers. The method also includes presenting at least one first sequence of speaker identities based on the speaker identification performed for the local windows. The method further includes, within each of multiple global windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Each global window includes two or more local windows. In addition, the method includes presenting at least one second sequence of speaker identities based on the speaker identification performed for the global windows.
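The two-level scheme in the abstract can be illustrated with a minimal sketch. This is not the claimed implementation: the abstract names neither the embedding model nor the clustering algorithm, so the greedy cosine-similarity clustering, the `threshold` value, the fixed window sizes, and the function names `cluster` and `diarize` are all assumptions made for illustration. Embedding vectors are taken as given (one per segment).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster(embeddings, threshold=0.8):
    """Greedy clustering (an assumed stand-in for the unspecified algorithm).

    Each vector joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it opens a new cluster. Returns one
    cluster label per vector; labels are only meaningful within this call.
    """
    sums = []    # running sum of member vectors per cluster
    labels = []
    for e in embeddings:
        sims = [cosine(e, s) for s in sums]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            sums[best] = [x + y for x, y in zip(sums[best], e)]
        else:
            sums.append(list(e))
            best = len(sums) - 1
        labels.append(best)
    return labels


def diarize(embeddings, local_size=4, locals_per_global=2, threshold=0.8):
    """Cluster per local window, then again per global window.

    A global window spans `locals_per_global` local windows (two or more),
    so its labels are consistent across the local windows it contains.
    """
    global_size = local_size * locals_per_global
    local_seqs = [
        cluster(embeddings[i:i + local_size], threshold)
        for i in range(0, len(embeddings), local_size)
    ]
    global_seqs = [
        cluster(embeddings[i:i + global_size], threshold)
        for i in range(0, len(embeddings), global_size)
    ]
    return local_seqs, global_seqs


# Toy 2-D embeddings: two speakers, appearing in swapped order in the second half.
segments = [[1, 0.1], [0.9, 0], [0, 1], [0.1, 1],
            [0, 0.9], [0.1, 1], [1, 0], [0.9, 0.1]]
local_seqs, global_seqs = diarize(segments)
print(local_seqs)   # [[0, 0, 1, 1], [0, 0, 1, 1]]
print(global_seqs)  # [[0, 0, 1, 1, 1, 1, 0, 0]]
```

In the toy run, each local window labels its first speaker as 0, so the same label refers to different speakers in the two local windows. The global pass over both windows resolves this and assigns one consistent identity per speaker, which matches the role the abstract gives to the second sequence of speaker identities.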
