ONLINE SPEAKER DIARIZATION USING LOCAL AND GLOBAL CLUSTERING

    Publication (Announcement) No.: US20230419979A1

    Publication (Announcement) Date: 2023-12-28

    Application No.: US18046041

    Filing Date: 2022-10-12

    CPC classification number: G10L21/028 G10L17/06 G10L17/02

    Abstract: A method includes obtaining at least a portion of an audio stream containing speech activity. At least the portion of the audio stream includes multiple segments. The method also includes, for each of the multiple segments, generating an embedding vector that represents the segment. The method further includes, within each of multiple local windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Different clusters correspond to different speakers. The method also includes presenting at least one first sequence of speaker identities based on the speaker identification performed for the local windows. The method further includes, within each of multiple global windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Each global window includes two or more local windows. In addition, the method includes presenting at least one second sequence of speaker identities based on the speaker identification performed for the global windows.
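    The two-tier scheme in the abstract can be sketched as follows. The local pass clusters only the newest local window, so it can publish a provisional speaker sequence quickly. The global pass re-clusters several local windows together and publishes a second, better-informed sequence. This is a minimal sketch under stated assumptions, not the patented method. Segment embeddings are assumed to be precomputed. Greedy centroid clustering with cosine similarity is swapped in as the clustering step. The names `online_diarize`, `cluster`, `cosine`, `local_size`, `locals_per_global` and `threshold`, along with the 0.8 default, are all hypothetical.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster(embeddings: List[Vector], threshold: float) -> List[int]:
    """Greedy centroid clustering (hypothetical stand-in for the clustering step).

    Each embedding joins the existing cluster whose centroid is most similar,
    provided that similarity is >= threshold. Otherwise it opens a new cluster.
    Returns one label per embedding, numbered from 0 within this window.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for e in embeddings:
        sims = [cosine(e, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=-1)
        if best >= 0 and sims[best] >= threshold:
            n = counts[best]
            # Running mean keeps the centroid representative of all its members.
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], e)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(e))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


def online_diarize(
    stream: Iterable[Vector], local_size: int, locals_per_global: int, threshold: float = 0.8
) -> Iterator[Tuple[str, List[int]]]:
    """Yield ("local", labels) after each full local window and
    ("global", labels) after each full global window of segment embeddings.

    Labels are window-relative. This sketch does not map cluster IDs between
    windows, and it drops any trailing partial window.
    """
    global_size = local_size * locals_per_global
    buffer: List[Vector] = []
    for emb in stream:
        buffer.append(emb)
        if len(buffer) % local_size == 0:
            # Low-latency pass over only the newest local window.
            yield "local", cluster(buffer[-local_size:], threshold)
        if len(buffer) == global_size:
            # Pass over all local windows in this global window (more context).
            yield "global", cluster(buffer, threshold)
            buffer = []


if __name__ == "__main__":
    A, B = [1.0, 0.0], [0.0, 1.0]  # toy embeddings for two speakers
    for kind, labels in online_diarize([A, A, B, B, A, B, B, A], 4, 2):
        print(kind, labels)
```

    With the toy stream above, two local results (`[0, 0, 1, 1]` and `[0, 1, 1, 0]`) are emitted first. A global result over all eight segments (`[0, 0, 1, 1, 0, 1, 1, 0]`) follows. This mirrors the first and second sequences of speaker identities described in the abstract.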

    SYSTEM AND METHOD FOR SPEAKER VERIFICATION FOR VOICE ASSISTANT

    Publication (Announcement) No.: US20230419962A1

    Publication (Announcement) Date: 2023-12-28

    Application No.: US18047609

    Filing Date: 2022-10-18

    CPC classification number: G10L15/22 G10L2015/088 G10L15/08

    Abstract: A method includes obtaining audio data and identifying an utterance of a wake word or phrase in the audio data. The method also includes generating an embedding vector based on the utterance from the audio data and accessing a set of previously-generated vectors representing previous utterances of the wake word or phrase. The method further includes performing clustering on the embedding vector and the set of previously-generated vectors to identify a cluster including the embedding vector, where the identified cluster is associated with a speaker. The method also includes updating a speaker vector associated with the speaker based on the embedding vector and determining, using a speaker verification model, a similarity score between the updated speaker vector and the embedding vector. In addition, the method includes determining, based on the similarity score, whether a speaker providing the utterance matches the speaker associated with the identified cluster.
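    The verification flow in the abstract runs in five steps. It clusters the new wake-word embedding with earlier ones, picks the speaker tied to that cluster, updates that speaker's vector, scores the updated vector against the new embedding, and thresholds the score. The sketch below illustrates that flow under stated assumptions; it is not the patented implementation. Wake-word detection and embedding extraction are assumed to have already happened. Each earlier vector is assumed to carry a speaker label. Single-linkage clustering is swapped in as the clustering step, and cosine similarity stands in for the speaker verification model. The update is an exponential moving average. `verify_wake_word`, `single_linkage`, `alpha` and both thresholds are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_linkage(vectors: List[Vector], link_threshold: float) -> List[int]:
    """Single-linkage clustering via union-find: any pair with cosine
    similarity >= link_threshold ends up in the same cluster.
    Returns a cluster root index per vector."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= link_threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(vectors))]


def verify_wake_word(
    embedding: Vector,
    history: List[Tuple[str, Vector]],
    speaker_vectors: Dict[str, List[float]],
    alpha: float = 0.1,
    link_threshold: float = 0.7,
    accept_threshold: float = 0.75,
) -> Tuple[Optional[str], float, bool]:
    """Return (speaker_id, score, accepted) for one wake-word embedding.

    history holds hypothetical (speaker_id, vector) pairs from earlier
    wake-word utterances. Every speaker in history is assumed to have an
    entry in speaker_vectors, and the matched speaker's entry is updated
    in place.
    """
    roots = single_linkage([v for _, v in history] + [embedding], link_threshold)
    # zip stops at len(history), so the new embedding itself is excluded here.
    members = [spk for (spk, _), r in zip(history, roots) if r == roots[-1]]
    if not members:
        return None, 0.0, False  # no known speaker shares the new embedding's cluster
    speaker = Counter(members).most_common(1)[0][0]
    # Exponential moving average update; alpha is a hypothetical smoothing factor.
    updated = [(1 - alpha) * s + alpha * e for s, e in zip(speaker_vectors[speaker], embedding)]
    speaker_vectors[speaker] = updated
    # Cosine similarity stands in for the speaker verification model's score.
    score = cosine(updated, embedding)
    return speaker, score, score >= accept_threshold
```

    The abstract updates the speaker vector before computing the similarity score, and this sketch keeps that order. As a result, the update is committed even when the final decision rejects the utterance.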
