Publication No.: US20230419979A1
Publication Date: 2023-12-28
Application No.: US18046041
Filing Date: 2022-10-12
Applicant: Samsung Electronics Co., Ltd.
Inventor: Myungjong Kim, Taeyeon Ki, Vijendra Raj Apsingekar, Sungjae Park, SeungBeom Ryu, Hyuk Oh
IPC: G10L21/028, G10L17/06, G10L17/02
CPC classification number: G10L21/028, G10L17/06, G10L17/02
Abstract: A method includes obtaining at least a portion of an audio stream containing speech activity. At least the portion of the audio stream includes multiple segments. The method also includes, for each of the multiple segments, generating an embedding vector that represents the segment. The method further includes, within each of multiple local windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Different clusters correspond to different speakers. The method also includes presenting at least one first sequence of speaker identities based on the speaker identification performed for the local windows. The method further includes, within each of multiple global windows, clustering the embedding vectors into one or more clusters to perform speaker identification. Each global window includes two or more local windows. In addition, the method includes presenting at least one second sequence of speaker identities based on the speaker identification performed for the global windows.
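The two-level windowing described in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, an embedding model, or window sizes, so everything below is an assumption made for illustration: the greedy cosine-similarity clustering, the 0.8 threshold, the function names `cluster` and `diarize`, the parameters `local_size` and `locals_per_global`, and the toy 2-D embeddings. It is not the claimed implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster(embeddings: Sequence[Vector], threshold: float = 0.8) -> List[int]:
    """Hypothetical stand-in clustering: assign each vector to the most
    similar existing centroid, or open a new cluster if none reaches
    the threshold. Each distinct label stands for a distinct speaker."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for vec in embeddings:
        best, best_sim = -1, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best < 0:
            # No centroid is similar enough: start a new speaker cluster.
            centroids.append(list(vec))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched centroid.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            counts[best] = n + 1
        labels.append(best)
    return labels


def diarize(embeddings: Sequence[Vector], local_size: int = 4,
            locals_per_global: int = 2,
            threshold: float = 0.8) -> Tuple[List[List[int]], List[List[int]]]:
    """Cluster within each local window, then within each global window
    that spans `locals_per_global` local windows."""
    local_seqs = [cluster(embeddings[i:i + local_size], threshold)
                  for i in range(0, len(embeddings), local_size)]
    global_size = local_size * locals_per_global
    global_seqs = [cluster(embeddings[i:i + global_size], threshold)
                   for i in range(0, len(embeddings), global_size)]
    return local_seqs, global_seqs


# Toy embeddings, one per segment: speaker A near (1, 0), speaker B near (0, 1).
segments = [
    (1.0, 0.1), (0.9, 0.0), (0.1, 1.0), (0.0, 0.9),  # local window 1: A A B B
    (0.1, 1.0), (0.0, 0.9), (1.0, 0.1), (0.9, 0.0),  # local window 2: B B A A
]
local_seqs, global_seqs = diarize(segments)
print(local_seqs)   # [[0, 0, 1, 1], [0, 0, 1, 1]]
print(global_seqs)  # [[0, 0, 1, 1, 1, 1, 0, 0]]
```

In this toy run, each local window labels its speakers independently, so speaker B is cluster 1 in the first local window and cluster 0 in the second. The global window covers both local windows and assigns one consistent label per speaker, which is one plausible reason for presenting a second sequence of speaker identities after the first.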