1.
Publication number: US20240022224A1
Publication date: 2024-01-18
Application number: US18253850
Application date: 2021-11-18
Inventor: Giulio CENGARLE, Nicholas Laurence ENGEL, Patrick Winfrey SCANNELL, Davide SCAINI
CPC classification number: H03G5/005, H04R3/04, G10L25/21, G10L25/18, G10L15/1815, H04R2430/03
Abstract: In an embodiment, a method comprises: filtering reference audio content items to separate the reference audio content items into different frequency bands; for each frequency band, extracting a first feature vector from at least a portion of each of the reference audio content items, wherein the first feature vector includes at least one audio characteristic of the reference audio content items; obtaining at least one semantic label from at least a portion of each of the reference audio content items; obtaining a second feature vector consisting of the first feature vectors per frequency band and the at least one semantic label; generating, based on the second feature vector, cluster feature vectors representing centroids of clusters; separating the reference audio content items according to the cluster feature vectors; and computing an average target profile for each cluster based on the reference audio content items in the cluster.
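For illustration, a minimal Python sketch of the pipeline the abstract describes. The band edges, the per-band features (RMS level and crest factor), the one-hot semantic label, and the use of k-means are assumptions made for this sketch, not the claimed implementation.

```python
# Illustrative sketch only: band split, features and k-means are assumptions.
import numpy as np
from scipy.signal import butter, sosfilt
from sklearn.cluster import KMeans

BANDS_HZ = [(20, 250), (250, 2000), (2000, 8000)]  # assumed band edges

def band_features(audio, sr):
    """Per-band feature vector: RMS level (dB) and crest factor per band."""
    feats = []
    for lo, hi in BANDS_HZ:
        hi = min(hi, 0.45 * sr)  # keep the upper edge below Nyquist
        sos = butter(4, [lo, hi], btype="band", fs=sr, output="sos")
        band = sosfilt(sos, audio)
        rms = np.sqrt(np.mean(band ** 2) + 1e-12)
        feats.extend([20.0 * np.log10(rms), np.max(np.abs(band)) / rms])
    return np.array(feats)

def full_feature_vector(audio, sr, semantic_onehot):
    """Concatenate the per-band features with a semantic-label encoding."""
    return np.concatenate([band_features(audio, sr), semantic_onehot])

def cluster_and_profile(feature_vectors, band_profiles, n_clusters=4):
    """Cluster reference items, then average each cluster's target profile."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(np.stack(feature_vectors))
    profiles = [np.mean([p for p, l in zip(band_profiles, labels) if l == c], axis=0)
                for c in range(n_clusters)]
    return labels, km.cluster_centers_, profiles
```

In this sketch, `km.cluster_centers_` plays the role of the cluster feature vectors (centroids), and `profiles` corresponds to the average target profile computed over the reference items assigned to each cluster.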
2.
Publication number: US20240160849A1
Publication date: 2024-05-16
Application number: US18550429
Application date: 2022-04-27
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Andrea FANELLI, Mingqing YUN, Satej Suresh PANKEY, Nicholas Laurence ENGEL, Poppy Anne Carrie Crum
IPC: G06F40/30
CPC classification number: G06F40/30
Abstract: Embodiments are disclosed for speaker diarization supporting episodical content. In an embodiment, a method comprises: receiving media data including one or more utterances; dividing the media data into a plurality of blocks; identifying segments of each block of the plurality of blocks associated with a single speaker; extracting embeddings for the identified segments in accordance with a machine learning model, wherein extracting embeddings for identified segments further comprises statistically combining extracted embeddings for identified segments that correspond to a respective continuous utterance associated with a single speaker; clustering the embeddings for the identified segments into clusters; and assigning a speaker label to each of the embeddings for the identified segments in accordance with a result of the clustering. In some embodiments, a voiceprint is used to identify a speaker and the speaker identity for a speaker label.
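For illustration, a minimal Python sketch of the diarization flow described above. The block length, the stand-in embedding (a fixed random projection of simple signal statistics in place of a trained speaker model), the mean as the statistical combination, agglomerative clustering, and the cosine-similarity voiceprint match are assumptions, not the claimed method.

```python
# Illustrative sketch only: the embedding is a stand-in for a trained model.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

_rng = np.random.default_rng(0)
_PROJECTION = _rng.standard_normal((16, 3))  # placeholder for a speaker model

def split_into_blocks(samples, sr, block_seconds=30.0):
    """Divide the media stream into fixed-length blocks."""
    step = int(block_seconds * sr)
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def embed_segment(segment):
    """Embed one single-speaker segment (toy 16-dim embedding)."""
    stats = np.array([segment.mean(), segment.std(), np.abs(segment).max()])
    return _PROJECTION @ stats

def combine_utterance(segment_embeddings):
    """Statistically combine embeddings of one continuous utterance (mean)."""
    return np.mean(np.stack(segment_embeddings), axis=0)

def assign_speaker_labels(utterance_embeddings, n_speakers=2):
    """Cluster utterance embeddings and return a speaker label per utterance."""
    X = np.stack(utterance_embeddings)
    return AgglomerativeClustering(n_clusters=n_speakers).fit_predict(X)

def identify_with_voiceprint(embedding, voiceprints):
    """Map a cluster embedding to an enrolled identity by cosine similarity."""
    cos = lambda a, b: float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(voiceprints, key=lambda name: cos(embedding, voiceprints[name]))
```

The final function reflects the abstract's optional voiceprint step: a hypothetical `voiceprints` dictionary of enrolled embeddings is matched against each cluster's combined embedding to attach a speaker identity to the speaker label.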