Publication No.: US20240304205A1
Publication Date: 2024-09-12
Application No.: US18224659
Filing Date: 2023-07-21
IPC Classification: G10L21/0272, G10L15/26, G10L25/78
CPC Classification: G10L21/0272, G10L15/26, G10L25/78
Abstract: A system and method of sound processing for multi-talker conversation analysis are provided. The sound processing system includes a deep neural network trained to process audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-independent layer that produces a speaker-independent output, and a speaker-biased layer that is applied to each of the audio segments once, independently, for each of the multiple speakers of the audio mixture. Each application of the speaker-biased layer is assigned to its corresponding speaker by inputting that speaker's time-invariant speaker embedding. The deep neural network thus produces, from a combination of the speaker-biased outputs, data indicative of the time-frequency activity regions of each of the multiple speakers in the audio mixture.
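The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify layer types, sizes, or activations, so the single dense layers, the tanh and sigmoid nonlinearities, the concatenation of the embedding to each frame, and every name below (`linear`, `init`, `speaker_activity`, `params`) are assumptions made for illustration. Weights are random rather than trained, and the "combination" of speaker-biased outputs is taken here to be simple stacking.

```python
import math
import random

def linear(x, w, b):
    """Affine map of vector x; w is a list of rows (n_out x n_in)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def init(rng, n_out, n_in):
    """Hypothetical untrained weights and zero biases for one dense layer."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def speaker_activity(segment, embeddings, params):
    """segment: T frames, each a list of F magnitudes.
    embeddings: one time-invariant vector per speaker.
    Returns masks[s][t][f], each value in (0, 1)."""
    (w_si, b_si), (w_sb, b_sb) = params
    # Speaker-independent layer: computed once per segment, shared by all speakers.
    shared = [[math.tanh(h) for h in linear(frame, w_si, b_si)] for frame in segment]
    masks = []
    for emb in embeddings:
        # Speaker-biased layer: the same weights are applied once per speaker.
        # The speaker's embedding is appended to every frame, so it does not
        # vary over time within the segment.
        masks.append([[sigmoid(z) for z in linear(h + emb, w_sb, b_sb)]
                      for h in shared])
    # Assumed combination: stack the per-speaker outputs.
    return masks

rng = random.Random(0)
F, H, E, T = 4, 3, 2, 5          # frequency bins, hidden size, embedding size, frames
params = (init(rng, H, F), init(rng, F, H + E))
segment = [[rng.random() for _ in range(F)] for _ in range(T)]
embeddings = [[1.0, 0.0], [0.0, 1.0]]   # two speakers, made-up embeddings
masks = speaker_activity(segment, embeddings, params)
```

In this sketch the only thing that distinguishes one speaker's output from another's is the embedding fed to the shared speaker-biased layer, which mirrors the abstract's statement that each application of the layer is assigned to a speaker by its embedding.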