Patent search: ap:("Mitsubishi Electric Research Laboratories, Inc.") AND inv:"Christoph Böddeker"

1.

Published invention application
System and Method for Audio Processing using Time-Invariant Speaker Embeddings (Status: under examination, published)

Publication No.: US20240304205A1

Publication date: 2024-09-12

Application No.: US18224659

Filing date: 2023-07-21

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Aswin Shanmugam Subramanian, Christoph Böddeker, Gordon Wichern, Jonathan Le Roux

IPC classification: G10L21/0272, G10L15/26, G10L25/78

CPC classification: G10L21/0272, G10L15/26, G10L25/78

Abstract: A system and method for sound processing that performs multi-talker conversation analysis is provided. The sound processing system includes a deep neural network trained to process audio segments of an audio mixture of the multi-talker conversation. The deep neural network includes a speaker-independent layer that produces a speaker-independent output, and a speaker-biased layer that is applied once, independently, to each of the audio segments for each of the multiple speakers in the audio mixture. The deep neural network also processes time-invariant speaker embeddings, individually assigning each application of the speaker-biased layer to a corresponding speaker by inputting that speaker's time-invariant embedding. From a combination of the speaker-biased outputs, the deep neural network thus produces data indicative of the time-frequency activity regions of each of the multiple speakers in the audio mixture.
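The data flow described in the abstract (a shared speaker-independent layer, then one speaker-biased layer applied once per speaker and per segment, conditioned on a fixed per-speaker embedding, with the per-speaker outputs combined into time-frequency activity estimates) can be illustrated with a toy sketch. This is not the patented implementation: the class and function names (`SpeakerConditionedMasker`, `process_mixture`), the layer sizes, the ReLU activation, conditioning by concatenating the embedding to the hidden vector, and combining outputs with a per-bin softmax over speakers are all illustrative assumptions. The weights are random, so the masks are meaningless; only the structure is of interest.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SpeakerConditionedMasker:
    """Hypothetical toy model: shared speaker-independent layer, then a
    speaker-biased layer applied once per speaker, conditioned on that
    speaker's time-invariant embedding (assumed design, untrained weights)."""

    def __init__(self, n_freq, n_hidden, emb_dim, seed=0):
        rng = random.Random(seed)
        self.w_si = random_matrix(n_hidden, n_freq, rng)
        self.b_si = [0.0] * n_hidden
        self.w_sb = random_matrix(n_freq, n_hidden + emb_dim, rng)
        self.b_sb = [0.0] * n_freq

    def speaker_independent(self, frame):
        return relu(linear(self.w_si, self.b_si, frame))

    def speaker_biased(self, hidden, embedding):
        # Same weights for every speaker; only the embedding input differs.
        return linear(self.w_sb, self.b_sb, hidden + embedding)

    def segment_masks(self, segment, embeddings):
        """Return masks[s][t][f]: activity weight of speaker s in bin (t, f)."""
        n_spk = len(embeddings)
        masks = [[] for _ in range(n_spk)]
        for frame in segment:
            hidden = self.speaker_independent(frame)  # computed once, shared by all speakers
            logits = [self.speaker_biased(hidden, e) for e in embeddings]
            rows = [[0.0] * len(frame) for _ in range(n_spk)]
            for f in range(len(frame)):
                # Combine speaker-biased outputs: softmax over speakers per bin (assumed).
                peak = max(lg[f] for lg in logits)
                exps = [math.exp(lg[f] - peak) for lg in logits]
                total = sum(exps)
                for s, ex in enumerate(exps):
                    rows[s][f] = ex / total
            for s in range(n_spk):
                masks[s].append(rows[s])
        return masks


def process_mixture(model, segments, embeddings):
    # The same time-invariant embeddings are reused for every segment.
    return [model.segment_masks(seg, embeddings) for seg in segments]


if __name__ == "__main__":
    rng = random.Random(1)
    n_freq, n_frames = 4, 3
    segments = [[[rng.random() for _ in range(n_freq)] for _ in range(n_frames)]
                for _ in range(2)]
    embeddings = [[1.0, 0.0], [0.0, 1.0]]  # one fixed vector per speaker (toy values)
    model = SpeakerConditionedMasker(n_freq=n_freq, n_hidden=5, emb_dim=2)
    out = process_mixture(model, segments, embeddings)
    print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # segments, speakers, frames, bins
```

Because each speaker's embedding is constant across segments, the same speaker-biased weights are steered toward the same speaker in every segment. This is the role the abstract assigns to the time-invariant embeddings.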