-
1.
公开(公告)号:US12073828B2
公开(公告)日:2024-08-27
申请号:US17611121
申请日:2020-05-13
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai Sun , Zhiwei Shuang , Lie Lu , Shaofan Yang , Jia Dai
Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Described herein are further an apparatus for CNN based speech source separation as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.