-
Publication No.: US20220223144A1
Publication date: 2022-07-14
Application No.: US17611121
Application date: 2020-05-13
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai SUN, Zhiwei SHUANG, Lie LU, Shaofan YANG, Jia DAI
Abstract: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Described herein are further an apparatus for CNN based speech source separation as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
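Steps (b) through (e) of this abstract can be illustrated with a small sketch. It is not the patented network: the fixed box-filter kernels, the kernel sizes, averaging as the aggregation, and the sigmoid used to form the mask are all assumptions chosen to keep the example short, and the names `conv2d_same`, `box_kernel`, `multi_scale_mask` and `apply_mask` are hypothetical. A real system would learn the kernels.

```python
import math


def conv2d_same(x, kernel):
    """Zero-padded 'same' 2-D cross-correlation of x (frames x bins)
    with a square, odd-sized kernel."""
    rows, cols = len(x), len(x[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - r, j + dj - r
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def box_kernel(k):
    """Assumed stand-in for a learned k x k kernel: a plain averaging filter."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_mask(tf_mag, kernel_sizes=(1, 3, 5)):
    """Run parallel paths with different kernel sizes, aggregate, form a mask."""
    rows, cols = len(tf_mag), len(tf_mag[0])
    # Step (c): each parallel path extracts features at its own scale.
    paths = [conv2d_same(tf_mag, box_kernel(k)) for k in kernel_sizes]
    # Step (d): aggregate the path outputs (assumption: element-wise mean).
    agg = [[sum(p[i][j] for p in paths) / len(paths) for j in range(cols)]
           for i in range(rows)]
    # Step (e): squash to (0, 1) to obtain a mask (assumption: sigmoid).
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in agg]


def apply_mask(tf_mag, mask):
    """Element-wise product of the mask and the noisy magnitudes."""
    return [[m * v for m, v in zip(mrow, vrow)]
            for mrow, vrow in zip(mask, tf_mag)]
```

Calling `apply_mask(tf_mag, multi_scale_mask(tf_mag))` on a frames-by-bins magnitude matrix returns a matrix of the same shape, attenuated bin by bin.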
-
Publication No.: US20250054508A1
Publication date: 2025-02-13
Application No.: US18705446
Application date: 2022-11-07
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai SUN, Zhiwei SHUANG
IPC: G10L21/0208, G10L15/04, G10L25/51, G10L25/78
Abstract: Methods and apparatus for improving noise compensation in mask-based speech enhancement are described. A method of processing an audio signal, which includes one or more speech segments, includes obtaining a mask for mask-based speech enhancement of the audio signal and obtaining a magnitude of the audio signal. An estimate of residual noise is determined in the audio signal after mask-based speech enhancement, based on the mask and the magnitude of the audio signal. A modified mask is determined based on the estimate of the residual noise. Further described are corresponding programs and computer-readable storage media.
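The abstract does not say how the residual noise is estimated or how the mask is modified, so the following is only one plausible reading, sketched under stated assumptions. It assumes per-frame speech/non-speech flags are available (`is_speech`, hypothetical), takes the residual noise as the per-bin mean of the enhanced magnitude over non-speech frames, and subtracts that residual from the enhanced magnitude before renormalizing into a mask. The function names are hypothetical.

```python
def residual_noise(mask, mag, is_speech):
    """Assumed estimator: per-bin mean of mask * magnitude over the frames
    flagged as non-speech, i.e. what the mask lets through when nobody talks."""
    bins = len(mag[0])
    noise_frames = [t for t, speech in enumerate(is_speech) if not speech]
    if not noise_frames:
        return [0.0] * bins
    return [
        sum(mask[t][f] * mag[t][f] for t in noise_frames) / len(noise_frames)
        for f in range(bins)
    ]


def modify_mask(mask, mag, residual, eps=1e-12):
    """Assumed modification: remove the residual from the enhanced magnitude
    and express the result as a mask again, clamped at zero."""
    modified = []
    for mask_row, mag_row in zip(mask, mag):
        modified.append([
            max(m - r / max(x, eps), 0.0)
            for m, x, r in zip(mask_row, mag_row, residual)
        ])
    return modified
```

With this rule the mask is driven to zero in frames that contain only residual noise, while in speech frames it is lowered by the ratio of the residual to the input magnitude.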
-
Publication No.: US20250045585A1
Publication date: 2025-02-06
Application No.: US18716895
Application date: 2022-12-08
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai SUN, Lie LU, Zhiwei SHUANG, Yuanxing MA
IPC: G06N3/082
Abstract: The present disclosure relates to a method for designing a processor (20) and a computer implemented neural network. The method comprises obtaining input data and corresponding ground truth target data and providing the input data to a processor (20) for outputting a first prediction of target data given the input data. The method further comprises providing the latent variables output by a processor module (21:1, 21:2, ... 21:n−1) to a supervisor module (22:1, 22:2, 22:3, ... 22:n−1) which outputs a second prediction of target data based on latent variables and determining a first and second loss measure by comparing the predictions of target data with the ground truth target data. The method further comprises training the processor (20) and the supervisor module (22:1, 22:2, 22:3, ... 22:n−1) based on the first and second loss measure and adjusting the processor by at least one of removing, replacing and adding a processor module.
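The two loss measures described in this abstract can be sketched on a toy chain of scalar modules. Everything concrete here is an assumption: the modules and supervisors are plain Python callables, the loss is mean squared error, no training step is shown, and the pruning rule in `removable_after` is hypothetical because the abstract does not state a criterion for removing, replacing or adding a module.

```python
def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def supervised_losses(modules, supervisors, inputs, targets):
    """Return (first_loss, second_losses).

    first_loss compares the processor's final output with the ground truth.
    second_losses[i] compares supervisor i's prediction, made from the
    latent variables after module i, with the same ground truth.
    """
    latents = list(inputs)
    second_losses = []
    for idx, module in enumerate(modules):
        latents = [module(z) for z in latents]
        if idx < len(supervisors):  # supervisors sit on modules 1 .. n-1
            preds = [supervisors[idx](z) for z in latents]
            second_losses.append(mse(preds, targets))
    first_loss = mse(latents, targets)
    return first_loss, second_losses


def removable_after(first_loss, second_losses, tol=1e-3):
    """Hypothetical rule: index of the earliest module whose supervisor is
    already within tol of the final loss, or None if there is none.
    Modules after that index would be candidates for removal."""
    for idx, loss in enumerate(second_losses):
        if loss <= first_loss + tol:
            return idx
    return None
```

For example, if the first module already computes most of the mapping, its supervisor's loss matches the final loss and `removable_after` returns `0`, suggesting the later modules add little.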
-
Publication No.: US20250046328A1
Publication date: 2025-02-06
Application No.: US18709129
Application date: 2022-10-26
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai SUN, Zhiwei SHUANG, Yuanxing MA
IPC: G10L21/028, G10L25/84, G10L25/93
Abstract: The present disclosure relates to a method and audio processing system (1) for performing source separation. The method comprises obtaining (S1) an audio signal (Sin) including a mixture of speech content and noise content, and determining (S2a, S2b, S2c), from the audio signal, speech content ($\hat{S}_1$), stationary noise content ($\hat{N}_2$) and non-speech content ($\hat{N}_1$). The stationary noise content ($\hat{N}_2$) is a true subset of the non-speech content ($\hat{N}_1$), and the method further comprises determining (S3), based on a difference between the stationary noise content ($\hat{N}_2$) and the non-speech content ($\hat{N}_1$), a non-stationary noise content ($\hat{N}_{NS}$), obtaining (S5) a set of weighting factors, and forming (S6) a processed audio signal based on a combination of the speech content ($\hat{S}_1$), the stationary noise content ($\hat{N}_2$), and the non-stationary noise content ($\hat{N}_{NS}$) weighted with their respective weighting factor.
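The difference and weighting steps (S3 and S6) can be sketched as follows. The sketch assumes the three separated components are already available as equal-length sample lists. The function names, parameter names and default weights are hypothetical, and how the components and weighting factors are obtained is outside its scope.

```python
def non_stationary_noise(non_speech, stationary):
    """Step S3: non-stationary noise as the sample-wise difference between
    the non-speech content and its stationary part."""
    return [n1 - n2 for n1, n2 in zip(non_speech, stationary)]


def remix(speech, non_speech, stationary,
          w_speech=1.0, w_stationary=0.5, w_non_stationary=0.0):
    """Step S6: weighted combination of speech, stationary noise and
    non-stationary noise. The default weights are illustrative only."""
    transient = non_stationary_noise(non_speech, stationary)
    return [
        w_speech * s + w_stationary * n2 + w_non_stationary * nns
        for s, n2, nns in zip(speech, stationary, transient)
    ]
```

Setting all three weights to 1.0 returns the speech content plus the full non-speech content, which is a quick sanity check on the decomposition. Lowering `w_non_stationary` suppresses transient noise while keeping some stationary background.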
-
Publication No.: US20240071411A1
Publication date: 2024-02-29
Application No.: US18259848
Application date: 2022-01-04
Applicant: Dolby Laboratories Licensing Corporation
Inventor: Jundai SUN, Lie LU, Shaofan YANG, Rhonda J. WILSON, Dirk Jeroen BREEBAART
IPC: G10L25/60, G10L21/0272
CPC classification number: G10L25/60, G10L21/0272
Abstract: Disclosed is a method for determining one or more dialog quality metrics of a mixed audio signal comprising a dialog component and a noise component, the method comprising separating an estimated dialog component from the mixed audio signal by means of a dialog separator using a dialog separating model determined by training the dialog separator based on the one or more quality metrics; providing the estimated dialog component from the dialog separator to a quality metrics estimator; and determining the one or more quality metrics by means of the quality metrics estimator based on the mixed signal and the estimated dialog component. Further disclosed is a method for training a dialog separator, a system comprising circuitry configured to perform the method, and a non-transitory computer-readable storage medium.