METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK

    Publication number: US20220223144A1

    Publication date: 2022-07-14

    Application number: US17611121

    Filing date: 2020-05-13

    Abstract: Described herein is a method for Convolutional Neural Network (CNN)-based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating, based on the aggregated output, an output mask for extracting speech from the original noisy speech signal. Further described herein are an apparatus for CNN-based speech source separation and a corresponding computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.
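
Steps (a) to (e) can be illustrated with a small pure-Python sketch. It is not the patented network: the 1x1, 3x3 and 5x5 kernel sizes, the ReLU in each path, the weighted-sum aggregation and the sigmoid output are assumptions chosen for illustration, the weights are random rather than trained, and the names `conv2d_same` and `aggregated_multiscale_mask` are hypothetical.

```python
import math
import random

def conv2d_same(x, kernel):
    """2-D cross-correlation with zero padding; the output has the same shape as x."""
    rows, cols = len(x), len(x[0])
    kr, kc = len(kernel), len(kernel[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for a in range(kr):
                for b in range(kc):
                    ii, jj = i + a - pr, j + b - pc
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out

def aggregated_multiscale_mask(spec, kernels, path_weights, bias):
    """Hypothetical sketch: parallel conv paths -> weighted-sum aggregation -> sigmoid mask."""
    # Each path: convolution with its own kernel size followed by ReLU (assumed).
    paths = [[[max(0.0, v) for v in row] for row in conv2d_same(spec, k)]
             for k in kernels]
    rows, cols = len(spec), len(spec[0])
    mask = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Aggregation (assumed): per-path weighted sum plus a bias, like a 1x1 conv.
            z = bias + sum(w * p[i][j] for w, p in zip(path_weights, paths))
            mask[i][j] = 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps the mask in (0, 1)
    return mask

rng = random.Random(0)
# Toy |X(t, f)|: 8 frames x 16 frequency bins of a noisy magnitude spectrogram.
spec = [[rng.random() for _ in range(16)] for _ in range(8)]
# Three parallel paths with 1x1, 3x3 and 5x5 kernels; random (untrained) weights.
kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)] for k in (1, 3, 5)]
path_weights = [rng.uniform(-1.0, 1.0) for _ in kernels]
mask = aggregated_multiscale_mask(spec, kernels, path_weights, bias=0.0)
# Apply the mask to the noisy magnitude to obtain the speech estimate.
enhanced = [[m * s for m, s in zip(mr, sr)] for mr, sr in zip(mask, spec)]
```

Kernels of different sizes give each path a different receptive field over time and frequency, which is the intuition behind a multi-scale design; a practical implementation would use a deep-learning framework rather than nested Python loops.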

    IMPROVING NOISE COMPENSATION IN MASK-BASED SPEECH ENHANCEMENT

    Publication number: US20250054508A1

    Publication date: 2025-02-13

    Application number: US18705446

    Filing date: 2022-11-07

    Abstract: Methods and apparatus for improving noise compensation in mask-based speech enhancement are described. A method of processing an audio signal that includes one or more speech segments comprises obtaining a mask for mask-based speech enhancement of the audio signal and obtaining a magnitude of the audio signal. An estimate of the residual noise remaining in the audio signal after mask-based speech enhancement is determined based on the mask and the magnitude of the audio signal. A modified mask is then determined based on the estimate of the residual noise. Corresponding programs and computer-readable storage media are also described.
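
The abstract does not specify how the residual noise is estimated or how the mask is modified. The sketch below shows one hypothetical way to do both for a single frame: it assumes the noise in each bin is roughly (1 - m)|X|, that the mask passes a fraction m of it, and that the mask is modified by spectral-subtraction-style compensation with a gain floor. The functions `residual_noise_level` and `compensated_mask` and the parameters `alpha` and `floor` are illustrative assumptions, not the patented method.

```python
def residual_noise_level(mask, magnitude):
    """Hypothetical per-frame residual-noise estimate after masking.

    Assumes the noise in each bin is about (1 - m) * |X| and that the mask lets a
    fraction m of it through, so the residual is m * (1 - m) * |X|; the estimate
    is the mean over all bins of the frame.
    """
    residual = [m * (1.0 - m) * x for m, x in zip(mask, magnitude)]
    return sum(residual) / len(residual)

def compensated_mask(mask, magnitude, alpha=1.0, floor=0.05, eps=1e-12):
    """Hypothetical compensation: subtract the residual-noise level from the
    enhanced magnitude m * |X| and turn the result back into a gain in [floor, 1]."""
    level = residual_noise_level(mask, magnitude)
    new_mask = []
    for m, x in zip(mask, magnitude):
        gain = (m * x - alpha * level) / (x + eps)
        new_mask.append(min(1.0, max(floor, gain)))
    return new_mask

# One frame: mask values and noisy magnitudes |X| for four frequency bins.
mask = [0.9, 0.5, 0.2, 1.0]
magnitude = [1.0, 2.0, 0.5, 3.0]
new_mask = compensated_mask(mask, magnitude)
```

Because the same residual level is subtracted from every bin, low-energy bins (the third bin here) are pushed down to the floor, while high-energy bins that are likely dominated by speech change only slightly.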

    METHOD FOR NEURAL NETWORK TRAINING WITH MULTIPLE SUPERVISORS

    Publication number: US20250045585A1

    Publication date: 2025-02-06

    Application number: US18716895

    Filing date: 2022-12-08

    Abstract: The present disclosure relates to a method for designing a processor (20) and a computer-implemented neural network. The method comprises obtaining input data and corresponding ground truth target data, and providing the input data to a processor (20) that outputs a first prediction of the target data given the input data. The method further comprises providing the latent variables output by a processor module (21:1, 21:2, ..., 21:n−1) to a supervisor module (22:1, 22:2, 22:3, ..., 22:n−1), which outputs a second prediction of the target data based on the latent variables, and determining first and second loss measures by comparing the predictions of the target data with the ground truth target data. The method further comprises training the processor (20) and the supervisor module (22:1, 22:2, 22:3, ..., 22:n−1) based on the first and second loss measures, and adjusting the processor by at least one of removing, replacing, and adding a processor module.
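
The forward pass and loss computation for training with multiple supervisors can be sketched as follows. This is a hypothetical illustration: the toy modules, the identity supervisors, the mean-squared-error loss, the 0.5 weighting of the second losses and the `prune_depth` rule for removing modules are assumptions, and the gradient update that would actually train the processor and supervisors (for example, backpropagating `total` in a deep-learning framework) is omitted.

```python
def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def forward(modules, supervisors, x):
    """Run processor modules 1..n; supervisor k reads the latent output of module k (k < n)."""
    h, latents = x, []
    for module in modules[:-1]:
        h = module(h)
        latents.append(h)
    first_pred = modules[-1](h)  # the processor's own (first) prediction
    second_preds = [sup(z) for sup, z in zip(supervisors, latents)]
    return first_pred, second_preds

def losses(first_pred, second_preds, target, weight=0.5):
    """First loss, per-supervisor second losses, and an assumed weighted total."""
    first = mse(first_pred, target)
    second = [mse(p, target) for p in second_preds]
    return first, second, first + weight * sum(second)

def prune_depth(first, second, tolerance=0.01):
    """Hypothetical rule for removing modules: return the smallest k whose supervisor
    already matches the full processor; modules after k could then be removed."""
    for k, loss in enumerate(second, start=1):
        if loss <= first + tolerance:
            return k
    return None

# Toy processor with three modules acting on lists of floats.
modules = [
    lambda h: [2.0 * v for v in h],
    lambda h: [v + 1.0 for v in h],
    lambda h: [v - 1.0 for v in h],
]
supervisors = [lambda z: z, lambda z: z]  # identity heads for the n - 1 = 2 latents
x, target = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
first_pred, second_preds = forward(modules, supervisors, x)
first, second, total = losses(first_pred, second_preds, target)
depth = prune_depth(first, second)
```

In this toy case supervisor 1 already reproduces the target from the first latent, so the rule returns depth 1, suggesting that modules 2 and 3 could be removed without loss of accuracy.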

    SOURCE SEPARATION AND REMIXING IN SIGNAL PROCESSING

    Publication number: US20250046328A1

    Publication date: 2025-02-06

    Application number: US18709129

    Filing date: 2022-10-26

    Abstract: The present disclosure relates to a method and audio processing system (1) for performing source separation. The method comprises obtaining (S1) an audio signal ($S_{\mathrm{in}}$) including a mixture of speech content and noise content, and determining (S2a, S2b, S2c), from the audio signal, speech content ($\hat{S}_1$), stationary noise content ($\hat{N}_2$) and non-speech content ($\hat{N}_1$). The stationary noise content ($\hat{N}_2$) is a proper subset of the non-speech content ($\hat{N}_1$), and the method further comprises determining (S3) a non-stationary noise content ($\hat{N}_{NS}$) based on a difference between the stationary noise content ($\hat{N}_2$) and the non-speech content ($\hat{N}_1$), obtaining (S5) a set of weighting factors, and forming (S6) a processed audio signal based on a combination of the speech content ($\hat{S}_1$), the stationary noise content ($\hat{N}_2$), and the non-stationary noise content ($\hat{N}_{NS}$), each weighted with its respective weighting factor.
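
The remixing step (S6) can be sketched as below, writing the non-stationary noise as $\hat{N}_{NS} = \hat{N}_1 - \hat{N}_2$. The abstract only says that the non-stationary content is based on a difference; taking a plain sample-wise (or bin-wise) subtraction, the example weights, and the function name `remix` are assumptions for illustration.

```python
def remix(speech, non_speech, stationary, weights):
    """Hypothetical weighted recombination of separated components.

    speech -> S1_hat, non_speech -> N1_hat, stationary -> N2_hat (equal-length lists
    of samples or bins). The non-stationary noise is assumed to be N1_hat - N2_hat.
    """
    if not len(speech) == len(non_speech) == len(stationary):
        raise ValueError("components must have the same length")
    w_speech, w_stationary, w_non_stationary = weights
    out = []
    for s, n1, n2 in zip(speech, non_speech, stationary):
        n_ns = n1 - n2  # non-stationary part: non-speech minus stationary noise
        out.append(w_speech * s + w_stationary * n2 + w_non_stationary * n_ns)
    return out

speech = [0.5, -0.25, 0.0, 1.0]
non_speech = [0.2, 0.1, -0.3, 0.0]
stationary = [0.1, 0.1, -0.1, 0.0]
# Keep speech, halve the stationary noise, drop the non-stationary noise.
processed = remix(speech, non_speech, stationary, (1.0, 0.5, 0.0))
```

With all three weights set to 1 the function returns $\hat{S}_1 + \hat{N}_1$, which is a convenient sanity check; lowering individual weights lets the output keep some stationary background while suppressing non-stationary noise.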

    DETERMINING DIALOG QUALITY METRICS OF A MIXED AUDIO SIGNAL

    Publication number: US20240071411A1

    Publication date: 2024-02-29

    Application number: US18259848

    Filing date: 2022-01-04

    CPC classification number: G10L25/60 G10L21/0272

    Abstract: Disclosed is a method for determining one or more dialog quality metrics of a mixed audio signal comprising a dialog component and a noise component. The method comprises separating an estimated dialog component from the mixed audio signal by means of a dialog separator using a dialog separating model determined by training the dialog separator based on the one or more quality metrics; providing the estimated dialog component from the dialog separator to a quality metrics estimator; and determining the one or more quality metrics by means of the quality metrics estimator based on the mixed audio signal and the estimated dialog component. Further disclosed are a method for training a dialog separator, a system comprising circuitry configured to perform the method, and a non-transitory computer-readable storage medium.
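
The abstract does not name the quality metrics. As a hypothetical example, the sketch below computes a dialog-to-noise ratio in dB from the mixed signal and the estimated dialog component, treating their difference as the noise estimate. The oracle separator is only a stand-in for the trained dialog separator, and `estimate_quality_metrics` and `dialog_to_noise_ratio_db` are illustrative names.

```python
import math

def dialog_to_noise_ratio_db(mixed, dialog_est, eps=1e-12):
    """Hypothetical metric: energy of the estimated dialog vs. the remainder, in dB."""
    residual = [m - d for m, d in zip(mixed, dialog_est)]  # noise estimate = mix - dialog
    e_dialog = sum(d * d for d in dialog_est)
    e_noise = sum(r * r for r in residual)
    return 10.0 * math.log10((e_dialog + eps) / (e_noise + eps))

def estimate_quality_metrics(mixed, separator):
    """Separate the dialog, then compute metrics from the mix and the dialog estimate."""
    dialog_est = separator(mixed)
    return {"dnr_db": dialog_to_noise_ratio_db(mixed, dialog_est)}

dialog = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, 0.1, -0.1, -0.1]
mixed = [d + n for d, n in zip(dialog, noise)]
# Oracle stand-in for a trained dialog separator (hypothetical): returns the clean dialog.
metrics = estimate_quality_metrics(mixed, lambda x: list(dialog))
```

Here the dialog energy (4.0) is 100 times the noise energy (0.04), so the metric evaluates to 20 dB; with a real separator the estimate would include separation errors.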
