METHOD AND APPARATUS FOR SPEECH SOURCE SEPARATION BASED ON A CONVOLUTIONAL NEURAL NETWORK

    公开(公告)号:US20220223144A1

    公开(公告)日:2022-07-14

    申请号:US17611121

    申请日:2020-05-13

    摘要: Described herein is a method for Convolutional Neural Network (CNN) based speech source separation, wherein the method includes the steps of: (a) providing multiple frames of a time-frequency transform of an original noisy speech signal; (b) inputting the time-frequency transform of said multiple frames into an aggregated multi-scale CNN having a plurality of parallel convolution paths; (c) extracting and outputting, by each parallel convolution path, features from the input time-frequency transform of said multiple frames; (d) obtaining an aggregated output of the outputs of the parallel convolution paths; and (e) generating an output mask for extracting speech from the original noisy speech signal based on the aggregated output. Described herein are further an apparatus for CNN based speech source separation as well as a respective computer program product comprising a computer-readable storage medium with instructions adapted to carry out said method when executed by a device having processing capability.

    PROCESSING OBJECT-BASED AUDIO SIGNALS
    3.
    发明申请

    公开(公告)号:US20190222951A1

    公开(公告)日:2019-07-18

    申请号:US16368574

    申请日:2019-03-28

    IPC分类号: H04S7/00 H04S3/00

    摘要: An audio processing system and method which calculates, based on spatial metadata of the audio object, a panning coefficient for each of the audio objects in relation to each of a plurality of predefined channel coverage zones. Converts the audio signal into submixes in relation to the predefined channel coverage zones based on the calculated panning coefficients and the audio objects. Each of the submixes indicating a sum of components of the plurality of the audio objects in relation to one of the predefined channel coverage zones. Generating a submix gain by applying an audio processing to each of the submix and controls an object gain applied to each of the audio objects. The object gain being as a function of the panning coefficients for each of the audio objects and the submix gains in relation to each of the predefined channel coverage zones.

    UPMIXING OF AUDIO SIGNALS
    5.
    发明申请

    公开(公告)号:US20180262856A1

    公开(公告)日:2018-09-13

    申请号:US15538892

    申请日:2016-02-09

    IPC分类号: H04S5/00 H04R1/32 H04S7/00

    摘要: Example embodiments disclosed herein relates to upmixing of audio signals. A method of upmixing an audio signal is described. The method includes decomposing the audio signal into a diffuse signal and a direct signal, generating an audio bed at least in part based on the diffuse signal, the audio bed including a height channel, extracting an audio object from the direct signal, estimating metadata of the audio object, the metadata including height information of the audio object; and rendering the audio bed and the audio object as an upmixed audio signal, wherein the audio bed is rendered to a predefined position and the audio object is rendered according to the metadata. Corresponding system and computer program product are described as well.

    PROCESSING OBJECT-BASED AUDIO SIGNALS
    6.
    发明申请

    公开(公告)号:US20180152803A1

    公开(公告)日:2018-05-31

    申请号:US15577510

    申请日:2016-05-26

    IPC分类号: H04S7/00 H04S3/00

    摘要: An audio processing system and method which calculates, based on spatial metadata of the audio object, a panning coefficient for each of the audio objects in relation to each of a plurality of predefined channel coverage zones. Converts the audio signal into submixes in relation to the predefined channel coverage zones based on the calculated panning coefficients and the audio objects. Each of the submixes indicating a sum of components of the plurality of the audio objects in relation to one of the predefined channel coverage zones. Generating a submix gain by applying an audio processing to each of the submix and controls an object gain applied to each of the audio objects. The object gain being as a function of the panning coefficients for each of the audio objects and the submix gains in relation to each of the predefined channel coverage zones.

    AUDIO SOURCE SEPARATION WITH SOURCE DIRECTION DETERMINATION BASED ON ITERATIVE WEIGHTING

    公开(公告)号:US20180144759A1

    公开(公告)日:2018-05-24

    申请号:US15572067

    申请日:2016-05-12

    发明人: Lie LU Mingqing HU

    摘要: Example embodiments disclosed herein relate to audio source separation with source direction determined based on iterative weighted component analysis. A method of separating audio sources in audio content is disclosed. The audio content includes a plurality of channels. The method includes obtaining multiple data samples from multiple time-frequency tiles of the audio content. The method also includes analyzing the data samples to generate multiple components in a plurality of iterations, wherein each of the components indicates a direction with a variance of the data samples, and wherein in each of the plurality of iterations, each of the data samples is weighted with a weight that is determined based on a selected component from the multiple components. The method further includes determining a source direction of the audio content based on the selected component for separating an audio source from the audio content. Corresponding system and computer program product of separating audio sources in audio content are also disclosed.