ONLINE DEREVERBERATION ALGORITHM BASED ON WEIGHTED PREDICTION ERROR FOR NOISY TIME-VARYING ENVIRONMENTS

    Publication (Announcement) No.: WO2018119470A1

    Publication (Announcement) Date: 2018-06-28

    Application No.: PCT/US2017/068362

    Application Date: 2017-12-22

    Abstract: Systems and methods for processing multichannel audio signals include: receiving a multichannel time-domain audio input; transforming the input signal into a plurality of multichannel, frequency-domain, k-spaced undersampled subband signals; buffering and delaying each channel; saving a subset of spectral frames for prediction filter estimation at each spectral frame; estimating a variance of the frequency-domain signal at each spectral frame; adaptively estimating the prediction filter online using a recursive least squares (RLS) algorithm; linearly filtering each channel using the estimated prediction filter; nonlinearly filtering the linearly filtered output signal, using the estimated variances, to reduce residual reverberation and produce a nonlinearly filtered output signal; and synthesizing the nonlinearly filtered output signal to reconstruct a dereverberated multichannel time-domain audio signal.
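    The online prediction-filter step can be illustrated for a single frequency bin. The following is a minimal NumPy sketch of a variance-weighted, RLS-updated multichannel linear predictor in the spirit of online weighted prediction error (WPE), not the patented implementation. Assumptions: the function name `online_wpe_bin`, the parameters `taps`, `delay`, `alpha` (forgetting factor) and `p0`, and the variance estimate (mean input power across channels in the current frame) are all hypothetical choices; the subband analysis/synthesis and the nonlinear post-filter are omitted.

```python
import numpy as np


def online_wpe_bin(X, taps=4, delay=2, alpha=0.99, p0=100.0, eps=1e-8):
    """Dereverberate one frequency bin of a multichannel STFT (hypothetical sketch).

    X: complex array of shape (frames, channels).
    Returns the a priori prediction error, used as the dereverberated estimate.
    """
    n_frames, n_ch = X.shape
    dim = n_ch * taps
    G = np.zeros((dim, n_ch), dtype=complex)   # prediction filter, one column per channel
    P = p0 * np.eye(dim, dtype=complex)        # inverse weighted correlation matrix
    Y = np.empty((n_frames, n_ch), dtype=complex)

    for n in range(n_frames):
        # Buffer of delayed past frames x(n-delay), ..., x(n-delay-taps+1), stacked.
        past = [X[n - delay - k] if n - delay - k >= 0 else np.zeros(n_ch, dtype=complex)
                for k in range(taps)]
        x_tilde = np.concatenate(past)

        # A priori prediction error: current frame minus predicted late reverberation.
        e = X[n] - G.conj().T @ x_tilde

        # Variance estimate (assumption: mean power of the current input frame).
        lam = max(float(np.mean(np.abs(X[n]) ** 2)), eps)

        # Variance-weighted RLS update of the prediction filter.
        Px = P @ x_tilde
        gain = Px / (alpha * lam + np.real(np.vdot(x_tilde, Px)))
        G = G + np.outer(gain, e.conj())
        P = (P - np.outer(gain, x_tilde.conj() @ P)) / alpha

        Y[n] = e
    return Y
```

    A forgetting factor close to 1 averages over many frames and suits slowly varying rooms; smaller values track changes faster at the cost of noisier filter estimates.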

    VOICE ENHANCEMENT IN AUDIO SIGNALS THROUGH MODIFIED GENERALIZED EIGENVALUE BEAMFORMER

    Publication (Announcement) No.: WO2019113253A1

    Publication (Announcement) Date: 2019-06-13

    Application No.: PCT/US2018/064133

    Application Date: 2018-12-05

    Abstract: A real-time audio signal processing system includes an audio signal processor configured to process audio signals using a modified generalized eigenvalue (GEV) beamforming technique to generate an enhanced target audio output signal. The digital signal processor includes sub-band decomposition circuitry configured to decompose the audio signals into sub-band frames in the frequency domain, and a target activity detector configured to detect whether target audio is present in the sub-band frames. Based on information related to the sub-band frames and on the determination of whether target audio is present in them, the digital signal processor uses the modified GEV technique to estimate the relative transfer function (RTF) of the target audio source and generates a filter based on the estimated RTF. The filter may then be applied to the audio signals to generate the enhanced target audio output signal.
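    The abstract does not describe the modification to GEV, so the sketch below uses the standard (unmodified) GEV step to show how an RTF can be obtained from speech-present and noise-only covariances and then turned into a filter. It is a minimal NumPy sketch under stated assumptions: the target activity detector output is given as a boolean mask, the filter is an MVDR beamformer built from the RTF (one common choice, not necessarily the patent's), and `gev_rtf_mvdr`, `ref`, and `reg` are hypothetical names.

```python
import numpy as np


def gev_rtf_mvdr(frames, target_present, ref=0, reg=1e-6):
    """Estimate an RTF with a standard GEV step and build an MVDR filter (sketch).

    frames: complex array (frames, mics) for one sub-band.
    target_present: boolean array (frames,) from a target activity detector.
    """
    n_mics = frames.shape[1]
    eye = np.eye(n_mics)

    def covariance(block):
        # Sample spatial covariance E[x x^H]; rows of `block` are frames.
        return block.T @ block.conj() / len(block) + reg * eye

    phi_x = covariance(frames[target_present])    # target + noise
    phi_n = covariance(frames[~target_present])   # noise only

    # Principal generalized eigenvector of (phi_x, phi_n), i.e. of phi_n^{-1} phi_x.
    vals, vecs = np.linalg.eig(np.linalg.solve(phi_n, phi_x))
    w_gev = vecs[:, np.argmax(vals.real)]

    # If phi_x = phi_n + sigma^2 h h^H, then w_gev is proportional to phi_n^{-1} h,
    # so phi_n @ w_gev recovers h up to scale; normalize to the reference mic.
    rtf = phi_n @ w_gev
    rtf = rtf / rtf[ref]

    # MVDR filter from the RTF: w = phi_n^{-1} h / (h^H phi_n^{-1} h).
    num = np.linalg.solve(phi_n, rtf)
    w = num / np.vdot(rtf, num)
    return rtf, w
```

    The enhanced sub-band output for each frame is then `frames @ w.conj()`, i.e. w^H x, which passes the target at the reference microphone undistorted while minimizing noise power.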

    VOICE ACTIVITY DETECTION SYSTEMS AND METHODS

    Publication (Announcement) No.: WO2019113130A1

    Publication (Announcement) Date: 2019-06-13

    Application No.: PCT/US2018/063937

    Application Date: 2018-12-04

    Abstract: An audio processing device or method includes an audio transducer operable to receive audio input and generate an audio signal based on the audio input. The audio processing device or method also includes an audio signal processor operable to extract local features from the audio signal, such as Power-Normalized Cepstral Coefficients (PNCC). The audio signal processor is also operable to extract global features from the audio signal, such as chroma features and harmonicity features. A neural network determines, based on the local and global features, a probability that target audio is present in the audio signal. In particular, the neural network is trained to output a value indicating whether target audio is both present and locally dominant in the audio signal.
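    The abstract names harmonicity as a global feature but does not define it. A common proxy, used here purely as an assumption, is the peak of the normalized autocorrelation of a frame over a plausible pitch-lag range. The function name `harmonicity` and the default 80 to 400 Hz search range are hypothetical; this is a sketch of one possible feature, not the patent's definition.

```python
import numpy as np


def harmonicity(frame, fs, f0_min=80.0, f0_max=400.0):
    """Peak normalized autocorrelation over pitch lags (hypothetical feature)."""
    x = np.asarray(frame, dtype=float)
    x = x - np.mean(x)                       # remove DC so it does not inflate correlation
    if np.dot(x, x) <= 0.0:
        return 0.0                           # silent frame: no periodicity
    lag_min = max(1, int(fs / f0_max))       # shortest period considered
    lag_max = min(int(fs / f0_min), len(x) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0.0:
            best = max(best, float(np.dot(a, b) / denom))
    return best
```

    A strongly periodic (voiced) frame scores close to 1, while broadband noise scores low; in the described system such a value would be one input to the neural network alongside local features like PNCC.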

    MULTIPLE INPUT MULTIPLE OUTPUT (MIMO) AUDIO SIGNAL PROCESSING FOR SPEECH DE-REVERBERATION

    Publication (Announcement) No.: WO2018119467A1

    Publication (Announcement) Date: 2018-06-28

    Application No.: PCT/US2017/068358

    Application Date: 2017-12-22

    Abstract: Audio signal processing for adaptive de-reverberation uses a least mean squares (LMS) filter that converges faster than conventional LMS filters, making embodiments practical for reducing the effects of reverberation in many portable and embedded devices, such as smartphones, tablets, laptops, and hearing aids, for applications such as speech recognition and audio communication. The LMS filter employs a frequency-dependent adaptive step size to speed up convergence of the prediction filter, requiring fewer computational steps than a conventional LMS filter applied to the same inputs. The improved convergence is achieved at a low memory cost. Controlling the updates of the prediction filter when the acoustic channel is highly non-stationary improves performance under such conditions. The techniques are suitable for single or multiple channels and are applicable to microphone array processing.
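    The abstract does not give the step-size rule. The following minimal NumPy sketch assumes a normalized-LMS-style rule in which each frequency bin's step is a global step divided by a running estimate of that bin's input power, and shows where such a frequency-dependent step enters a single-channel delayed linear-prediction dereverberation filter. `lms_dereverb`, `mu`, `beta`, `taps`, and `delay` are hypothetical names, and the non-stationarity update control from the abstract is not modeled.

```python
import numpy as np


def lms_dereverb(X, taps=4, delay=2, mu=0.1, beta=0.9, eps=1e-8):
    """Delayed linear-prediction dereverberation with per-bin LMS steps (sketch).

    X: complex array of shape (frames, bins), single-channel STFT.
    """
    T, K = X.shape
    G = np.zeros((K, taps), dtype=complex)   # one prediction filter per bin
    power = np.full(K, eps)                  # running per-bin input power
    Y = np.empty((T, K), dtype=complex)
    for n in range(T):
        # Delayed past frames x(n-delay), ..., x(n-delay-taps+1) for every bin.
        past = np.zeros((K, taps), dtype=complex)
        for k in range(taps):
            if n - delay - k >= 0:
                past[:, k] = X[n - delay - k]
        # Prediction error = current frame minus predicted late reverberation.
        e = X[n] - np.sum(G.conj() * past, axis=1)
        # Frequency-dependent step size: global step over smoothed bin power.
        power = beta * power + (1.0 - beta) * np.sum(np.abs(past) ** 2, axis=1)
        step = mu / (power + eps)
        # Complex LMS update: g <- g + step * x_tilde * conj(e).
        G += step[:, None] * past * e.conj()[:, None]
        Y[n] = e
    return Y
```

    Because the smoothed power always contains the fraction (1 - beta) of the current buffer energy, the effective normalized step never exceeds mu / (1 - beta), which equals 1 with these defaults; that keeps each update inside the usual NLMS stability range of (0, 2), even for bins whose levels differ by orders of magnitude.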

    RECURRENT MULTIMODAL ATTENTION SYSTEM BASED ON EXPERT GATED NETWORKS

    Publication (Announcement) No.: WO2019222759A1

    Publication (Announcement) Date: 2019-11-21

    Application No.: PCT/US2019/033178

    Application Date: 2019-05-20

    Abstract: Systems and methods for multimodal classification include a plurality of expert modules, each expert module configured to receive data corresponding to one of a plurality of input modalities and extract associated features, a plurality of class prediction modules, each class prediction module configured to receive extracted features from a corresponding one of the expert modules and predict an associated class, a gate expert configured to receive the extracted features from the plurality of expert modules and output a set of weights for the input modalities, and a fusion module configured to generate a weighted prediction based on the class predictions and the set of weights. Various embodiments include one or more of an image expert, a video expert, an audio expert, class prediction modules, a gate expert, and a co-learning framework.
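    The gating and fusion steps can be sketched as a softmax gate over modalities followed by a weighted average of per-modality class probabilities. This is a minimal NumPy sketch under assumptions the abstract does not state: the gate expert is a single linear layer (hypothetical `W`, `b`) applied to the concatenated expert features, and each class prediction module outputs logits that are converted with a softmax. The recurrent attention and co-learning aspects are not modeled.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - np.max(z, axis=axis, keepdims=True)   # subtract max for numerical stability
    ez = np.exp(z)
    return ez / np.sum(ez, axis=axis, keepdims=True)


def gate_weights(features, W, b):
    """Hypothetical gate expert: one weight per modality from concatenated features."""
    return softmax(W @ np.concatenate(features) + b)


def fuse(class_logits, weights):
    """Fusion module: gate-weighted average of per-modality class probabilities."""
    probs = softmax(np.asarray(class_logits, dtype=float), axis=-1)  # (modalities, classes)
    return weights @ probs
```

    Because both the gate weights and each modality's class distribution sum to 1, the fused output is itself a probability distribution, and a confident gate lets one modality dominate the final prediction.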
