ADL-UFE: all deep learning unified front-end system

    Publication No.: US12094481B2

    Publication Date: 2024-09-17

    Application No.: US17455497

    Filing Date: 2021-11-18

    CPC Classification: G10L21/0208

    Abstract: There is included a method and apparatus comprising computer code for generating enhanced target speech from audio data, performed by a computing device, the method comprising: receiving audio data corresponding to one or more speakers; generating an estimated target speech, an estimated noise, and an estimated echo simultaneously based on the audio data using a jointly trained complex ratio mask; predicting frame-level multi-tap time-frequency (T-F) spatio-temporal-echo filter weights based on the estimated target speech, the estimated noise, and the estimated echo using a trained neural network model; and predicting enhanced target speech based on the frame-level multi-tap T-F spatio-temporal-echo filter weights.
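The mask-then-filter pipeline in this abstract can be sketched numerically. The sketch below is a minimal illustration, not the patented model: the shapes, the real-valued stand-ins for the complex ratio masks, and the tap count are all invented, and a random filter stands in for the trained network's predicted weights.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, taps = 4, 6, 3          # frequency bins, frames, filter taps (invented)

# Microphone STFT frames (random stand-in for real audio).
Y = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))

# Jointly predicted masks for speech, noise, and echo (real-valued
# stand-ins; the patent describes a jointly trained complex ratio mask).
m_speech, m_noise, m_echo = (rng.standard_normal((F, T)) * 0.1 + 0.5
                             for _ in range(3))

s_hat = m_speech * Y          # estimated target speech
n_hat = m_noise * Y           # estimated noise
e_hat = m_echo * Y            # estimated echo

# Frame-level multi-tap T-F filter weights w[f, t, k], applied over the
# current and past frames of the mixture to predict enhanced speech.
w = rng.standard_normal((F, T, taps)) / taps
Y_pad = np.pad(Y, ((0, 0), (taps - 1, 0)))    # zero-pad frame history
enhanced = np.zeros_like(Y)
for k in range(taps):
    # tap k picks up the frame delayed by k: Y[:, t - k]
    enhanced += w[..., k] * Y_pad[:, taps - 1 - k : taps - 1 - k + T]

print(enhanced.shape)  # (4, 6)
```

In the patented system the three estimates feed the network that predicts `w`; here the weights are random, so only the data flow and tensor shapes are meaningful.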

    Unified deep neural network model for acoustic echo cancellation and residual echo suppression

    Publication No.: US11776556B2

    Publication Date: 2023-10-03

    Application No.: US17485943

    Filing Date: 2021-09-27

    Inventors: Meng Yu; Dong Yu

    Abstract: A method, computer program, and computer system are provided for an all-deep-learning AEC system based on recurrent neural networks. The model consists of two stages: an echo estimation stage and an echo suppression stage. Two different schemes for echo estimation are presented herein: linear echo estimation by multi-tap filtering on the far-end reference signal, and non-linear echo estimation by single-tap masking on the microphone signal. A microphone signal waveform and a far-end reference signal waveform are received. An echo signal waveform is estimated based on the microphone signal waveform and the far-end reference signal waveform. A near-end speech signal waveform is output based on subtracting the estimated echo signal waveform from the microphone signal waveform, and residual echoes are suppressed within the near-end speech signal waveform.
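The linear echo-estimation scheme (multi-tap filtering on the far-end reference, then subtraction) can be illustrated with a toy example. This is a hypothetical sketch: in the patent the filter taps are produced by a recurrent network, whereas here the true room taps are reused so that the subtraction is exact.

```python
import numpy as np

rng = np.random.default_rng(1)
T, taps = 16, 4
far_end = rng.standard_normal(T)              # far-end reference waveform

# Hypothetical room echo: a short FIR response applied to the far-end signal.
true_taps = np.array([0.0, 0.6, 0.3, 0.1])
echo = np.convolve(far_end, true_taps)[:T]
near_speech = rng.standard_normal(T) * 0.2    # near-end talker
mic = near_speech + echo                      # microphone picks up both

# Linear echo estimation: multi-tap filtering of the far-end reference.
# A learned model would predict these taps; we reuse true_taps here.
echo_hat = np.convolve(far_end, true_taps)[:T]

# Subtract the estimated echo to recover the near-end speech.
near_hat = mic - echo_hat

print(np.allclose(near_hat, near_speech))  # True
```

With perfect taps the subtraction recovers the near-end speech exactly; the patent's second (suppression) stage exists precisely because learned taps are imperfect and leave residual echo.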

    Multi-band synchronized neural vocoder

    Publication No.: US11295751B2

    Publication Date: 2022-04-05

    Application No.: US16576943

    Filing Date: 2019-09-20

    IPC Classification: G10L19/00 G10L19/16 G06N3/02

    Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands, and a plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
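The band-split-and-downsample step can be sketched with an ideal FFT band partition. This is a simplified stand-in: the band count, signal length, and rectangular band masks are invented, and a real multi-band vocoder would use proper analysis/synthesis filter banks and a neural network to synthesize each sub-band.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bands, N = 4, 64                 # band count and length are invented
x = rng.standard_normal(N)         # stand-in for the input audio signal

# Split into frequency bands with ideal (rectangular) masks over FFT bins.
X = np.fft.rfft(x)
edges = np.linspace(0, len(X), n_bands + 1).astype(int)
full_bands, sub_bands = [], []
for b in range(n_bands):
    Xb = np.zeros_like(X)
    Xb[edges[b]:edges[b + 1]] = X[edges[b]:edges[b + 1]]
    xb = np.fft.irfft(Xb, n=N)
    full_bands.append(xb)
    sub_bands.append(xb[::n_bands])   # downsample each band by n_bands

# The ideal bands partition the spectrum, so their sum recovers the input;
# the vocoder network would synthesize each sub-band at the reduced rate.
print(np.allclose(sum(full_bands), x))     # True
print(len(sub_bands), sub_bands[0].shape)  # 4 (16,)
```

The point of the downsampling is throughput: each of the four sub-band streams runs at a quarter of the original sample rate, so a shared vocoder network can generate them synchronously with far fewer sequential steps.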

    Multi-look enhancement modeling and application for keyword spotting

    Publication No.: US11410652B2

    Publication Date: 2022-08-09

    Application No.: US16921161

    Filing Date: 2020-07-06

    Inventors: Meng Yu; Dong Yu

    Abstract: A method, computer system, and computer readable medium are provided for activating speech recognition based on keyword spotting (KWS). Waveform data corresponding to one or more speakers is received. One or more direction features are extracted from the received waveform data. One or more keywords are determined from the received waveform data based on the one or more extracted direction features. Speech recognition is activated based on detecting the one or more determined keywords.
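One way to picture "multi-look" direction features is a beamformer steered toward several candidate look directions, yielding one feature per direction. The sketch below is a hypothetical narrowband delay-and-sum example, not the patented feature extractor: the two-mic geometry, the 1 kHz tone, and the four look angles are all invented.

```python
import numpy as np

rng = np.random.default_rng(3)
fs, c, d = 16000.0, 343.0, 0.05     # sample rate, sound speed, mic spacing (m)
f = 1000.0                          # narrowband frequency of the test signal
look_angles = np.deg2rad([0, 60, 120, 180])   # candidate look directions
source_angle = np.deg2rad(60)                 # true source direction

# Two-microphone narrowband snapshot for a far-field source: the second
# mic sees the wavefront delayed by tau seconds.
tau = d * np.cos(source_angle) / c
snapshot = np.array([1.0, np.exp(-2j * np.pi * f * tau)])

# One steered-beam magnitude per look direction = the direction features.
features = []
for a in look_angles:
    steer = np.array([1.0, np.exp(-2j * np.pi * f * d * np.cos(a) / c)])
    features.append(abs(steer.conj() @ snapshot) / 2)
features = np.array(features)

print(int(np.argmax(features)))  # 1 — the beam aimed at 60° responds most
```

The beam steered at the true source direction peaks at 1.0 while mismatched beams attenuate; a KWS network would consume such per-direction features to spot keywords robustly in multi-speaker scenes.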

    TECHNIQUES FOR UNIFIED ACOUSTIC ECHO SUPPRESSION USING A RECURRENT NEURAL NETWORK

    Publication No.: US20230403505A1

    Publication Date: 2023-12-14

    Application No.: US17840188

    Filing Date: 2022-06-14

    IPC Classification: H04R3/02

    CPC Classification: H04R3/02

    Abstract: A method of acoustic echo suppression using a recurrent neural network, performed by at least one processor, is provided. The method includes receiving a microphone signal and a far-end reference signal, estimating an echo-suppressed signal and an echo signal based on the microphone signal and the far-end reference signal, estimating enhancement filters for the microphone signal based on the echo-suppressed signal and the echo signal, generating an enhanced signal based on the enhancement filters, and adjusting the enhanced signal using automatic gain control (AGC) before outputting the adjusted signal.
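The stage ordering in this abstract (echo estimation, enhancement filtering, then AGC) can be sketched end to end. Everything below is a toy stand-in: a scalar leakage models the echo path, a magnitude-ratio gain replaces the learned enhancement filters, and a simple RMS normalizer plays the role of AGC.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 32
far_end = rng.standard_normal(T)          # far-end reference signal
near = rng.standard_normal(T) * 0.1       # near-end speech
mic = near + 0.7 * far_end                # toy echo: scaled far-end leakage

# Stage 1 (stand-in for the recurrent network): estimate the echo and an
# echo-suppressed signal from the mic and far-end reference.
echo_hat = 0.7 * far_end
suppressed = mic - echo_hat

# Stage 2: enhancement "filters" from the two estimates — here a simple
# per-sample magnitude-ratio gain in place of the learned filters.
gain = np.abs(suppressed) / (np.abs(suppressed) + np.abs(echo_hat) + 1e-8)
enhanced = gain * mic

# Stage 3: AGC — scale the enhanced signal to a target RMS level.
target_rms = 0.1
agc_out = enhanced * (target_rms / (np.sqrt(np.mean(enhanced**2)) + 1e-8))

print(round(float(np.sqrt(np.mean(agc_out**2))), 3))  # 0.1
```

The AGC stage decouples loudness from the suppression stages: however aggressively the gain attenuates echo-dominated samples, the output is renormalized to a consistent level.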