ADL-UFE: all deep learning unified front-end system

    Publication No.: US12094481B2

    Publication date: 2024-09-17

    Application No.: US17455497

    Filing date: 2021-11-18

    CPC classification: G10L21/0208

    Abstract: There is included a method and apparatus comprising computer code for generating enhanced target speech from audio data, performed by a computing device, the method comprising: receiving audio data corresponding to one or more speakers; generating an estimated target speech, an estimated noise, and an estimated echo simultaneously based on the audio data using a jointly trained complex ratio mask; predicting frame-level multi-tap time-frequency (T-F) spatio-temporal-echo filter weights based on the estimated target speech, the estimated noise, and the estimated echo using a trained neural network model; and predicting enhanced target speech based on the frame-level multi-tap T-F spatio-temporal-echo filter weights.
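
    Illustrative sketch (not part of the patent text): the two signal operations named in the abstract, applying a complex ratio mask and applying frame-level multi-tap T-F filter weights, can be written in a few lines of NumPy. The abstract does not specify array layouts, the number of taps, or which signals the filter is applied to, so the function names, the shapes (C stacked input signals, F frequency bins, T frames), and the conjugate-weight convention below are all assumptions. The mask and weights would come from the trained networks; here they are simply passed in as arrays.

```python
import numpy as np


def apply_complex_ratio_mask(mix_stft, mask):
    """Estimate a component (speech, noise, or echo) from a mixture STFT.

    Both arguments are complex arrays of the same shape, e.g. (F, T).
    A complex ratio mask scales and rotates each T-F bin, which is an
    element-wise complex multiplication.
    """
    return mask * mix_stft


def apply_multi_tap_filter(inputs, weights):
    """Apply frame-level multi-tap T-F filter weights (hypothetical layout).

    inputs:  complex array (C, F, T), stacked input STFTs (assumed to be
             e.g. microphone channels plus an echo reference).
    weights: complex array (C, K, F, T), one weight per input signal,
             tap, frequency bin, and frame.
    Returns a complex (F, T) array:
        y[f, t] = sum_c sum_k conj(w[c, k, f, t]) * x[c, f, t - k]
    Frames before the start of the signal are treated as zero.
    """
    num_inputs, num_bins, num_frames = inputs.shape
    num_taps = weights.shape[1]
    out = np.zeros((num_bins, num_frames), dtype=complex)
    for k in range(num_taps):
        # Delay every input by k frames, zero-padding at the start.
        delayed = np.zeros_like(inputs, dtype=complex)
        delayed[:, :, k:] = inputs[:, :, :num_frames - k]
        out += np.sum(np.conj(weights[:, k]) * delayed, axis=0)
    return out
```

    With a single tap and one input signal this reduces to per-bin complex scaling; additional taps let each output frame draw on earlier frames, and additional stacked inputs let it draw on other channels and the echo reference.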