-
Publication No.: US12094481B2
Publication Date: 2024-09-17
Application No.: US17455497
Filing Date: 2021-11-18
Applicant: TENCENT AMERICA LLC
Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
IPC Classification: G10L21/0208, G06N3/044, G06N3/08, G10L21/0216, G10L21/0264, G10L25/30
CPC Classification: G10L21/0208
Abstract: There is included a method and apparatus comprising computer code for generating enhanced target speech from audio data, performed by a computing device, the method comprising: receiving audio data corresponding to one or more speakers; generating an estimated target speech, an estimated noise, and an estimated echo simultaneously based on the audio data using a jointly trained complex ratio mask; predicting frame-level multi-tap time-frequency (T-F) spatio-temporal-echo filter weights based on the estimated target speech, the estimated noise, and the estimated echo using a trained neural network model; and predicting enhanced target speech based on the frame-level multi-tap T-F spatio-temporal-echo filter weights.
-
Publication No.: US11902757B2
Publication Date: 2024-02-13
Application No.: US17840188
Filing Date: 2022-06-14
Applicant: Tencent America LLC
Inventors: Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu
Abstract: A method of acoustic echo suppression using a recurrent neural network, performed by at least one processor, is provided. The method includes receiving a microphone signal and a far-end reference signal, estimating an echo suppressed signal and an echo signal based on the microphone signal and the far-end reference signal, estimating enhancement filters for the microphone signal based on the echo suppressed signal and the echo signal, generating an enhanced signal based on the enhancement filters, and adjusting the enhanced signal using automatic gain control (AGC) and outputting the adjusted signal.
-
Publication No.: US11380307B2
Publication Date: 2022-07-05
Application No.: US17038498
Filing Date: 2020-09-30
Applicant: TENCENT AMERICA LLC
Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
Abstract: A method, computer program, and computer system are provided for automated speech recognition. Audio data corresponding to one or more speakers is received. Covariance matrices of target speech and noise associated with the received audio data are estimated based on a gated recurrent unit-based network. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated by a minimum variance distortionless response function based on the estimated covariance matrices.
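Illustrative note (not part of the patent record): the abstract above refers to a minimum variance distortionless response (MVDR) function driven by speech and noise covariance matrices. The sketch below shows one common textbook form of MVDR weights for a single frequency bin of a two-microphone array. It assumes the covariance matrices are already available; the gated recurrent unit-based estimator named in the abstract is not modeled, and the function names (`mvdr_weights`, `apply_beamformer`) and the reference-microphone convention are hypothetical choices made for illustration, not the patented method.

```python
# Sketch under stated assumptions: 2 microphones, one frequency bin,
# covariance matrices given as 2x2 nested lists of complex numbers.

def inv2(m):
    """Invert a 2x2 complex matrix via the closed-form adjugate."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    """Multiply two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mvdr_weights(phi_ss, phi_nn, ref=0):
    """Common covariance-based MVDR form:
    w = (phi_nn^-1 phi_ss) u / trace(phi_nn^-1 phi_ss),
    where u selects the reference microphone `ref` (an assumption here)."""
    g = matmul2(inv2(phi_nn), phi_ss)
    trace = g[0][0] + g[1][1]
    return [g[0][ref] / trace, g[1][ref] / trace]

def apply_beamformer(w, y):
    """Beamformer output for one time-frequency bin: w^H y."""
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))

# Example: rank-1 speech covariance built from a steering vector v
# whose reference entry is 1, plus a Hermitian noise covariance.
v = [1 + 0j, 0.5 + 0.5j]
phi_ss = [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]
phi_nn = [[1.0 + 0j, 0.2 + 0.1j], [0.2 - 0.1j, 1.5 + 0j]]
w = mvdr_weights(phi_ss, phi_nn)
out = apply_beamformer(w, v)  # speech component at the beamformer output
```

With a rank-1 speech covariance, the weights pass the speech component at the reference microphone unchanged, which is the "distortionless" constraint; noise power is minimized subject to that constraint.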
-
Publication No.: US20210390970A1
Publication Date: 2021-12-16
Application No.: US16901487
Filing Date: 2020-06-15
Applicant: TENCENT AMERICA LLC
Inventors: Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
Abstract: A method, computer program, and computer system for separating a target voice from among a plurality of speakers is provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.
-
Publication No.: US11688412B2
Publication Date: 2023-06-27
Application No.: US16901487
Filing Date: 2020-06-15
Applicant: TENCENT AMERICA LLC
Inventors: Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
IPC Classification: G06V40/16, G10L21/0272, G10L17/00, G06T11/60, G06T7/20, G06N3/02, G06T7/00, G06V20/40
CPC Classification: G10L21/0272, G06N3/02, G06T7/0012, G06T7/20, G06T11/60, G06V20/46, G06V40/171, G10L17/00, G06T2207/10016, G06T2207/20084, G06T2207/30201, G06T2210/22
Abstract: A method, computer program, and computer system for separating a target voice from among a plurality of speakers is provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.
-
Publication No.: US11423906B2
Publication Date: 2022-08-23
Application No.: US16926138
Filing Date: 2020-07-10
Applicant: TENCENT AMERICA LLC
Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Chao Weng, Jianming Liu, Dong Yu
IPC Classification: G10L15/25
Abstract: A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.
-
Publication No.: US20220013123A1
Publication Date: 2022-01-13
Application No.: US16926138
Filing Date: 2020-07-10
Applicant: TENCENT AMERICA LLC
Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Chao Weng, Jianming Liu, Dong Yu
IPC Classification: G10L15/25
Abstract: A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.