Patent search: ap:("TENCENT AMERICA LLC") AND inv:"Meng Yu" (page 1)

1.

Granted invention patent
ADL-UFE: all deep learning unified front-end system [In force]

Publication No.: US12094481B2

Publication date: 2024-09-17

Application No.: US17455497

Filing date: 2021-11-18

Applicant: TENCENT AMERICA LLC

Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

IPC classification: G10L21/0208, G06N3/044, G06N3/08, G10L21/0216, G10L21/0264, G10L25/30

CPC classification: G10L21/0208

Abstract: There is included a method and apparatus comprising computer code for generating enhanced target speech from audio data, performed by a computing device, the method comprising: receiving audio data corresponding to one or more speakers; generating an estimated target speech, an estimated noise, and an estimated echo simultaneously based on the audio data using a jointly trained complex ratio mask; predicting frame-level multi-tap time-frequency (T-F) spatio-temporal-echo filter weights based on the estimated target speech, the estimated noise, and the estimated echo using a trained neural network model; and predicting enhanced target speech based on the frame-level multi-tap T-F spatio-temporal-echo filter weights.
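As a rough illustration of the masking step named in the abstract above, the sketch below applies a complex ratio mask to a mixture spectrogram by multiplying each time-frequency bin by a complex mask value. The function name `apply_complex_ratio_mask`, the nested-list layout, and the toy values are assumptions made for illustration only; in the patent the masks come from a jointly trained network, which is not modeled here.

```python
def apply_complex_ratio_mask(mixture, mask):
    """Multiply each time-frequency bin of `mixture` by the matching mask value.

    Both arguments are lists of frames; each frame is a list of complex bins.
    """
    if len(mixture) != len(mask):
        raise ValueError("mixture and mask must have the same number of frames")
    return [
        [m * y for m, y in zip(mask_frame, mix_frame)]
        for mask_frame, mix_frame in zip(mask, mixture)
    ]


# Toy 2-frame, 2-bin spectrogram and mask (hypothetical values).
mixture = [[1 + 1j, 2 + 0j], [0 + 2j, 1 - 1j]]
speech_mask = [[0.5 + 0j, 0 + 1j], [1 + 0j, 0.5 + 0.5j]]
estimated_speech = apply_complex_ratio_mask(mixture, speech_mask)
```

Under the same assumption, separate masks for noise and echo would be applied to the same mixture in the same way to obtain the other two estimates.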

2.

Granted invention patent
Unified deep neural network model for acoustic echo cancellation and residual echo suppression [In force]

Publication No.: US11776556B2

Publication date: 2023-10-03

Application No.: US17485943

Filing date: 2021-09-27

Applicant: TENCENT AMERICA LLC

Inventors: Meng Yu, Dong Yu

IPC classification: G10L21/0224, H04R3/04, G06N3/02, G10L21/0216, G10L21/0208

CPC classification: G10L21/0224, G06N3/02, H04R3/04, G10L2021/02082, G10L2021/02163

Abstract: A method, computer program, and computer system are provided for an all-deep-learning based AEC system using recurrent neural networks. The model consists of two stages: an echo estimation stage and an echo suppression stage. Two different schemes for echo estimation are presented herein: linear echo estimation by multi-tap filtering on the far-end reference signal and non-linear echo estimation by single-tap masking on the microphone signal. A microphone signal waveform and a far-end reference signal waveform are received. An echo signal waveform is estimated based on the microphone signal waveform and the far-end reference signal waveform. A near-end speech signal waveform is output based on subtracting the estimated echo signal waveform from the microphone signal waveform, and echoes are suppressed within the near-end speech signal waveform.
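The linear scheme in the abstract above (multi-tap filtering of the far-end reference, then subtraction from the microphone signal) can be sketched in the time domain as follows. This is a minimal sketch under stated assumptions: the filter taps are fixed and known, whereas the patent estimates them with recurrent networks, and the names `estimate_echo`, `cancel_echo`, and all sample values are hypothetical.

```python
def estimate_echo(far_end, taps):
    """Linear echo estimate: filter the far-end reference with a multi-tap FIR filter."""
    echo = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, weight in enumerate(taps):
            if n - k >= 0:  # skip samples before the start of the signal
                acc += weight * far_end[n - k]
        echo.append(acc)
    return echo


def cancel_echo(mic, far_end, taps):
    """Subtract the estimated echo from the microphone signal."""
    echo = estimate_echo(far_end, taps)
    return [m - e for m, e in zip(mic, echo)]


# Toy example: the microphone picks up near-end speech plus a filtered far-end echo.
far_end = [1.0, 0.0, -1.0, 0.5]
taps = [0.5, 0.25]  # hypothetical echo path, assumed known here
near_end = [0.1, 0.2, 0.3, 0.4]
mic = [s + e for s, e in zip(near_end, estimate_echo(far_end, taps))]
recovered = cancel_echo(mic, far_end, taps)
```

Because the toy echo path matches the filter exactly, `recovered` equals `near_end` up to rounding; a real system leaves residual echo, which is what the second (suppression) stage in the abstract addresses.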

3.

Granted invention patent
All deep learning minimum variance distortionless response beamformer for speech separation and enhancement [In force]

Publication No.: US11380307B2

Publication date: 2022-07-05

Application No.: US17038498

Filing date: 2020-09-30

Applicant: TENCENT AMERICA LLC

Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

IPC classification: G10L15/16, G10L25/21

Abstract: A method, computer program, and computer system are provided for automated speech recognition. Audio data corresponding to one or more speakers is received. Covariance matrices of target speech and noise associated with the received audio data are estimated based on a gated recurrent unit-based network. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated by a minimum variance distortionless response function based on the estimated covariance matrices.
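For orientation, the conventional MVDR solution computed from speech and noise covariance matrices is commonly written as below. The notation is an assumption and does not come from the abstract: \Phi_{SS}(f) and \Phi_{NN}(f) are the speech and noise covariance matrices at frequency f, u is a one-hot vector selecting a reference microphone, and Y(t,f) is the multi-channel mixture. How the patented system computes or replaces each term with neural networks is not stated in the abstract.

```latex
\mathbf{w}(f) =
  \frac{\boldsymbol{\Phi}_{NN}^{-1}(f)\,\boldsymbol{\Phi}_{SS}(f)}
       {\operatorname{Tr}\!\left(\boldsymbol{\Phi}_{NN}^{-1}(f)\,\boldsymbol{\Phi}_{SS}(f)\right)}\,\mathbf{u},
\qquad
\hat{S}(t,f) = \mathbf{w}^{\mathsf{H}}(f)\,\mathbf{Y}(t,f)
```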

4.

Granted invention patent
Multi-band synchronized neural vocoder [In force]

Publication No.: US11295751B2

Publication date: 2022-04-05

Application No.: US16576943

Filing date: 2019-09-20

Applicant: TENCENT AMERICA LLC

Inventors: Chengzhu Yu, Meng Yu, Heng Lu, Dong Yu

IPC classification: G10L19/00, G10L19/16, G06N3/02

Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled, and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.
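The band-splitting and downsampling steps in the abstract above can be illustrated with the simplest two-band case. The sketch below uses Haar analysis filters, which is an assumption chosen for brevity; the abstract does not say which filter bank or how many bands the patent uses, and the names `split_two_bands` and `merge_two_bands` are hypothetical. The vocoder itself is not modeled.

```python
import math


def split_two_bands(signal):
    """Split into low and high bands with Haar filters, downsampling each by 2."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) * scale for a, b in pairs]   # local averages: low band
    high = [(a - b) * scale for a, b in pairs]  # local differences: high band
    return low, high


def merge_two_bands(low, high):
    """Inverse of split_two_bands: upsample and recombine the two bands."""
    scale = 1 / math.sqrt(2)
    signal = []
    for l, h in zip(low, high):
        signal.extend([(l + h) * scale, (l - h) * scale])
    return signal


signal = [0.0, 1.0, 0.5, -0.5, 2.0, 2.0]
low, high = split_two_bands(signal)
restored = merge_two_bands(low, high)
```

Each band has half as many samples as the input, so a per-band model runs at half the sample rate, and merging the bands restores the original signal up to rounding.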

5.

Invention patent application
MULTI-MODAL FRAMEWORK FOR MULTI-CHANNEL TARGET SPEECH SEPARATION [In force]

Publication No.: US20210390970A1

Publication date: 2021-12-16

Application No.: US16901487

Filing date: 2020-06-15

Applicant: TENCENT AMERICA LLC

Inventors: Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

IPC classification: G10L21/0272, G10L17/00, G06K9/00, G06T11/60, G06T7/00, G06T7/20, G06N3/02

Abstract: A method, computer program, and computer system for separating a target voice from among a plurality of speakers are provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.

6.

Granted invention patent
Multi-look enhancement modeling and application for keyword spotting [In force]

Publication No.: US11410652B2

Publication date: 2022-08-09

Application No.: US16921161

Filing date: 2020-07-06

Applicant: TENCENT AMERICA LLC

Inventors: Meng Yu, Dong Yu

IPC classification: G10L15/00, G10L15/22, G10L15/02, G10L15/08

Abstract: A method, computer system, and computer readable medium are provided for activating speech recognition based on keyword spotting (KWS). Waveform data corresponding to one or more speakers is received. One or more direction features are extracted from the received waveform data. One or more keywords are determined from the received waveform data based on the one or more extracted features. Speech recognition is activated based on detecting the determined keyword.

7.

Published invention application
TECHNIQUES FOR UNIFIED ACOUSTIC ECHO SUPPRESSION USING A RECURRENT NEURAL NETWORK [Pending, published]

Publication No.: US20230403505A1

Publication date: 2023-12-14

Application No.: US17840188

Filing date: 2022-06-14

Applicant: Tencent America LLC

Inventors: Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

IPC classification: H04R3/02

CPC classification: H04R3/02

Abstract: A method of acoustic echo suppression using a recurrent neural network, performed by at least one processor, is provided. The method includes receiving a microphone signal and a far-end reference signal, estimating an echo suppressed signal and an echo signal based on the microphone signal and the far-end reference signal, estimating enhancement filters for the microphone signal based on the echo suppressed signal and the echo signal, generating an enhanced signal based on the enhancement filters, and adjusting the enhanced signal using automatic gain control (AGC) and outputting the adjusted signal.

8.

Granted invention patent
Multi-modal framework for multi-channel target speech separation [In force]

Publication No.: US11688412B2

Publication date: 2023-06-27

Application No.: US16901487

Filing date: 2020-06-15

Applicant: TENCENT AMERICA LLC

Inventors: Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

IPC classification: G06V40/16, G10L21/0272, G10L17/00, G06T11/60, G06T7/20, G06N3/02, G06T7/00, G06V20/40

CPC classification: G10L21/0272, G06N3/02, G06T7/0012, G06T7/20, G06T11/60, G06V20/46, G06V40/171, G10L17/00, G06T2207/10016, G06T2207/20084, G06T2207/30201, G06T2210/22

Abstract: A method, computer program, and computer system for separating a target voice from among a plurality of speakers are provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.

9.

Granted invention patent
Multi-tap minimum variance distortionless response beamformer with neural networks for target speech separation [In force]

Publication No.: US11423906B2

Publication date: 2022-08-23

Application No.: US16926138

Filing date: 2020-07-10

Applicant: TENCENT AMERICA LLC

Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Chao Weng, Jianming Liu, Dong Yu

IPC classification: G10L15/25

Abstract: A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.

10.

Invention patent application
MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION [In force]

Publication No.: US20220013123A1

Publication date: 2022-01-13

Application No.: US16926138

Filing date: 2020-07-10

Applicant: TENCENT AMERICA LLC

Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Chao Weng, Jianming Liu, Dong Yu

IPC classification: G10L15/25

Abstract: A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers are received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.
