专利检索 ap:("TENCENT AMERICA LLC") AND inv:"Dong Yu" 第 3 页

21.

发明公开
TECHNIQUES FOR UNIFIED ACOUSTIC ECHO SUPPRESSION USING A RECURRENT NEURAL NETWORK 审中-公开

公开(公告)号：US20230403505A1

公开(公告)日：2023-12-14

申请号：US17840188

申请日：2022-06-14

申请人： Tencent America LLC

发明人： Meng Yu , Yong Xu , Chunlei Zhang , Shi-xiong Zhang , Dong Yu

IPC分类号： H04R3/02

CPC分类号： H04R3/02

摘要： A method of acoustic echo suppression using a recurrent neural network, performed by at least one processor, is provided. The method includes receiving a microphone signal and a far-end reference signal, estimating an echo suppressed signal and an echo signal based on the microphone signal and the far-end reference signal, estimating enhancement filters for the microphone signal based on the echo suppressed signal and the echo signal, generating an enhanced signal based on the enhancement filters, and adjusting the enhanced signal using automatic gain control (AGC) and outputting the adjusted signal.

22.

发明公开
NEURAL-ECHO: AN UNIFIED DEEP NEURAL NETWORK MODEL FOR ACOUSTIC ECHO CANCELLATION AND RESIDUAL ECHO SUPPRESSION 审中-公开

公开(公告)号：US20230395091A1

公开(公告)日：2023-12-07

申请号：US18452992

申请日：2023-08-21

申请人： TENCENT AMERICA LLC

发明人： Meng YU , Dong Yu

IPC分类号： G10L21/0224 , H04R3/04 , G06N3/02

CPC分类号： G10L21/0224 , H04R3/04 , G06N3/02 , G10L2021/02163

摘要： A method, computer program, and computer system is provided for an all-deep-learning based AEC system by recurrent neural networks. The model consists of two stages, echo estimation stage and echo suppression stage, respectively. Two different schemes for echo estimation are presented herein: linear echo estimation by multi-tap filtering on far-end reference signal and non-linear echo estimation by single-tap masking on microphone signal. A microphone signal waveform and a far-end reference signal waveform are received. An echo signal waveform is estimated based on the microphone signal waveform and a far-end reference signal waveform. A near-end speech signal waveform is output based on subtracting the estimated echo signal waveform from the microphone signal waveform, and echoes are suppressed within the near-end speech signal waveform.

23.

发明授权
Multi-modal framework for multi-channel target speech separation 有权

公开(公告)号：US11688412B2

公开(公告)日：2023-06-27

申请号：US16901487

申请日：2020-06-15

申请人： TENCENT AMERICA LLC

发明人： Shi-Xiong Zhang , Yong Xu , Meng Yu , Dong Yu

IPC分类号： G06V40/16 , G10L21/0272 , G10L17/00 , G06T11/60 , G06T7/20 , G06N3/02 , G06T7/00 , G06V20/40

CPC分类号： G10L21/0272 , G06N3/02 , G06T7/0012 , G06T7/20 , G06T11/60 , G06V20/46 , G06V40/171 , G10L17/00 , G06T2207/10016 , G06T2207/20084 , G06T2207/30201 , G06T2210/22

摘要： A method, computer program, and computer system for separating a target voice from among a plurality of speakers is provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.

24.

发明授权
Token-wise training for attention based end-to-end speech recognition 有权

公开(公告)号：US11636848B2

公开(公告)日：2023-04-25

申请号：US17316856

申请日：2021-05-11

申请人： TENCENT AMERICA LLC

发明人： Peidong Wang , Jia Cui , Chao Weng , Dong Yu

IPC分类号： G10L15/06 , G10L15/22 , G10L15/14 , G06N20/00 , G06N7/01

摘要： A method of attention-based end-to-end (A-E2E) automatic speech recognition (ASR) training, includes performing cross-entropy training of a model, based on one or more input features of a speech signal, determining a posterior probability vector at a time of a first wrong token among one or more output tokens of the model of which the cross-entropy training is performed, and determining a loss of the first wrong token at the time, based on the determined posterior probability vector. The method further includes determining a total loss of a training set of the model of which the cross-entropy training is performed, based on the determined loss of the first wrong token, and updating the model of which the cross-entropy training is performed, based on the determined total loss of the training set.

25.

发明授权
N-best softmax smoothing for minimum bayes risk training of attention based sequence-to-sequence models 有权

公开(公告)号：US11551136B2

公开(公告)日：2023-01-10

申请号：US16191027

申请日：2018-11-14

申请人： TENCENT America LLC

发明人： Chao Weng , Jia Cui , Guangsen Wang , Jun Wang , Chengzhu Yu , Dan Su , Dong Yu

IPC分类号： G06N20/00 , G06N3/04 , G06N3/08 , G06F40/47 , G06V10/70 , G06K9/62 , G10L15/06

摘要： A method and apparatus are provided that analyzing sequence-to-sequence data, such as sequence-to-sequence speech data or sequence-to-sequence machine translation data for example, by minimum Bayes risk (MBR) training a sequence-to-sequence model and within introduction of applications of softmax smoothing to an N-best generation of the MBR training of the sequence-to-sequence model.

26.

发明授权
Duration informed attention network for text-to-speech analysis 有权

公开(公告)号：US11468879B2

公开(公告)日：2022-10-11

申请号：US16397349

申请日：2019-04-29

申请人： TENCENT AMERICA LLC

发明人： Chengzhu Yu , Heng Lu , Dong Yu

IPC分类号： G10L13/08 , G10L13/047 , G10L13/00

摘要： A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.

27.

发明授权
Multi-tap minimum variance distortionless response beamformer with neural networks for target speech separation 有权

公开(公告)号：US11423906B2

公开(公告)日：2022-08-23

申请号：US16926138

申请日：2020-07-10

申请人： TENCENT AMERICA LLC

发明人： Yong Xu , Meng Yu , Shi-Xiong Zhang , Chao Weng , Jianming Liu , Dong Yu

IPC分类号： G10L15/25

摘要： A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers is received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.

28.

发明申请
LEARNABLE SPEED CONTROL OF SPEECH SYNTHESIS 有权

公开(公告)号：US20220180856A1

公开(公告)日：2022-06-09

申请号：US17679790

申请日：2022-02-24

申请人： TENCENT AMERICA LLC

发明人： Chengzhu Yu , Dong Yu

IPC分类号： G10L13/033 , G10L25/18 , G10L13/06 , G10L13/047 , G10L25/24

摘要： A method, computer program, and computer system is provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.

29.

发明申请
MULTI-TAP MINIMUM VARIANCE DISTORTIONLESS RESPONSE BEAMFORMER WITH NEURAL NETWORKS FOR TARGET SPEECH SEPARATION 有权

公开(公告)号：US20220013123A1

公开(公告)日：2022-01-13

申请号：US16926138

申请日：2020-07-10

申请人： TENCENT AMERICA LLC

发明人： Yong XU , Meng Yu , Shi-Xiong Zhang , Chao Weng , Jianming Liu , Dong Yu

IPC分类号： G10L15/25

摘要： A method, computer system, and computer readable medium are provided for automatic speech recognition. Video data and audio data corresponding to one or more speakers is received. A minimum variance distortionless response function is applied to the received audio and video data. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated based on back-propagating the output of the applied minimum variance distortionless response function.

30.

发明授权
Singing voice conversion 有权

公开(公告)号：US11183168B2

公开(公告)日：2021-11-23

申请号：US16789674

申请日：2020-02-13

申请人： TENCENT AMERICA LLC

发明人： Chengzhu Yu , Heng Lu , Chao Weng , Dong Yu

IPC分类号： G10L13/027 , G10L13/07 , G10L13/047 , G10L13/00

摘要： A method, computer program, and computer system is provided for converting a singing first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类