Patent search: ap:("TENCENT America LLC") AND inv:"Dong Yu" (page 2)

11.

Granted invention patent
Duration informed attention network (DURIAN) for audio-visual synthesis [Active]

Publication No.: US11670283B2

Publication date: 2023-06-06

Application No.: US17396182

Filing date: 2021-08-06

Applicant: TENCENT AMERICA LLC

Inventors: Heng Lu, Chengzhu Yu, Dong Yu

IPC classification: G10L13/08, G10L13/027, G10L13/02, G10L13/033, G10L13/10, G10L19/03, G06T13/40, G10L19/00, G10L13/00

CPC classification: G10L13/033, G06T13/40, G10L13/00, G10L13/10, G10L19/0018, G10L19/03, G10L2013/105

Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.

12.

Invention application
LEARNING SINGING FROM SPEECH [Active]

Publication No.: US20220343904A1

Publication date: 2022-10-27

Application No.: US17861716

Filing date: 2022-07-11

Applicant: TENCENT AMERICA LLC

Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

IPC classification: G10L15/16, G10L15/02, G06N3/04, G10L25/18

Abstract: A method, computer program, and computer system is provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of a first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of a first person is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.

13.

Granted invention patent
All deep learning minimum variance distortionless response beamformer for speech separation and enhancement [Active]

Publication No.: US11380307B2

Publication date: 2022-07-05

Application No.: US17038498

Filing date: 2020-09-30

Applicant: TENCENT AMERICA LLC

Inventors: Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

IPC classification: G10L15/16, G10L25/21

Abstract: A method, computer program, and computer system is provided for automated speech recognition. Audio data corresponding to one or more speakers is received. Covariance matrices of target speech and noise associated with the received audio data are estimated based on a gated recurrent unit-based network. A predicted target waveform corresponding to a target speaker from among the one or more speakers is generated by a minimum variance distortionless response function based on the estimated covariance matrices.

14.

Granted invention patent
Learnable speed control for speech synthesis [Active]

Publication No.: US11302301B2

Publication date: 2022-04-12

Application No.: US16807801

Filing date: 2020-03-03

Applicant: TENCENT AMERICA LLC

Inventors: Chengzhu Yu, Dong Yu

IPC classification: G10L13/07, G10L13/06, G10L13/10, G10L13/033, G10L25/18, G10L13/047, G10L25/24

Abstract: A method, computer program, and computer system is provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.

15.

Granted invention patent
Multi-band synchronized neural vocoder [Active]

Publication No.: US11295751B2

Publication date: 2022-04-05

Application No.: US16576943

Filing date: 2019-09-20

Applicant: TENCENT AMERICA LLC

Inventors: Chengzhu Yu, Meng Yu, Heng Lu, Dong Yu

IPC classification: G10L19/00, G10L19/16, G06N3/02

Abstract: An apparatus and a method include receiving an input audio signal to be processed by a multi-band synchronized neural vocoder. The input audio signal is separated into a plurality of frequency bands. A plurality of audio signals corresponding to the plurality of frequency bands is obtained. Each of the audio signals is downsampled, and processed by the multi-band synchronized neural vocoder. An audio output signal is generated.

16.

Invention application
MULTI-MODAL FRAMEWORK FOR MULTI-CHANNEL TARGET SPEECH SEPERATION [Active]

Publication No.: US20210390970A1

Publication date: 2021-12-16

Application No.: US16901487

Filing date: 2020-06-15

Applicant: TENCENT AMERICA LLC

Inventors: Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

IPC classification: G10L21/0272, G10L17/00, G06K9/00, G06T11/60, G06T7/00, G06T7/20, G06N3/02

Abstract: A method, computer program, and computer system for separating a target voice from among a plurality of speakers is provided. Video data associated with the plurality of speakers and audio data associated with each of the one or more speakers are received. Video feature data is extracted from the received video data. The target voice is identified from among the plurality of speakers based on the received audio data and the extracted video feature data.

17.

Invention application
DURATION INFORMED ATTENTION NETWORK (DURIAN) FOR AUDIO-VISUAL SYNTHESIS [Active]

Publication No.: US20210375259A1

Publication date: 2021-12-02

Application No.: US17396182

Filing date: 2021-08-06

Applicant: TENCENT AMERICA LLC

Inventors: Heng LU, Chengzhu Yu, Dong Yu

IPC classification: G10L13/033, G10L13/10, G10L19/03, G06T13/40, G10L19/00, G10L13/00

Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.

18.

Granted invention patent
Unsupervised automatic speech recognition [Active]

Publication No.: US11138966B2

Publication date: 2021-10-05

Application No.: US16269951

Filing date: 2019-02-07

Applicant: TENCENT AMERICA LLC

Inventors: Jianshu Chen, Chengzhu Yu, Dong Yu, Chih-Kuan Yeh

IPC classification: G10L15/00, G10L15/06, G10L15/02, G10L15/22, G10L15/16, G10L15/30, G06N3/04, G06N3/08, G06F40/20, G10L15/187

Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.

19.

Invention application
UNSUPERVISED AUTOMATIC SPEECH RECOGNITION [Under examination, published]

Publication No.: US20200258497A1

Publication date: 2020-08-13

Application No.: US16269951

Filing date: 2019-02-07

Applicant: TENCENT AMERICA LLC

Inventors: Jianshu Chen, Chengzhu Yu, Dong Yu, Chih-Kuan Yeh

IPC classification: G10L15/06, G10L15/02, G10L15/22, G10L15/16, G06F17/27, G10L15/30, G06N3/04, G06N3/08

Abstract: A method for generating an automatic speech recognition (ASR) model using unsupervised learning includes obtaining, by a device, text information. The method includes determining, by the device, a set of phoneme sequences associated with the text information. The method includes obtaining, by the device, speech waveform data. The method includes determining, by the device, a set of phoneme boundaries associated with the speech waveform data. The method includes generating, by the device, the ASR model using an output distribution matching (ODM) technique based on determining the set of phoneme sequences associated with the text information and based on determining the set of phoneme boundaries associated with the speech waveform data.

20.

Invention publication
NEURALECHO: A SELF-ATTENTIVE RECURRENT NEURAL NETWORK FOR UNIFIED ACOUSTIC ECHO SUPPRESSION, SPEAKER AWARE SPEECH ENHANCEMENT AND AUTOMATIC GAIN CONTROL [Under examination, published]

Publication No.: US20240085935A1

Publication date: 2024-03-14

Application No.: US18513175

Filing date: 2023-11-17

Applicant: TENCENT AMERICA LLC

Inventors: Meng YU, Yong Xu, Chunlei Zhang, Shi-xiong Zhang, Dong Yu

IPC classification: G05F1/66, G05B15/02, G06N5/04, G06Q50/06, G06Q90/00

CPC classification: G05F1/66, G05B15/02, G06N5/04, G06Q50/06, G06Q90/00

Abstract: A method of acoustic echo suppression using a recurrent neural network, performed by at least one processor, is provided. The method includes receiving a microphone signal and a far-end reference signal, estimating an echo suppressed signal and an echo signal based on the microphone signal and the far-end reference signal, estimating enhancement filters for the microphone signal based on the echo suppressed signal and the echo signal, generating an enhanced signal based on the enhancement filters, and adjusting the enhanced signal using automatic gain control (AGC) and outputting the adjusted signal.
