Enhancing hybrid self-attention structure with relative-position-aware bias for speech synthesis

    Publication number: US11011154B2

    Publication date: 2021-05-18

    Application number: US16271154

    Application date: 2019-02-08

    Abstract: A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output; applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied; and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
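The relative-position-aware self-attention this abstract refers to can be sketched as below. This is a minimal single-head NumPy illustration, not the patent's implementation: the offset bucketing, `max_dist`, and the identity query/key/value projections are all assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_position_bias(seq_len, max_dist=4):
    # Compute pairwise offsets i - j, clip them to [-max_dist, max_dist],
    # and look each clipped offset up in a table of scalar biases
    # (randomly initialized here; learned in a real model).
    offsets = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    offsets = np.clip(offsets, -max_dist, max_dist) + max_dist
    table = np.random.randn(2 * max_dist + 1) * 0.02
    return table[offsets]                              # (seq_len, seq_len)

def self_attention_with_rel_bias(x, max_dist=4):
    # x: (seq_len, d_model). Single head; identity projections for brevity.
    seq_len, d = x.shape
    q, k, v = x, x, x
    scores = q @ k.T / np.sqrt(d)                      # content term
    scores += relative_position_bias(seq_len, max_dist)  # position-aware bias
    return softmax(scores) @ v                         # (seq_len, d_model)
```

The position bias is added to the attention logits before the softmax, so relative distance influences how much each embedding attends to its neighbors regardless of absolute position.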

    Learning singing from speech
    Granted Patent

    Publication number: US11430431B2

    Publication date: 2022-08-30

    Application number: US16783807

    Application date: 2020-02-06

    Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person, associated with a first speaker, to a singing voice of a second person, using a speaking voice of the second person, associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
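The alignment and recursive mel-spectrogram generation steps can be sketched roughly as follows. The nearest-neighbour expansion and the toy autoregressive recurrence are stand-ins for the patent's attention-based alignment and decoder; all function names, shapes, and the recurrence itself are assumptions.

```python
import numpy as np

def align_phonemes_to_frames(phoneme_ctx, n_frames):
    # Expand the encoded phoneme context (n_phonemes, d) to the target
    # frame rate by nearest-neighbour indexing -- a simple stand-in for
    # the attention-based alignment described in the abstract.
    n = len(phoneme_ctx)
    idx = np.minimum((np.arange(n_frames) * n) // n_frames, n - 1)
    return phoneme_ctx[idx]                            # (n_frames, d)

def generate_mels(aligned, speaker_sample, n_mels=80):
    # Recursive (autoregressive) sketch: each mel frame depends on the
    # aligned phoneme features, a speaking-voice sample of the target
    # speaker, and the previously generated frame.
    prev = np.zeros(n_mels)
    frames = []
    for feat in aligned:
        prev = np.tanh(feat.mean() + speaker_sample.mean() + 0.5 * prev)
        frames.append(prev)
    return np.stack(frames)                            # (n_frames, n_mels)
```

The key idea the sketch preserves is the data flow: phoneme context is stretched to frame rate first, and the target speaker's speaking voice conditions every generated frame, so singing style comes from the source while timbre comes from the target.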

    Singing voice conversion
    Granted Patent

    Publication number: US11721318B2

    Publication date: 2023-08-08

    Application number: US17501182

    Application date: 2021-10-14

    Abstract: A method, computer program, and computer system are provided for converting a first singing voice, associated with a first speaker, to a second singing voice, associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.

    Duration informed attention network for text-to-speech analysis

    Publication number: US11468879B2

    Publication date: 2022-10-11

    Application number: US16397349

    Application date: 2019-04-29

    Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
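The duration-informed step, turning the first (per-component) set of spectra into the second (frame-rate) set, amounts to repeating each component's spectrum for its predicted duration. The sketch below illustrates that expansion; `expand_by_duration` and the example durations are illustrative assumptions, not the patent's terms.

```python
import numpy as np

def expand_by_duration(component_spectra, durations):
    # component_spectra: (n_components, d) -- the "first set of spectra".
    # durations: predicted frame count per text component, from the
    # duration model. Repeating each row for its duration yields the
    # frame-rate "second set of spectra".
    return np.repeat(component_spectra, durations, axis=0)
```

Because durations are predicted up front, the decoder never has to learn alignment through attention at synthesis time; the text-to-frame mapping is fixed before spectrogram frames are generated.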

    Singing voice conversion
    Granted Patent

    Publication number: US11183168B2

    Publication date: 2021-11-23

    Application number: US16789674

    Application date: 2020-02-13

    Abstract: A method, computer program, and computer system are provided for converting a first singing voice, associated with a first speaker, to a second singing voice, associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.

    LEARNING SINGING FROM SPEECH
    Patent Application

    Publication number: US20220343904A1

    Publication date: 2022-10-27

    Application number: US17861716

    Application date: 2022-07-11

    Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person, associated with a first speaker, to a singing voice of a second person, using a speaking voice of the second person, associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.