-
Publication No.: US11011154B2
Publication Date: 2021-05-18
Application No.: US16271154
Filing Date: 2019-02-08
Applicant: TENCENT AMERICA LLC
Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
IPC Classification: G10L13/047, G06N3/04, G06N3/08, G10L13/07
Abstract: A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self attention function is applied, and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
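The abstract above names a relative-position-aware self attention function but does not give its formulation. The following is a minimal sketch, assuming one common variant in which a learned embedding for the clipped offset between query and key positions is added to each key before the dot product. The function name, the `rel_k` table, and `max_dist` are hypothetical and are not taken from the patent.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def relative_self_attention(q, k, v, rel_k, max_dist):
    """Single-head attention with relative-position key embeddings.

    q, k, v: lists of T vectors (already projected), each of length d.
    rel_k: 2 * max_dist + 1 vectors of length d, one per clipped offset j - i.
    Returns (outputs, attention_weights).
    """
    T, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(T):
        logits = []
        for j in range(T):
            # Clip the offset to [-max_dist, max_dist], then shift to a table index.
            offset = max(-max_dist, min(max_dist, j - i)) + max_dist
            key = [kj + a for kj, a in zip(k[j], rel_k[offset])]
            logits.append(sum(x * y for x, y in zip(q[i], key)) / math.sqrt(d))
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of value vectors for query position i.
        outputs.append([sum(w[j] * v[j][c] for j in range(T)) for c in range(len(v[0]))])
    return outputs, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel_k = [[0.1, 0.0], [0.0, 0.0], [0.0, 0.1]]  # offsets -1, 0, +1
out, attn = relative_self_attention(x, x, x, rel_k, max_dist=1)
```

With `rel_k` set to all zeros the function reduces to ordinary scaled dot-product attention, which is a convenient sanity check for the position term.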
-
Publication No.: US11430431B2
Publication Date: 2022-08-30
Application No.: US16783807
Filing Date: 2020-02-06
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.
-
Publication No.: US11257480B2
Publication Date: 2022-02-22
Application No.: US16807851
Filing Date: 2020-03-03
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
IPC Classification: G10L25/48, G10L25/30, G10L13/00, G10L13/033, G10L25/90, G10L13/047
Abstract: A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
-
Publication No.: US11721318B2
Publication Date: 2023-08-08
Application No.: US17501182
Filing Date: 2021-10-14
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
IPC Classification: G10L13/027, G10L13/07, G10L13/047, G10L13/00
CPC Classification: G10L13/027, G10L13/00, G10L13/047, G10L13/07
Abstract: A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
-
Publication No.: US11151979B2
Publication Date: 2021-10-19
Application No.: US16549068
Filing Date: 2019-08-23
Applicant: TENCENT AMERICA LLC
Inventors: Heng Lu, Chengzhu Yu, Dong Yu
IPC Classification: G10L13/08, G10L13/027, G10L13/02, G10L13/033, G10L13/10, G10L19/03, G06T13/40, G10L19/00, G10L13/00
Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
-
Publication No.: US11468879B2
Publication Date: 2022-10-11
Application No.: US16397349
Filing Date: 2019-04-29
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Dong Yu
IPC Classification: G10L13/08, G10L13/047, G10L13/00
Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
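The abstract above says a second set of spectra is derived from the first set and the per-component durations, without stating how. A minimal sketch follows, assuming the simplest possible scheme: each text component's spectrum is repeated for as many frames as its predicted duration. The function name `expand_by_duration` and the frame-count representation of durations are hypothetical, not taken from the patent.

```python
def expand_by_duration(spectra, durations):
    """Repeat each per-component spectrum for its duration in frames.

    spectra: one spectrum (list of floats) per text component.
    durations: one non-negative integer frame count per text component.
    Returns a frame-level list of spectra.
    """
    if len(spectra) != len(durations):
        raise ValueError("spectra and durations must have the same length")
    frames = []
    for spectrum, n_frames in zip(spectra, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the spectrum so frames can be modified independently later.
        frames.extend(list(spectrum) for _ in range(n_frames))
    return frames

# Three text components with predicted durations of 2, 1, and 3 frames.
component_spectra = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.8]]
frame_spectra = expand_by_duration(component_spectra, [2, 1, 3])
```

The output length equals the sum of the durations, so the frame-level sequence lines up with the audio timeline that a later waveform-generation stage would consume.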
-
Publication No.: US11183168B2
Publication Date: 2021-11-23
Application No.: US16789674
Filing Date: 2020-02-13
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
IPC Classification: G10L13/027, G10L13/07, G10L13/047, G10L13/00
Abstract: A method, computer program, and computer system are provided for converting a first singing voice associated with a first speaker to a second singing voice associated with a second speaker. A context associated with one or more phonemes corresponding to the first singing voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a sample corresponding to the first singing voice is converted to a sample corresponding to the second singing voice using the generated mel-spectrogram features.
-
Publication No.: US20210280165A1
Publication Date: 2021-09-09
Application No.: US16807851
Filing Date: 2020-03-03
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
IPC Classification: G10L13/033, G10L13/047, G10L25/90
Abstract: A method, a computer readable medium, and a computer system are provided for singing voice conversion. Data corresponding to a singing voice is received. One or more features and pitch data are extracted from the received data using one or more adversarial neural networks. One or more audio samples are generated based on the extracted pitch data and the one or more features.
-
Publication No.: US11670283B2
Publication Date: 2023-06-06
Application No.: US17396182
Filing Date: 2021-08-06
Applicant: TENCENT AMERICA LLC
Inventors: Heng Lu, Chengzhu Yu, Dong Yu
IPC Classification: G10L13/08, G10L13/027, G10L13/02, G10L13/033, G10L13/10, G10L19/03, G06T13/40, G10L19/00, G10L13/00
CPC Classification: G10L13/033, G06T13/40, G10L13/00, G10L13/10, G10L19/0018, G10L19/03, G10L2013/105
Abstract: A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
-
Publication No.: US20220343904A1
Publication Date: 2022-10-27
Application No.: US17861716
Filing Date: 2022-07-11
Applicant: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu
Abstract: A method, computer program, and computer system are provided for converting a singing voice of a first person associated with a first speaker to a singing voice of a second person using a speaking voice of the second person associated with a second speaker. A context associated with one or more phonemes corresponding to the singing voice of the first person is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes, the target acoustic frames, and a sample of the speaking voice of the second person. A sample corresponding to the singing voice of the first person is converted to a sample corresponding to the singing voice of the second person using the generated mel-spectrogram features.