-
Publication number: US11996084B2
Publication date: 2024-05-28
Application number: US17738186
Application date: 2022-05-06
Inventor: Liqiang Zhang, Jiankang Hou, Tao Sun, Lei Jia
IPC: G10L13/02, G06F40/20, G10L13/04, G10L13/047, G10L13/10
CPC classification number: G10L13/10, G06F40/20, G10L13/047
Abstract: The present disclosure provides a speech synthesis method and apparatus, a device, and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence. A specific implementation involves: acquiring to-be-synthesized text; acquiring a prosody feature extracted from the text; inputting the text and the prosody feature into a speech synthesis model to obtain a vocoder feature; and inputting the vocoder feature into a vocoder to obtain synthesized speech.
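The four steps in the abstract form a simple data flow: text, then prosody feature, then vocoder feature, then waveform. The following is a minimal sketch of that flow only. Every function here (`extract_prosody`, `synthesis_model`, `vocoder`, `synthesize`) is a hypothetical placeholder with toy logic, not the models described in the patent.

```python
from typing import List


def extract_prosody(text: str) -> List[float]:
    """Hypothetical prosody extractor: marks a pause (1.0) at punctuation."""
    return [1.0 if ch in ",.!?" else 0.0 for ch in text]


def synthesis_model(text: str, prosody: List[float]) -> List[List[float]]:
    """Placeholder for the speech synthesis model.

    Emits one toy 2-dimensional "vocoder feature" frame per character:
    a pseudo-energy derived from the character code, and the prosody value.
    """
    return [[(ord(ch) % 32) / 32.0, p] for ch, p in zip(text, prosody)]


def vocoder(features: List[List[float]], samples_per_frame: int = 4) -> List[float]:
    """Placeholder vocoder: expands each feature frame into waveform samples."""
    wave: List[float] = []
    for energy, _pause in features:
        wave.extend([energy] * samples_per_frame)
    return wave


def synthesize(text: str) -> List[float]:
    """Chain the stages in the order the abstract lists them."""
    prosody = extract_prosody(text)             # prosody feature from the text
    features = synthesis_model(text, prosody)   # text + prosody -> vocoder feature
    return vocoder(features)                    # vocoder feature -> speech samples


wave = synthesize("Hi, there.")
```

The point of the sketch is the interface between stages: the synthesis model consumes both the text and the prosody feature, and the vocoder sees only the vocoder feature.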
-
Publication number: US11984134B2
Publication date: 2024-05-14
Application number: US18071187
Application date: 2022-11-29
Inventor: Jiankang Hou, Zhipeng Nie, Liqiang Zhang, Tao Sun, Lei Jia
Abstract: A method of processing audio data, an electronic device, and a storage medium are provided, relating to the field of artificial intelligence, and in particular to speech processing technology. The method includes: processing spectral data of the audio data to obtain first feature information; obtaining fundamental frequency indication information according to the first feature information, wherein the fundamental frequency indication information indicates valid audio data and invalid audio data of the first feature information; obtaining fundamental frequency information and spectral energy information according to the first feature information and the fundamental frequency indication information; and obtaining harmonic structure information of the audio data according to the fundamental frequency information and the spectral energy information.
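To illustrate how a validity indication, a fundamental frequency, and an energy value can combine into a harmonic structure, here is a minimal sketch using a classical sum-of-sinusoids construction. This is a swapped-in textbook technique, not the method claimed in the patent. The energy threshold used as the "indication", and the names `voiced_flags`, `harmonic_frame`, and `harmonic_structure`, are assumptions made for illustration.

```python
import math
from typing import List


def voiced_flags(frame_energy: List[float], threshold: float = 0.1) -> List[bool]:
    """Assumed indication: a frame is valid when its energy exceeds a threshold."""
    return [e > threshold for e in frame_energy]


def harmonic_frame(f0: float, energy: float, n_harmonics: int,
                   sample_rate: int, n_samples: int) -> List[float]:
    """Sum sinusoids at integer multiples of f0, scaled by the frame energy.

    Harmonics at or above the Nyquist frequency are skipped. Phase restarts
    at zero in every frame, which a real system would need to avoid.
    """
    nyquist = sample_rate / 2
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(math.sin(2 * math.pi * k * f0 * t) / k
                for k in range(1, n_harmonics + 1) if k * f0 < nyquist)
        out.append(energy * s)
    return out


def harmonic_structure(f0s: List[float], energies: List[float], flags: List[bool],
                       n_harmonics: int = 4, sample_rate: int = 16000,
                       n_samples: int = 8) -> List[List[float]]:
    """Build harmonics for valid frames only; invalid frames stay silent."""
    frames = []
    for f0, energy, valid in zip(f0s, energies, flags):
        if valid:
            frames.append(harmonic_frame(f0, energy, n_harmonics, sample_rate, n_samples))
        else:
            frames.append([0.0] * n_samples)
    return frames


energies = [0.5, 0.05]
flags = voiced_flags(energies)
frames = harmonic_structure([200.0, 0.0], energies, flags)
```

The sketch mirrors the dependency order in the abstract: the indication gates which frames receive a fundamental frequency and energy, and only those frames contribute harmonic content.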
-
Publication number: US20230015112A1
Publication date: 2023-01-19
Application number: US17933152
Application date: 2022-09-19
Inventor: Jiankang Hou, Tao Sun, Zhipeng Nie, Liqiang Zhang, Lei Jia, Haifeng Wang
IPC: G10L21/10, G10L13/02, G10L21/0208, G10L25/51
Abstract: A method for processing a speech includes: acquiring an original speech; extracting a spectrogram from the original speech; acquiring a speech synthesis model, where the speech synthesis model comprises a first generation sub-model and a second generation sub-model; generating a harmonic structure of the spectrogram, by invoking the first generation sub-model to process the spectrogram; and generating a target speech, by invoking the second generation sub-model to process the harmonic structure and the spectrogram.
-