-
Publication No.: US20230059882A1
Publication Date: 2023-02-23
Application No.: US17738186
Application Date: 2022-05-06
Inventor: Liqiang ZHANG , Jiankang HOU , Tao SUN , Lei JIA
IPC: G10L13/10 , G06F40/20 , G10L13/047
Abstract: The present disclosure discloses a speech synthesis method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring to-be-synthesized text; acquiring a prosody feature extracted from the text; inputting the text and the prosody feature into a speech synthesis model to obtain a vocoder feature; and inputting the vocoder feature into a vocoder to obtain synthesized speech.
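The four steps in this abstract form a straight pipeline, sketched minimally below. The abstract does not describe the internals of any component, so all three callables (`toy_prosody`, `toy_model`, `toy_vocoder`) are hypothetical stand-ins chosen only to make the data flow visible.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def synthesize(
    text: str,
    extract_prosody: Callable[[str], Sequence[Vector]],
    synthesis_model: Callable[[str, Sequence[Vector]], Sequence[Vector]],
    vocoder: Callable[[Sequence[Vector]], List[float]],
) -> List[float]:
    """Text -> prosody feature -> vocoder feature -> synthesized speech."""
    prosody = extract_prosody(text)                    # prosody feature from the text
    vocoder_features = synthesis_model(text, prosody)  # text + prosody -> vocoder feature
    return vocoder(vocoder_features)                   # vocoder feature -> waveform samples


# Hypothetical stand-ins, not the components described in the patent.
def toy_prosody(text: str) -> List[Vector]:
    # One prosody value per character: 1.0 for upper case, else 0.0.
    return [[float(ch.isupper())] for ch in text]


def toy_model(text: str, prosody: Sequence[Vector]) -> List[Vector]:
    # One two-dimensional "vocoder feature" frame per character.
    return [[ord(ch) / 255.0, p[0]] for ch, p in zip(text, prosody)]


def toy_vocoder(features: Sequence[Vector]) -> List[float]:
    # Emit two samples per feature frame.
    return [frame[0] for frame in features for _ in range(2)]


waveform = synthesize("Hi", toy_prosody, toy_model, toy_vocoder)
```

Passing the components in as callables keeps the sketch neutral about how each stage is actually implemented.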
-
Publication No.: US20220020356A1
Publication Date: 2022-01-20
Application No.: US17489616
Application Date: 2021-09-29
Inventor: Wenfu WANG , Tao SUN , Xilei WANG , Junteng ZHANG , Zhengkun GAO , Lei JIA
Abstract: The present disclosure provides a method and apparatus of synthesizing a speech, a method and apparatus of training a speech synthesis model, an electronic device, and a storage medium. The method of synthesizing a speech includes acquiring style information of a speech to be synthesized, tone information of the speech to be synthesized, and content information of a text to be processed; generating acoustic feature information of the text to be processed, by using a pre-trained speech synthesis model, based on the style information, the tone information, and the content information of the text to be processed; and synthesizing the speech for the text to be processed, based on the acoustic feature information of the text to be processed.
-
Publication No.: US20220076657A1
Publication Date: 2022-03-10
Application No.: US17455156
Application Date: 2021-11-16
Inventor: Wenfu WANG , Xilei WANG , Tao SUN , Han YUAN , Zhengkun GAO , Lei JIA
IPC: G10L13/02
Abstract: A method of registering an attribute in a speech synthesis model, an apparatus of registering an attribute in a speech synthesis model, an electronic device, and a medium are provided, which relate to artificial intelligence technologies such as deep learning and intelligent speech technology. The method includes: acquiring a plurality of data associated with an attribute to be registered; and registering the attribute in the speech synthesis model by using the plurality of data associated with the attribute, wherein the speech synthesis model is trained in advance by using training data in a training data set.
-
Publication No.: US20230178067A1
Publication Date: 2023-06-08
Application No.: US18074023
Application Date: 2022-12-02
Inventor: Wenfu WANG , Tao SUN , Xilei WANG , Lei JIA
IPC: G10L13/047 , G10L25/30
CPC classification number: G10L13/047 , G10L25/30
Abstract: A method of training a speech synthesis model, a method of synthesizing a speech, a device and a storage medium are provided, which relate to a field of artificial intelligence technology, in particular to a field of speech synthesis technology. The specific implementation scheme includes: processing training data by using the speech synthesis model, so as to determine a content encoding sequence, a style encoding sequence, a timbre encoding vector, a noise environment vector and a target Mel spectrum sequence corresponding to the training data; determining a total loss value according to the content encoding sequence, the style encoding sequence, the timbre encoding vector, the noise environment vector and the target Mel spectrum sequence; and adjusting a parameter of the speech synthesis model according to the total loss value.
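The abstract does not say how the total loss value is composed. One common pattern, assumed here purely for illustration, is a Mel-spectrum reconstruction loss plus a weighted sum of auxiliary losses. The predicted Mel frames, the auxiliary loss names, and the weights below are all hypothetical and do not come from the patent.

```python
from typing import Dict, List

Frame = List[float]


def mel_mse(predicted: List[Frame], target: List[Frame]) -> float:
    """Mean squared error over all Mel bins of all frames."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count if count else 0.0


def total_loss(
    mel_loss: float,
    auxiliary_losses: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Assumed form: mel loss plus weighted auxiliary terms (weight defaults to 1.0)."""
    return mel_loss + sum(
        weights.get(name, 1.0) * value for name, value in auxiliary_losses.items()
    )


# Hypothetical values: one frame with two Mel bins, two auxiliary terms.
mel = mel_mse([[1.0, 2.0]], [[1.0, 4.0]])
loss = total_loss(mel, {"style": 1.0, "timbre": 0.5}, {"style": 0.5})
```

The model parameters would then be adjusted by backpropagating this scalar, which needs an autodiff framework and is outside the scope of this sketch.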
-
Publication No.: US20230005466A1
Publication Date: 2023-01-05
Application No.: US17820339
Application Date: 2022-08-17
Inventor: Zhengkun GAO , Junteng ZHANG , Tao SUN , Lei JIA
IPC: G10L13/08 , G10L13/047
Abstract: The disclosure provides a speech synthesis method, and an electronic device. The technical solution is described as follows. A text to be synthesized and speech features of a target user are obtained. Predicted first acoustic features based on the text to be synthesized and the speech features are obtained. A target template audio is obtained from a template audio library based on the text to be synthesized. Second acoustic features of the target template audio are extracted. Target acoustic features are generated by splicing the first acoustic features and the second acoustic features. Speech synthesis is performed on the text to be synthesized based on the target acoustic features and the speech features, to generate a target speech of the text to be synthesized.
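The splicing step can be pictured as joining two frame sequences along the time axis. The sketch below assumes the predicted (first) features fill a single slot inside the template (second) features. The slot index `insert_at` and the function name `splice_features` are hypothetical, since the abstract does not specify how the splice position is chosen.

```python
from typing import List

Frame = List[float]


def splice_features(first: List[Frame], second: List[Frame], insert_at: int) -> List[Frame]:
    """Insert predicted frames (`first`) into template frames (`second`) at a frame index."""
    if not 0 <= insert_at <= len(second):
        raise ValueError("insert_at must lie within the template")
    # All frames must share one feature dimension to be spliced along time.
    dims = {len(frame) for frame in first + second}
    if len(dims) > 1:
        raise ValueError("feature dimensions differ between the two sequences")
    return second[:insert_at] + first + second[insert_at:]


# Hypothetical data: a two-frame template with one predicted frame in the middle.
template = [[0.0, 0.0], [1.0, 1.0]]
predicted = [[9.0, 9.0]]
target = splice_features(predicted, template, insert_at=1)
```

Checking the feature dimension before joining catches the case where the two sources were extracted with different settings.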
-
Publication No.: US20230087531A1
Publication Date: 2023-03-23
Application No.: US18071187
Application Date: 2022-11-29
Inventor: Jiankang HOU , Zhipeng NIE , Liqiang ZHANG , Tao SUN , Lei JIA
Abstract: A method of processing audio data, an electronic device, and a storage medium are provided, which relate to a field of artificial intelligence, in particular to a field of speech processing technology. The method includes: processing spectral data of the audio data to obtain first feature information; obtaining fundamental frequency indication information according to the first feature information, wherein the fundamental frequency indication information indicates valid audio data of the first feature information and invalid audio data of the first feature information; obtaining fundamental frequency information and spectral energy information according to the first feature information and the fundamental frequency indication information; and obtaining harmonic structure information of the audio data according to the fundamental frequency information and the spectral energy information.
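As a rough illustration of how a fundamental frequency and a validity indication could lead to a harmonic structure, the sketch below lists the harmonics of each valid frame as integer multiples of the fundamental, up to the Nyquist limit. This is a textbook construction assumed for illustration, not the patented method. The spectral energy side is left out because the abstract does not say how it is combined, and `harmonic_frequencies` is a hypothetical name.

```python
from typing import List


def harmonic_frequencies(
    f0_hz: List[float], valid: List[bool], sample_rate: int
) -> List[List[float]]:
    """Per frame, return k * f0 for k = 1, 2, ... below the Nyquist frequency.

    Frames flagged invalid (or with a non-positive f0) get an empty list.
    """
    nyquist = sample_rate / 2
    result: List[List[float]] = []
    for f0, is_valid in zip(f0_hz, valid):
        harmonics: List[float] = []
        if is_valid and f0 > 0:
            k = 1
            while k * f0 < nyquist:
                harmonics.append(k * f0)
                k += 1
        result.append(harmonics)
    return result


# Hypothetical input: two frames, the second flagged invalid.
frames = harmonic_frequencies([100.0, 200.0], [True, False], sample_rate=800)
```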
-
Publication No.: US20230056128A1
Publication Date: 2023-02-23
Application No.: US17736175
Application Date: 2022-05-04
Inventor: Liqiang ZHANG , Jiankang HOU , Tao SUN , Lei JIA
Abstract: The present disclosure discloses a speech processing method and apparatus, a device and a computer storage medium, and relates to speech and deep learning technologies in the field of artificial intelligence technologies. A specific implementation solution involves: acquiring a vocoder feature obtained for text; correcting a value of an unvoiced and voiced (UV) feature in the vocoder feature according to an energy feature and/or a speech spectrum feature in the vocoder feature; and providing the corrected vocoder feature for a vocoder, so as to obtain synthesized speech.
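One simple way to correct a UV value from the energy feature is to force frames with very low energy to unvoiced. The rule and the threshold `energy_floor` are assumptions made for this sketch. The abstract only states that the correction depends on the energy feature and/or the speech spectrum feature.

```python
from typing import List


def correct_uv(uv: List[int], energy: List[float], energy_floor: float) -> List[int]:
    """Set UV to 0 (unvoiced) wherever frame energy falls below the floor.

    Assumed rule for illustration; frames at or above the floor keep their UV value.
    """
    if len(uv) != len(energy):
        raise ValueError("uv and energy must have one value per frame")
    return [0 if e < energy_floor else u for u, e in zip(uv, energy)]


# Hypothetical frames: the second and fourth are marked voiced but nearly silent.
corrected = correct_uv([1, 1, 0, 1], [0.5, 0.01, 0.3, 0.02], energy_floor=0.05)
```

The corrected UV values would then replace the original ones in the vocoder feature before it is passed to the vocoder.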
-