-
Publication No.: US11462207B1
Publication Date: 2022-10-04
Application No.: US17737666
Filing Date: 2022-05-05
Inventor: Jianhua Tao, Tao Wang, Jiangyan Yi, Ruibo Fu
IPC: G10L13/08, G10L13/033, G10L13/047, G06F40/166, G06N3/08, G10L25/03
Abstract: Disclosed are a method and an apparatus for editing audio, an electronic device, and a storage medium. The method includes: acquiring a modified text obtained by modifying a known original text of an audio to be edited according to a known text for modification; predicting a duration of an audio corresponding to the text for modification; adjusting a region to be edited of the audio to be edited according to that predicted duration, to obtain an adjusted audio to be edited; and obtaining, based on a pre-trained audio editing model, an edited audio according to the adjusted audio to be edited and the modified text. The edited audio produced by the audio editing model sounds natural in context, and the method supports synthesizing new words that do not appear in the corpus.
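The region-adjustment step can be illustrated with a minimal sketch. The abstract does not say how the audio is represented or how the region is resized, so the following assumes (hypothetically) that the audio is a sequence of per-frame feature vectors and that the region to be edited is replaced by a run of masked placeholder frames whose count equals the predicted duration. The function name `adjust_edit_region` and the parameter `mask_value` are illustrative only and do not come from the patent.

```python
from typing import List, Sequence

Frame = List[float]


def adjust_edit_region(
    frames: Sequence[Frame],
    start: int,
    end: int,
    predicted_len: int,
    mask_value: float = 0.0,
) -> List[Frame]:
    """Replace frames[start:end] with `predicted_len` masked frames.

    Assumption: masking with a constant is only one possible way to
    mark the region that an editing model would later fill in.
    """
    if not 0 <= start <= end <= len(frames):
        raise ValueError("edit region is out of bounds")
    if predicted_len < 0:
        raise ValueError("predicted_len must be non-negative")
    dim = len(frames[0]) if frames else 0
    placeholder = [[mask_value] * dim for _ in range(predicted_len)]
    # Keep the context on both sides unchanged; only the region is resized.
    left = [list(f) for f in frames[:start]]
    right = [list(f) for f in frames[end:]]
    return left + placeholder + right


# Example: 10 frames of 3 features; frames 4 and 5 become 5 masked frames.
frames = [[float(i)] * 3 for i in range(10)]
adjusted = adjust_edit_region(frames, start=4, end=6, predicted_len=5)
print(len(adjusted))  # 10 - 2 + 5 = 13
```

Under these assumptions, a replacement resizes the region (`start < end`), an insertion uses an empty region (`start == end`), and a deletion uses `predicted_len = 0`.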
-
Publication No.: US11410685B1
Publication Date: 2022-08-09
Application No.: US17668074
Filing Date: 2022-02-09
Inventor: Jianhua Tao, Ruibo Fu, Jiangyan Yi
Abstract: Disclosed are a method for detecting speech concatenating points and a storage medium. The method includes: acquiring a speech to be detected, and determining high-frequency components and low-frequency components of the speech to be detected; extracting first cepstrum features and second cepstrum features corresponding to the speech to be detected according to the high-frequency components and the low-frequency components; splicing, frame by frame, the first and second cepstrum features of each frame of the speech to be detected to obtain a parameter sequence; inputting the parameter sequence into a trained neural network model, which has learned and stored a correspondence between the parameter sequence and the feature sequence, to obtain a feature sequence corresponding to the speech to be detected; and detecting speech concatenating points in the speech to be detected according to the feature sequence.
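The frame-by-frame splicing step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the two cepstrum feature sequences are taken as already extracted and frame-aligned, and the final detection step is reduced to a hypothetical threshold over per-frame scores, which the abstract does not specify. The names `splice_per_frame` and `frames_above_threshold` are invented for this example.

```python
from typing import List, Sequence


def splice_per_frame(
    first_cepstrum: Sequence[Sequence[float]],
    second_cepstrum: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate the two feature vectors of each frame into one vector."""
    if len(first_cepstrum) != len(second_cepstrum):
        raise ValueError("feature sequences must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(first_cepstrum, second_cepstrum)]


def frames_above_threshold(
    scores: Sequence[float], threshold: float = 0.5
) -> List[int]:
    """Hypothetical decision rule: indices of frames whose score exceeds
    the threshold. The patent's actual detection rule is not given here."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Example with made-up numbers: 3 frames, 2 + 1 features per frame.
first = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
second = [[1.0], [2.0], [3.0]]
params = splice_per_frame(first, second)
print(params[0])  # [0.1, 0.2, 1.0]
```

In a full pipeline, `params` would be the parameter sequence fed to the trained neural network, and the per-frame scores passed to `frames_above_threshold` would be derived from the network's output feature sequence.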
-