Patent search: ap:("INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES") AND inv:"Ruibo Fu"

1.

Type: Granted invention patent
Method and apparatus for editing audio, electronic device and storage medium (status: in force)

Publication No.: US11462207B1

Publication date: 2022-10-04

Application No.: US17737666

Filing date: 2022-05-05

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor: Jianhua Tao, Tao Wang, Jiangyan Yi, Ruibo Fu

IPC: G10L13/08, G10L13/033, G10L13/047, G06F40/166, G06N3/08, G10L25/03

Abstract: Disclosed are a method and an apparatus for editing audio, an electronic device and a storage medium. The method includes: acquiring a modified text obtained by modifying a known original text of an audio to be edited according to a known text for modification; predicting a duration of an audio corresponding to the text for modification; adjusting a region to be edited of the audio to be edited according to the duration of the audio corresponding to the text for modification, to obtain an adjusted audio to be edited; and obtaining, based on a pre-trained audio editing model, an edited audio according to the adjusted audio to be edited and the modified text. In the present disclosure, the edited audio obtained by the audio editing model sounds natural in context, and the model supports synthesizing new words that do not appear in the corpus.
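
The region-adjustment step in this abstract can be illustrated with a minimal sketch. It assumes the audio is represented as a list of per-frame feature vectors and that the region to be edited is replaced by placeholder frames whose count equals the predicted duration; the abstract does not say how the adjustment is performed, so the masking scheme, the function name `adjust_edit_region`, and its parameters are all hypothetical. The duration predictor and the pre-trained audio editing model are not shown.

```python
from typing import List, Optional

# A frame is a feature vector; None marks a placeholder frame that a
# downstream editing model would be expected to fill in (an assumption).
Frame = Optional[List[float]]


def adjust_edit_region(
    frames: List[Frame], start: int, end: int, predicted_len: int
) -> List[Frame]:
    """Replace frames[start:end] with predicted_len placeholder frames.

    start and end are frame indices of the region to be edited (end is
    exclusive); predicted_len is the predicted duration, in frames, of
    the audio for the modification text.
    """
    if not 0 <= start <= end <= len(frames):
        raise ValueError("edit region must lie inside the audio")
    if predicted_len < 0:
        raise ValueError("predicted length must be non-negative")
    placeholders: List[Frame] = [None] * predicted_len
    # Keep the context on both sides unchanged; only the region is resized.
    return frames[:start] + placeholders + frames[end:]
```

For example, replacing a two-frame region with a predicted four-frame region lengthens the sequence by two frames while leaving the surrounding context untouched.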

2.

Type: Granted invention patent
Method for detecting voice splicing points and storage medium (status: in force)

Publication No.: US11410685B1

Publication date: 2022-08-09

Application No.: US17668074

Filing date: 2022-02-09

Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Inventor: Jianhua Tao, Ruibo Fu, Jiangyan Yi

IPC: G10L25/87, G06N3/04, G10L25/30, G10L25/24

Abstract: Disclosed are a method for detecting speech concatenating points and a storage medium. The method includes: acquiring a speech to be detected, and determining high-frequency components and low-frequency components of the speech to be detected; extracting first cepstrum features and second cepstrum features corresponding to the speech to be detected according to the high-frequency components and the low-frequency components; splicing, frame by frame, the first and second cepstrum features of each frame of the speech to be detected so as to obtain a parameter sequence; inputting the parameter sequence into a trained neural network model, which has learned and stored a correspondence between the parameter sequence and the feature sequence, so as to obtain a feature sequence corresponding to the speech to be detected; and performing detection of speech concatenating points on the speech to be detected according to the feature sequence.
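
The per-frame splicing step in this abstract can be sketched as follows. The sketch assumes that "splicing" means concatenating the two feature vectors of each frame into one longer vector, which the abstract does not state explicitly; the function name `splice_per_frame` is hypothetical. Cepstrum extraction and the neural network model are not shown.

```python
from typing import List


def splice_per_frame(
    first: List[List[float]], second: List[List[float]]
) -> List[List[float]]:
    """Concatenate two per-frame feature sequences frame by frame.

    first[i] and second[i] are the two feature vectors of frame i.
    The result is a parameter sequence with one combined vector per frame.
    """
    if len(first) != len(second):
        raise ValueError("both feature sequences must cover the same frames")
    # List concatenation builds a new vector, so the inputs are not modified.
    return [a + b for a, b in zip(first, second)]
```

With two frames, a two-dimensional first feature, and a one-dimensional second feature, the resulting parameter sequence has two frames of three values each.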
