TEXT-DRIVEN VIDEO SYNTHESIS WITH PHONETIC DICTIONARY

Patent Application

US20210390945A1 TEXT-DRIVEN VIDEO SYNTHESIS WITH PHONETIC DICTIONARY (Granted)

Patent title: TEXT-DRIVEN VIDEO SYNTHESIS WITH PHONETIC DICTIONARY
Application number: US17221701

Filing date: 2021-04-02
Publication number: US20210390945A1

Publication date: 2021-12-16
Inventors: Sibo ZHANG, Jiahong YUAN, Miao LIAO, Liangjun ZHANG
Applicant: Baidu USA, LLC
Applicant address: Sunnyvale, CA, US
Patentee: Baidu USA, LLC
Current patentee: Baidu USA, LLC
Current patentee address: Sunnyvale, CA, US
Main classification: G10L13/08
IPC classification: G10L13/08; G10L13/027; G06N3/04; G06N3/08; G06F40/242; G10L15/187; G06F16/783; G06F16/78

Abstract:

Presented herein are novel approaches to synthesizing video of speech from text. In a training phase, embodiments build a phoneme-pose dictionary and train a generative neural network model, using a generative adversarial network (GAN), to generate video from interpolated phoneme poses. In deployment, the trained generative neural network, in conjunction with the phoneme-pose dictionary, converts an input text into a video of a person speaking the words of the input text. Compared to audio-driven video generation approaches, the embodiments herein have a number of advantages: 1) they need only a fraction of the training data used by an audio-driven approach; 2) they are more flexible and are not vulnerable to speaker variation; and 3) they significantly reduce preprocessing, training, and inference times.
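
The dictionary lookup and pose interpolation described in the abstract might be sketched as follows. This is a minimal illustration, not the patented method: the pose representation (a flat list of floats), the toy dictionary values, the names `PHONEME_POSES` and `interpolate_poses`, and the choice of linear interpolation are all assumptions made here, since the abstract does not specify them. The GAN that renders video frames from the interpolated poses is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical pose representation: a flat list of landmark coordinates.
Pose = List[float]

# Toy phoneme-pose dictionary; the values are invented for illustration only.
PHONEME_POSES: Dict[str, Pose] = {
    "sil": [0.0, 0.0],
    "HH": [0.2, 0.1],
    "AY": [0.8, 0.6],
}


def lerp(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly blend two poses; t=0 gives a, t=1 gives b."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def interpolate_poses(
    timed_phonemes: List[Tuple[str, float]], fps: float
) -> List[Pose]:
    """Turn (phoneme, key time in seconds) pairs, sorted by time,
    into one pose per video frame by linear interpolation (an assumption)."""
    if len(timed_phonemes) < 2:
        raise ValueError("need at least two timed phonemes to interpolate")
    # Look up the key pose for each phoneme in the dictionary.
    keys = [(t, PHONEME_POSES[p]) for p, t in timed_phonemes]
    n_frames = int(round(keys[-1][0] * fps)) + 1
    frames: List[Pose] = []
    seg = 0  # index of the key pose that starts the current segment
    for i in range(n_frames):
        t = i / fps
        # Advance to the segment whose time span contains t.
        while seg < len(keys) - 2 and t > keys[seg + 1][0]:
            seg += 1
        (t0, p0), (t1, p1) = keys[seg], keys[seg + 1]
        w = 0.0 if t1 == t0 else min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        frames.append(lerp(p0, p1, w))
    return frames


# Example: three key poses over one second, sampled at 4 frames per second.
frames = interpolate_poses([("sil", 0.0), ("HH", 0.5), ("AY", 1.0)], fps=4)
```

In a full pipeline under these assumptions, the phoneme timings would come from a text-to-speech or forced-alignment step, and the resulting pose sequence would be passed to the trained generator.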

Publication/Grant Documents

US11587548B2 Text-driven video synthesis with phonetic dictionary. Publication/grant date: 2023-02-21

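IPC classification:

G	Physics
G10	Musical instruments; acoustics
G10L	Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
G10L13/00	Speech synthesis; text-to-speech systems
G10L13/08	. Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation, or stress or intonation determination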