- Patent title: TEXT-DRIVEN VIDEO SYNTHESIS WITH PHONETIC DICTIONARY
- Application number: US17221701
- Filing date: 2021-04-02
- Publication number: US20210390945A1
- Publication date: 2021-12-16
- Inventors: Sibo ZHANG, Jiahong YUAN, Miao LIAO, Liangjun ZHANG
- Applicant: Baidu USA, LLC
- Applicant address: US CA Sunnyvale
- Assignee: Baidu USA, LLC
- Current assignee: Baidu USA, LLC
- Current assignee address: US CA Sunnyvale
- Main classification: G10L13/08
- IPC classifications: G10L13/08; G10L13/027; G06N3/04; G06N3/08; G06F40/242; G10L15/187; G06F16/783; G06F16/78
Abstract:
Presented herein are novel approaches to synthesize video of speech from text. In a training phase, embodiments build a phoneme-pose dictionary and train a generative neural network model using a generative adversarial network (GAN) to generate video from interpolated phoneme poses. In deployment, the trained generative neural network, in conjunction with the phoneme-pose dictionary, converts an input text into a video of a person speaking the words of the input text. Compared to audio-driven video generation approaches, the embodiments herein have a number of advantages: 1) they need only a fraction of the training data used by an audio-driven approach; 2) they are more flexible and are not vulnerable to speaker variation; and 3) they significantly reduce the preprocessing, training, and inference times.
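The dictionary lookup and pose interpolation mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how poses are represented or interpolated, so the sketch uses a flat list of floats per pose and plain linear interpolation. The names `PHONEME_POSES`, `lerp`, and `interpolate_poses`, and the pose values, are hypothetical. Text-to-phoneme conversion and the GAN that renders frames from poses are not shown.

```python
from typing import Dict, List, Sequence

# Hypothetical pose representation: a flat list of keypoint coordinates.
Pose = List[float]

# Toy phoneme-pose dictionary. The values are invented for illustration
# and are not taken from the patent.
PHONEME_POSES: Dict[str, Pose] = {
    "HH": [0.0, 0.0],
    "AY": [1.0, 2.0],
}


def lerp(a: Pose, b: Pose, t: float) -> Pose:
    """Linearly blend two poses: t=0 returns a, t=1 returns b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_poses(phonemes: Sequence[str], steps: int) -> List[Pose]:
    """Look up one key pose per phoneme and fill in `steps` frames per
    transition (assumed linear), ending on the last key pose."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    keys = [PHONEME_POSES[p] for p in phonemes]  # KeyError on unknown phoneme
    if not keys:
        return []
    frames: List[Pose] = []
    for a, b in zip(keys, keys[1:]):
        for i in range(steps):
            frames.append(lerp(a, b, i / steps))
    frames.append(list(keys[-1]))
    return frames


if __name__ == "__main__":
    for frame in interpolate_poses(["HH", "AY"], steps=2):
        print(frame)
```

In a pipeline of the kind the abstract outlines, a pose sequence like this would be the input to the trained generator; a real system would likely also account for phoneme durations when deciding how many frames each transition gets.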
Published/granted documents:
- US11587548B2 Text-driven video synthesis with phonetic dictionary (publication/grant date: 2023-02-21)