Audio-speech driven animated talking face generation using a cascaded generative adversarial network

Invention Grant

US11551394B2 Audio-speech driven animated talking face generation using a cascaded generative adversarial network (Active)

Patent Title: Audio-speech driven animated talking face generation using a cascaded generative adversarial network
Application No.: US17199149

Application Date: 2021-03-11
Publication No.: US11551394B2

Publication Date: 2023-01-10
Inventor: Sandika Biswas, Dipanjan Das, Sanjana Sinha, Brojeshwar Bhowmick
Applicant: Tata Consultancy Services Limited
Applicant Address: IN Mumbai
Assignee: Tata Consultancy Services Limited
Current Assignee: Tata Consultancy Services Limited
Current Assignee Address: IN Mumbai
Agency: Finnegan, Henderson, Farabow, Garrett & Dunner LLP
Priority: IN202021032794, 2020-07-30
Main IPC: G06T13/20
IPC: G06T13/20; G06V40/16; G06K9/62; G06N3/04; G06N3/08; G10L15/02

Abstract:

Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio for unknown faces, and they do not generalize easily to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ substantially from the distribution the network has seen during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from a canonical face to a person-specific face. A second GAN-based texture generator network is conditioned on the person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible using meta-learning so that it adapts to an unknown subject's traits and face orientation during inference. Finally, eye blinks are induced in the final animated face being generated.
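
The data flow described in the abstract can be illustrated with a minimal Python sketch. It only shows how the stages might be chained: every name in it (`CascadeSketch`, `audio_to_motion`, `transfer_motion`, `generate_texture`, `induce_blink`) is hypothetical, the toy functions are trivial stand-ins for the trained networks, and the per-frame ordering with the blink step last is an assumption read from the order of the abstract rather than from the claims.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Landmarks = List[Point]      # 2D facial landmark points for one frame
Frame = Dict[str, object]    # stand-in for a rendered image


@dataclass
class CascadeSketch:
    """Hypothetical wiring of the stages named in the abstract (not the patented code)."""
    audio_to_motion: Callable[[float], Landmarks]                  # assumed: audio -> canonical-face motion
    transfer_motion: Callable[[Landmarks, Landmarks], Landmarks]   # first GAN in the abstract
    generate_texture: Callable[[Landmarks, str], Frame]            # second GAN in the abstract
    induce_blink: Callable[[Frame, int], Frame]                    # final eye-blink step

    def run(self, audio_features: Sequence[float],
            person_landmarks: Landmarks, identity: str) -> List[Frame]:
        frames = []
        for t, feature in enumerate(audio_features):
            motion = self.audio_to_motion(feature)
            person = self.transfer_motion(motion, person_landmarks)
            frame = self.generate_texture(person, identity)
            frames.append(self.induce_blink(frame, t))
        return frames


# Toy stand-ins so the sketch runs; real stages would be trained networks.
def toy_audio_to_motion(feature: float) -> Landmarks:
    return [(0.0, feature)]  # one displacement per landmark


def toy_transfer_motion(motion: Landmarks, person: Landmarks) -> Landmarks:
    # Add the canonical displacement to the person-specific landmark positions.
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(person, motion)]


def toy_generate_texture(landmarks: Landmarks, identity: str) -> Frame:
    return {"identity": identity, "landmarks": landmarks}


def toy_induce_blink(frame: Frame, t: int, period: int = 3) -> Frame:
    # Mark the last frame of every `period` frames as a blink frame.
    return {**frame, "eyes_closed": t % period == period - 1}


sketch = CascadeSketch(toy_audio_to_motion, toy_transfer_motion,
                       toy_generate_texture, toy_induce_blink)
frames = sketch.run([0.0, 0.5, 1.0], [(10.0, 20.0)], "subject_a")
```

Keeping each stage behind a plain callable mirrors the cascaded structure: the landmark stage and the texture stage can be swapped or adapted independently, which is the kind of separation the abstract relies on when it adapts only the texture generator to an unseen subject.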

Public/Granted literature

US20220036617A1 AUDIO-SPEECH DRIVEN ANIMATED TALKING FACE GENERATION USING A CASCADED GENERATIVE ADVERSARIAL NETWORK Public/Granted day: 2022-02-03

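IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06T	Image data processing or generation, in general
G06T13/00	Animation
G06T13/20	. 3D [Three Dimensional] animation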