METHOD AND SYSTEM FOR GENERATING 2D ANIMATED LIP IMAGES SYNCHRONIZING TO AN AUDIO SIGNAL

    Publication Number: US20220058850A1

    Publication Date: 2022-02-24

    Application Number: US17405765

    Application Date: 2021-08-18

    Abstract: This disclosure relates generally to a method and system for generating 2D animated lip images synchronized to an audio signal for an unseen subject. Recent Convolutional Neural Network (CNN) based approaches generate convincing talking heads, but personalizing such talking heads requires training the model with a large number of samples of the target person, which is time-consuming. The lip generator system receives an audio signal and a target lip image of an unseen target subject as inputs from a user and processes them to extract a plurality of high-dimensional audio and image features. The system is meta-trained on a dataset that spans a large variety of subject ethnicities and vocabulary. The meta-trained model generates realistic animation for a previously unseen face and unseen audio when fine-tuned with only a few-shot set of samples for a predefined interval of time. Additionally, the method preserves the intrinsic features of the unseen target subject.
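    As a rough illustration of the pipeline this abstract describes, the minimal PyTorch sketch below fuses audio features with a target lip image and then adapts a meta-trained generator using only a few subject samples. The class and function names (LipGenerator, few_shot_finetune) and all layer sizes are assumptions for illustration, not the patented architecture.

        # Minimal sketch of the described pipeline, not the patented network:
        # fuse high-dimensional audio and image features, then fine-tune a
        # meta-trained generator on a few samples of an unseen subject.
        import torch
        import torch.nn as nn

        class LipGenerator(nn.Module):  # hypothetical module name
            def __init__(self, audio_dim=128, feat_dim=256):
                super().__init__()
                self.audio_enc = nn.Sequential(nn.Linear(audio_dim, feat_dim), nn.ReLU())
                self.image_enc = nn.Sequential(  # encodes a 64x64 RGB lip crop
                    nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                    nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(), nn.Flatten(),
                    nn.Linear(64 * 16 * 16, feat_dim), nn.ReLU())
                self.decoder = nn.Sequential(  # fused features -> lip frame
                    nn.Linear(2 * feat_dim, 64 * 16 * 16), nn.ReLU(),
                    nn.Unflatten(1, (64, 16, 16)),
                    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

            def forward(self, audio, lip_image):
                fused = torch.cat([self.audio_enc(audio), self.image_enc(lip_image)], dim=1)
                return self.decoder(fused)

        def few_shot_finetune(model, audio, ref_image, gt_frames, steps=20, lr=1e-4):
            # Adapt the meta-trained generator with only a few (audio, frame)
            # pairs of the unseen target subject, as the abstract describes.
            opt = torch.optim.Adam(model.parameters(), lr=lr)
            for _ in range(steps):
                loss = nn.functional.l1_loss(model(audio, ref_image), gt_frames)
                opt.zero_grad(); loss.backward(); opt.step()
            return model

    In use, few_shot_finetune would be called with a few seconds of the target subject's paired audio and lip frames before running inference on new audio.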

    AUDIO-SPEECH DRIVEN ANIMATED TALKING FACE GENERATION USING A CASCADED GENERATIVE ADVERSARIAL NETWORK

    Publication Number: US20220036617A1

    Publication Date: 2022-02-03

    Application Number: US17199149

    Application Date: 2021-03-11

    Abstract: Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio for unknown faces and cannot easily be generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ substantially from the distribution the network has seen during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from a canonical face to a person-specific face. A second, GAN-based texture generator network is conditioned on person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible using meta-learning so that it adapts to an unknown subject's traits and face orientation during inference. Finally, eye blinks are induced in the final animated face being generated.
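    A compact sketch of how the two-stage cascade could be wired is shown below. The class names (MotionGenerator, TextureGenerator), the 68-landmark layout, and the blink-blending heuristic are illustrative assumptions; only the generator halves of both GANs are shown, and the discriminators and meta-learning loop are omitted.

        # Illustrative wiring of the cascaded generators (discriminators omitted);
        # names and layer sizes are assumptions, not the disclosed architecture.
        import torch
        import torch.nn as nn

        N_LM = 68  # assumed facial-landmark count

        class MotionGenerator(nn.Module):
            # GAN 1 (generator half): canonical lip motion -> person-specific landmarks.
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(N_LM * 2 * 2, 256), nn.ReLU(),
                                         nn.Linear(256, N_LM * 2))

            def forward(self, canonical_motion, identity_landmarks):
                x = torch.cat([canonical_motion, identity_landmarks], dim=1)
                return self.net(x)

        class TextureGenerator(nn.Module):
            # GAN 2 (generator half): person-specific landmarks -> face frame;
            # the abstract meta-learns this stage to adapt to unknown subjects.
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(N_LM * 2, 64 * 16 * 16), nn.ReLU(),
                                         nn.Unflatten(1, (64, 16, 16)),
                                         nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                                         nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

            def forward(self, landmarks):
                return self.net(landmarks)

        def induce_blink(landmarks, closed_eye_landmarks, alpha):
            # Crude stand-in for blink induction: blend toward a closed-eye template.
            return (1 - alpha) * landmarks + alpha * closed_eye_landmarks

        def animate(motion_gen, texture_gen, canonical_motion, identity_landmarks,
                    closed_eye_landmarks=None, blink_alpha=0.0):
            landmarks = motion_gen(canonical_motion, identity_landmarks)
            if closed_eye_landmarks is not None and blink_alpha > 0:
                landmarks = induce_blink(landmarks, closed_eye_landmarks, blink_alpha)
            return texture_gen(landmarks)

    Conditioning the texture stage on landmarks rather than raw motion is what lets each stage be trained, and later adapted, independently of the other.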

    SYSTEM AND METHOD FOR INTEGRATING OBJECTS IN MONOCULAR SLAM

    Publication Number: US20210042996A1

    Publication Date: 2021-02-11

    Application Number: US16918743

    Application Date: 2020-07-01

    Abstract: The embodiments herein provide a system and method for integrating objects in monocular simultaneous localization and mapping (SLAM). State-of-the-art object SLAM approaches follow two popular threads: in the first, instance-specific models are assumed to be known a priori; in the second, a general model for an object, such as an ellipsoid or cuboid, is used. However, these generic models give only the label of the object category and convey little information about the object's pose in the map. The method and system disclosed provide a SLAM framework on real monocular sequences wherein joint optimization is performed over object localization and edges using category-level shape priors and bundle adjustment. The method provides better visualization by incorporating object representations in the scene along with the 3D structure of the base SLAM system, which makes it useful for augmented reality (AR) applications.
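    The joint optimization step can be pictured with the short sketch below, which refines one object's translation and category-level shape coefficients so that the deformed shape prior reprojects onto observed 2D keypoints. The function names, the linear shape-basis model, the translation-only pose (rotation omitted for brevity), and the regularizer weight are assumptions for illustration, not the disclosed framework.

        # Illustrative joint refinement of object pose and category-level shape
        # coefficients against 2D observations; names and weights are assumptions.
        import torch

        def project(points_3d, K):
            # Pinhole projection of Nx3 camera-frame points with 3x3 intrinsics K.
            uv = (K @ points_3d.T).T
            return uv[:, :2] / uv[:, 2:3]

        def joint_refine(mean_shape, basis, keypoints_2d, K, iters=200, lr=1e-2):
            # Unknowns: object translation t and shape-deformation coefficients c.
            t = torch.tensor([0.0, 0.0, 5.0], requires_grad=True)  # start in front of camera
            c = torch.zeros(basis.shape[0], requires_grad=True)
            opt = torch.optim.Adam([t, c], lr=lr)
            for _ in range(iters):
                shape = mean_shape + torch.einsum('k,kij->ij', c, basis)  # prior + deformation
                residual = project(shape + t, K) - keypoints_2d           # reprojection error
                loss = (residual ** 2).mean() + 0.1 * (c ** 2).mean()     # shape regularizer
                opt.zero_grad(); loss.backward(); opt.step()
            return t.detach(), c.detach()

    A full pipeline would fold these residuals into the bundle adjustment alongside camera poses and map points; here only a single object's unknowns are refined for clarity.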
