AUDIO-SPEECH DRIVEN ANIMATED TALKING FACE GENERATION USING A CASCADED GENERATIVE ADVERSARIAL NETWORK

    Publication No.: US20220036617A1

    Publication date: 2022-02-03

    Application No.: US17199149

    Filing date: 2021-03-11

    Abstract: Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio for unknown faces and do not generalize easily to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ substantially from the distribution the network saw during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN transfers lip motion from a canonical face to a person-specific face. A second GAN-based texture generator network is conditioned on the person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible through meta-learning so that it adapts to an unknown subject's traits and face orientation during inference. Finally, eye blinks are induced in the final animated face being generated.

    SYSTEM AND METHOD FOR INTEGRATING OBJECTS IN MONOCULAR SLAM

    Publication No.: US20210042996A1

    Publication date: 2021-02-11

    Application No.: US16918743

    Filing date: 2020-07-01

    Abstract: The embodiments herein provide a system and method for integrating objects in monocular simultaneous localization and mapping (SLAM). State-of-the-art object SLAM approaches follow two popular threads. In the first, instance-specific models are assumed to be known a priori. In the second, a generic model for an object, such as an ellipsoid or a cuboid, is used. However, these generic models give only the label of the object category and provide little information about the object pose in the map. The disclosed method and system provide a SLAM framework on a real monocular sequence wherein joint optimization is performed on object localization and edges using category-level shape priors and bundle adjustment. The method provides better visualization by incorporating object representations in the scene along with the 3D structure of the base SLAM system, which makes it useful for augmented reality (AR) applications.

    SYSTEM AND METHOD FOR STITCHING IMAGES USING NON-LINEAR OPTIMIZATION AND MULTI-CONSTRAINT COST FUNCTION MINIMIZATION

    Publication No.: US20200327642A1

    Publication date: 2020-10-15

    Application No.: US16830328

    Filing date: 2020-03-26

    Abstract: The present disclosure provides a system and a method for stitching images using non-linear optimization and multi-constraint cost function minimization. Most conventional homography-based transformation approaches for image alignment calculate transformations using linear algorithms that ignore parameters such as lens distortion and are unable to handle parallax in non-planar images, resulting in improper image stitching with misalignments. The disclosed system and method generate an initial stitched image by estimating a global homography for each image, based on a non-linear optimization, using an estimated pairwise homography matrix and feature point correspondences for each pair of images. Local warping-based image alignment is then applied to the initial stitched image, using multi-constraint cost function minimization to mitigate aberrations caused by noise in the global homography estimation, to generate the refined stitched image. The refined stitched image is accurate and free from misalignments and poor intensities.
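The global-homography step in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes the images form a simple chain, that each pairwise homography maps image i+1 into image i, and that the quantity a non-linear optimizer would reduce is the squared distance between matched feature points after both are mapped into the frame of image 0. The names `chain_global_homographies` and `alignment_cost` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a 2D point through a homography using homogeneous coordinates."""
    x = h[0][0] * p[0] + h[0][1] * p[1] + h[0][2]
    y = h[1][0] * p[0] + h[1][1] * p[1] + h[1][2]
    w = h[2][0] * p[0] + h[2][1] * p[1] + h[2][2]
    return (x / w, y / w)


def chain_global_homographies(pairwise: List[Matrix]) -> List[Matrix]:
    """Assumption: pairwise[i] maps image i+1 into image i.

    Returns one homography per image that maps it into the frame of image 0.
    These would serve only as initial values for a later refinement.
    """
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    result = [identity]
    for h in pairwise:
        result.append(matmul3(result[-1], h))
    return result


def alignment_cost(global_h: List[Matrix],
                   matches: Dict[Tuple[int, int], List[Tuple[Point, Point]]]) -> float:
    """Sum of squared distances between matched points in the reference frame.

    matches[(i, j)] holds (point in image i, point in image j) pairs.
    """
    cost = 0.0
    for (i, j), pairs in matches.items():
        for p_i, p_j in pairs:
            xi, yi = apply_homography(global_h[i], p_i)
            xj, yj = apply_homography(global_h[j], p_j)
            cost += (xi - xj) ** 2 + (yi - yj) ** 2
    return cost
```

In this sketch a non-linear solver would adjust the entries of each global homography to lower `alignment_cost`; the local warping and the multi-constraint cost described in the abstract are not modeled here.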

    CONSTRUCTING A 3D STRUCTURE

    Type: Invention application

    Legal status: In force

    Publication No.: US20150371396A1

    Publication date: 2015-12-24

    Application No.: US14493959

    Filing date: 2014-09-23

    Abstract: Disclosed is a method and system for constructing a 3D structure. The system of the present disclosure comprises an image capturing unit for capturing images of an object. The system comprises a gyroscope, a magnetometer, and an accelerometer for determining extrinsic camera parameters, wherein the extrinsic camera parameters comprise a rotation and a translation of the images. Further, the system determines an internal calibration matrix once. The system uses the extrinsic camera parameters and the internal calibration matrix to determine a fundamental matrix. The system extracts features of the images to establish point correspondences between the images. Further, the point correspondences are filtered using the fundamental matrix to generate filtered point correspondences. The filtered point correspondences are triangulated to determine 3D points representing the 3D structure. Further, the 3D structure may be optimized to eliminate reprojection errors associated with the 3D structure.
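As an illustration of how a fundamental matrix can follow from known rotation, translation, and calibration, the sketch below uses the standard relation F = K^-T [t]x R K^-1 and then filters correspondences by their distance to the epipolar line. It rests on assumptions the abstract does not state: both images share the same calibration matrix K, the pose convention is x2 = R x1 + t, and the filter is a simple pixel threshold. All function names and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose3(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def skew(t: Sequence[float]) -> Matrix:
    """Cross-product matrix [t]x, so that [t]x v equals t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def invert_calibration(k: Matrix) -> Matrix:
    """Closed-form inverse of K = [[fx, s, cx], [0, fy, cy], [0, 0, 1]]."""
    fx, s, cx = k[0]
    fy, cy = k[1][1], k[1][2]
    return [[1.0 / fx, -s / (fx * fy), (s * cy - cx * fy) / (fx * fy)],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def fundamental_matrix(k: Matrix, r: Matrix, t: Sequence[float]) -> Matrix:
    """F = K^-T [t]x R K^-1, assuming x2 = R x1 + t and a shared K."""
    k_inv = invert_calibration(k)
    essential = matmul3(skew(t), r)
    return matmul3(matmul3(transpose3(k_inv), essential), k_inv)


def epipolar_distance(f: Matrix, p1: Point, p2: Point) -> float:
    """Pixel distance from p2 to the epipolar line F * p1 in image 2."""
    x1 = (p1[0], p1[1], 1.0)
    a, b, c = (sum(f[i][j] * x1[j] for j in range(3)) for i in range(3))
    return abs(a * p2[0] + b * p2[1] + c) / math.hypot(a, b)


def filter_correspondences(f: Matrix, pairs: List[Tuple[Point, Point]],
                           threshold: float = 1.0) -> List[Tuple[Point, Point]]:
    """Keep pairs whose epipolar distance is below a hypothetical threshold."""
    return [(p1, p2) for p1, p2 in pairs
            if epipolar_distance(f, p1, p2) < threshold]
```

Because the pose comes from inertial sensors rather than from the image matches, a filter of this kind can reject outlier matches without first estimating F from the matches themselves; the surviving pairs would then go on to triangulation.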


    IDENTITY PRESERVING REALISTIC TALKING FACE GENERATION USING AUDIO SPEECH OF A USER

    Publication No.: US20210366173A1

    Publication date: 2021-11-25

    Application No.: US17036583

    Filing date: 2020-09-29

    Abstract: Speech-driven facial animation is useful for a variety of applications such as telepresence, chatbots, etc. The necessary attributes of a realistic face animation are: (1) audiovisual synchronization, (2) identity preservation of the target individual, (3) plausible mouth movements, and (4) presence of natural eye blinks. Existing methods mostly address audio-visual lip synchronization and synthesis of natural facial gestures for overall video realism. However, existing approaches are not accurate. The present disclosure provides a system and method that learn the motion of facial landmarks as an intermediate step before generating texture. Person-independent facial landmarks are generated from audio for invariance to different voices, accents, etc. Eye blinks are imposed on the facial landmarks, and the person-independent landmarks are retargeted to person-specific landmarks to preserve identity-related facial structure. Facial texture is then generated from the person-specific facial landmarks, which helps to preserve identity-related texture.

    WEAKLY SUPERVISED LEARNING OF 3D HUMAN POSES FROM 2D POSES

    Publication No.: US20200342270A1

    Publication date: 2020-10-29

    Application No.: US16815206

    Filing date: 2020-03-11

    Abstract: Estimating 3D human pose from monocular images is a challenging problem due to the variety and complexity of human poses and the inherent ambiguity in recovering depth from a single view. Recent deep-learning-based methods show promising results by using supervised learning on 3D-pose-annotated datasets. However, the lack of large-scale 3D-annotated training data makes 3D pose estimation difficult in the wild. Embodiments of the present disclosure provide a method that can effectively predict 3D human poses from only 2D poses in a weakly supervised manner by using both ground-truth 3D pose and ground-truth 2D pose, with re-projection error minimization as a constraint to predict the 3D joint locations. The method may further utilize additional geometric constraints on reconstructed body parts to regularize the pose in 3D, along with minimizing re-projection error, to improve the estimation of an accurate 3D pose.
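The re-projection constraint mentioned in the abstract above can be sketched as follows. This is an illustration only: it assumes a pinhole camera with a known focal length and principal point, and it measures the error as the mean squared pixel distance between projected 3D joints and the given 2D joints. The abstract does not specify the camera model or the exact loss, and the function names are hypothetical.

```python
from typing import List, Tuple

Joint3D = Tuple[float, float, float]
Joint2D = Tuple[float, float]


def project(joint: Joint3D, focal: float, center: Joint2D) -> Joint2D:
    """Pinhole projection of a camera-space joint (assumes z > 0)."""
    x, y, z = joint
    return (focal * x / z + center[0], focal * y / z + center[1])


def reprojection_error(joints_3d: List[Joint3D], joints_2d: List[Joint2D],
                       focal: float, center: Joint2D = (0.0, 0.0)) -> float:
    """Mean squared distance between projected 3D joints and 2D joints."""
    if len(joints_3d) != len(joints_2d):
        raise ValueError("joint lists must have the same length")
    total = 0.0
    for joint, (u, v) in zip(joints_3d, joints_2d):
        pu, pv = project(joint, focal, center)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(joints_3d)
```

A loss of this form needs only 2D annotations for a given sample, which is why it can supervise 3D predictions on images that lack 3D ground truth.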

    METHOD AND SYSTEM FOR PREDICTION OF CORRECT DISCRETE SENSOR DATA BASED ON TEMPORAL UNCERTAINTY

    Publication No.: US20200210265A1

    Publication date: 2020-07-02

    Application No.: US16728528

    Filing date: 2019-12-27

    Abstract: This disclosure relates generally to a method and system for prediction of correct discrete sensor data, thus enabling a continuous flow of data even when a discrete sensor fails. The activities of humans/subjects housed in a smart environment are continuously monitored by a plurality of non-intrusive discrete sensors embedded in the living infrastructure. The collected discrete sensor data is usually sparse and largely unbalanced: most of the discrete sensor data is ‘No’, with comparatively few samples of ‘Yes’, which makes prediction very challenging. The proposed prediction technique, based on the introduction of temporal uncertainty, is performed in several stages, which include pre-processing of the received discrete sensor data and introduction of temporal uncertainty techniques, followed by prediction based on neural network techniques that learn patterns from historical data.
