THREE-DIMENSIONAL (3D) POSE ESTIMATION FROM A MONOCULAR CAMERA

    Publication No.: US20190278983A1

    Publication date: 2019-09-12

    Application No.: US16290643

    Filing date: 2019-03-01

    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
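The role of the per-keypoint depth value can be illustrated with a small sketch. It assumes a standard pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`) and pixel-space keypoints; neither the intrinsics nor the function name `backproject_keypoints` come from the abstract, and the neural network that would predict the depths is not shown.

```python
from typing import List, Sequence, Tuple


def backproject_keypoints(
    keypoints_2d: Sequence[Tuple[float, float]],
    depths: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float, float]]:
    """Lift 2D keypoints (u, v) in pixels to 3D camera coordinates.

    Assumes a pinhole camera (hypothetical intrinsics fx, fy, cx, cy) and one
    depth value per keypoint, e.g. as predicted by a network.
    """
    points_3d = []
    for (u, v), z in zip(keypoints_2d, depths):
        x = (u - cx) * z / fx  # horizontal offset from the optical axis
        y = (v - cy) * z / fy  # vertical offset from the optical axis
        points_3d.append((x, y, z))
    return points_3d


# Two keypoints: one at the principal point, one 600 px to its right.
pose_3d = backproject_keypoints(
    [(320.0, 240.0), (920.0, 240.0)], [0.5, 1.0],
    fx=600.0, fy=600.0, cx=320.0, cy=240.0,
)
```

Without the depth values the two image coordinates alone leave each keypoint anywhere along its viewing ray, which is why the depth estimate is what makes the 3D pose recoverable.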

    ESTIMATING FACIAL EXPRESSIONS USING FACIAL LANDMARKS

    Publication No.: US20230144458A1

    Publication date: 2023-05-11

    Application No.: US18051209

    Filing date: 2022-10-31

    CPC classification number: G06V40/174 G06V40/171 G06V40/165 G06V10/82 G06T13/40

    Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized, then be applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.
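The abstract does not say how the landmarks are normalized. One common choice, shown here purely as an assumption, is to center the landmarks on their centroid and divide by their root-mean-square distance from it, so that the values fed to a model do not depend on where the face sits in the frame or how large it appears. The function name `normalize_landmarks` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize_landmarks(
    landmarks: Sequence[Tuple[float, float]],
) -> List[Tuple[float, float]]:
    """Center 2D landmarks on their centroid and scale them to unit RMS radius.

    This is an illustrative normalization, not the one used in the patent.
    """
    n = len(landmarks)
    mean_x = sum(x for x, _ in landmarks) / n
    mean_y = sum(y for _, y in landmarks) / n
    centered = [(x - mean_x, y - mean_y) for x, y in landmarks]
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if rms == 0.0:
        return centered  # all landmarks coincide; nothing to scale
    return [(x / rms, y / rms) for x, y in centered]
```

With this choice, the same face shifted and uniformly rescaled in the image yields the same normalized landmarks. Head rotation is not removed, so a real pipeline would likely need an additional alignment step.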

    3D human body pose estimation using a model trained from unlabeled multi-view data

    Publication No.: US11417011B2

    Publication date: 2022-08-16

    Application No.: US16897057

    Filing date: 2020-06-09

    Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks, including, for example, that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult since it requires manually provided annotations for 2D images, which is time-consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.

    THREE-DIMENSIONAL (3D) POSE ESTIMATION FROM A MONOCULAR CAMERA

    Publication No.: US20210117661A1

    Publication date: 2021-04-22

    Application No.: US17135697

    Filing date: 2020-12-28

    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.

    Articulated body mesh estimation using three-dimensional (3D) body keypoints

    Publication No.: US11361507B1

    Publication date: 2022-06-14

    Application No.: US17315060

    Filing date: 2021-05-07

    Abstract: Estimating a three-dimensional (3D) pose and shape of an articulated body mesh is useful for many different applications including health and fitness, entertainment, and computer graphics. A set of estimated 3D keypoint positions for a human body structure are processed to compute parameters defining the pose and shape of a parametric human body mesh using a set of geometric operations. During processing, 3D keypoints are extracted from the parametric human body mesh and a set of rotations are computed to align the extracted 3D keypoints with the estimated 3D keypoints. The set of rotations may correctly position a particular 3D keypoint location at a “joint”, but an arbitrary number of rotations of the “joint” keypoint may produce a twist in a connection to a child keypoint. Rules are applied to the set of rotations to resolve ambiguous twists and articulate the parametric human body mesh according to the computed parameters.
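The alignment step and the twist ambiguity can be made concrete with a short sketch. It computes the minimal rotation that turns one bone direction (parent joint to child joint in the mesh) onto the corresponding estimated direction, using the standard Rodrigues construction. The helper names are hypothetical, and the patent's rules for resolving twist are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotation_between(a: Sequence[float], b: Sequence[float]) -> List[List[float]]:
    """Return the 3x3 matrix of the minimal rotation taking direction a onto b."""
    a, b = _unit(a), _unit(b)
    c = sum(x * y for x, y in zip(a, b))  # cosine of the angle between a and b
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if c < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to a.
        helper = (1.0, 0.0, 0.0) if abs(a[0]) < 0.9 else (0.0, 1.0, 0.0)
        k = _unit(_cross(a, helper))
        return [[2.0 * k[i] * k[j] - eye[i][j] for j in range(3)] for i in range(3)]
    v = _cross(a, b)
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    f = 1.0 / (1.0 + c)
    # R = I + [v]x + [v]x^2 / (1 + c)
    return [[eye[i][j] + vx[i][j] + f * sum(vx[i][m] * vx[m][j] for m in range(3))
             for j in range(3)] for i in range(3)]


def rotate(R: List[List[float]], p: Sequence[float]) -> Vec3:
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))
```

Any further rotation about the target direction `b` leaves the bone aligned just as well, so the child keypoint position alone cannot fix the rotation of the joint. That remaining degree of freedom is the twist that the abstract says is resolved by rules.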

    3D HUMAN BODY POSE ESTIMATION USING A MODEL TRAINED FROM UNLABELED MULTI-VIEW DATA

    Publication No.: US20210248772A1

    Publication date: 2021-08-12

    Application No.: US16897057

    Filing date: 2020-06-09

    Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data, however, has various drawbacks, including, for example, that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult since it requires manually provided annotations for 2D images, which is time-consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.

    IMAGE-BASED THREE-DIMENSIONAL OCCUPANT ASSESSMENT FOR IN-CABIN MONITORING SYSTEMS AND APPLICATIONS

    Publication No.: US20250022290A1

    Publication date: 2025-01-16

    Application No.: US18349853

    Filing date: 2023-07-10

    Abstract: In various examples, image-based three-dimensional occupant assessment for in-cabin monitoring systems and applications is provided. An evaluation function may determine a 3D representation of an occupant of a machine by evaluating sensor data comprising an image frame from an optical image sensor. The 3D representation may comprise at least one characteristic representative of a size of the occupant (e.g., a 3D pose and/or 3D shape), which may be used to derive other characteristics such as, but not limited to, weight, height, and/or age. A first processing path may generate a representation of one or more features corresponding to at least a portion of the occupant based on optical image data, and a second processing path may determine a depth corresponding to the one or more features based on depth data derived from the optical image data and ground truth depth data corresponding to the interior of the machine.

    NEURAL HEAD AVATAR CONSTRUCTION FROM AN IMAGE

    Publication No.: US20240404174A1

    Publication date: 2024-12-05

    Application No.: US18653723

    Filing date: 2024-05-02

    Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing coarse 3D geometry for the head avatar, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression and pose is generated.
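The abstract names volumetric rendering without giving its form. A minimal sketch of the widely used emission-absorption compositing along a single ray is shown below, assuming that the combined tri-planes have already been sampled into a density and a (scalar) color at each point along the ray. The function name and the scalar color are simplifications for illustration, not details from the patent.

```python
import math
from typing import Sequence, Tuple


def composite_ray(
    densities: Sequence[float],
    colors: Sequence[float],
    deltas: Sequence[float],
) -> Tuple[float, float]:
    """Composite samples along one ray, front to back.

    densities: non-negative volume density at each sample
    colors:    scalar color at each sample (one channel for brevity)
    deltas:    distance covered by each sample along the ray
    Returns (rendered color, remaining transmittance).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    rendered = 0.0
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        rendered += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return rendered, transmittance
```

Samples behind an opaque region contribute nothing because the transmittance has already dropped to zero, which is how occlusion emerges from the density field.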
