-
Publication number: US20190278983A1
Publication date: 2019-09-12
Application number: US16290643
Filing date: 2019-03-01
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
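The keypoint representation in this abstract (image coordinates plus a per-keypoint depth) can be lifted to camera-space 3D points. The following is a minimal sketch, not the patented method: it assumes a standard pinhole camera with known intrinsics and an absolute depth per keypoint, neither of which the abstract specifies. The function name `backproject` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import List, Tuple

# (x_pixel, y_pixel, depth): two image coordinates plus a depth per keypoint.
Keypoint = Tuple[float, float, float]
Point3D = Tuple[float, float, float]


def backproject(keypoints: List[Keypoint],
                fx: float, fy: float,
                cx: float, cy: float) -> List[Point3D]:
    """Lift (x, y, depth) keypoints to 3D camera coordinates.

    Assumes a pinhole camera model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. These intrinsics are an assumption of
    this sketch; the abstract does not state how they are obtained.
    """
    points = []
    for x, y, z in keypoints:
        # Invert the pinhole projection x = fx * X / Z + cx (same for y).
        X = (x - cx) * z / fx
        Y = (y - cy) * z / fy
        points.append((X, Y, z))
    return points


if __name__ == "__main__":
    # Two hypothetical hand keypoints with depths in meters.
    hand = [(320.0, 240.0, 0.5), (420.0, 240.0, 1.0)]
    for point in backproject(hand, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
        print(point)
```

A keypoint at the principal point maps onto the optical axis, so only its depth is non-zero. Keypoints farther from the image center are displaced in proportion to their depth.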
-
Publication number: US20230144458A1
Publication date: 2023-05-11
Application number: US18051209
Filing date: 2022-10-31
Applicant: NVIDIA Corporation
Inventor: Alexander Malafeev, Shalini De Mello, Jaewoo Seo, Umar Iqbal, Koki Nagano, Jan Kautz, Simon Yuen
CPC classification number: G06V40/174, G06V40/171, G06V40/165, G06V10/82, G06T13/40
Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized, then applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.
-
Publication number: US11417011B2
Publication date: 2022-08-16
Application number: US16897057
Filing date: 2020-06-09
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Jan Kautz
Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data however has various drawbacks, including for example that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult since it requires manually provided annotations for 2D images, which is time consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.
-
Publication number: US20210117661A1
Publication date: 2021-04-22
Application number: US17135697
Filing date: 2020-12-28
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Thomas Michael Breuel, Jan Kautz
Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
-
Publication number: US20250061729A1
Publication date: 2025-02-20
Application number: US18238335
Filing date: 2023-08-25
Applicant: NVIDIA Corporation
Inventor: Prash Goel, Umar Iqbal, Akarsh Umesh Zingade, Pavlo Molchanov
Abstract: Apparatuses, systems, and techniques to identify three-dimensional positions of partially occluded objects in images. In at least one embodiment, one or more neural networks identify the three-dimensional positions of occluded portions of objects in a first image based, at least in part, on one or more second images including non-occluded objects.
-
Publication number: US11361507B1
Publication date: 2022-06-14
Application number: US17315060
Filing date: 2021-05-07
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Jan Kautz, Yun Rong Guo, Cheng Xie
Abstract: Estimating a three-dimensional (3D) pose and shape of an articulated body mesh is useful for many different applications including health and fitness, entertainment, and computer graphics. A set of estimated 3D keypoint positions for a human body structure are processed to compute parameters defining the pose and shape of a parametric human body mesh using a set of geometric operations. During processing, 3D keypoints are extracted from the parametric human body mesh and a set of rotations are computed to align the extracted 3D keypoints with the estimated 3D keypoints. The set of rotations may correctly position a particular 3D keypoint location at a “joint”, but an arbitrary number of rotations of the “joint” keypoint may produce a twist in a connection to a child keypoint. Rules are applied to the set of rotations to resolve ambiguous twists and articulate the parametric human body mesh according to the computed parameters.
-
Publication number: US20210248772A1
Publication date: 2021-08-12
Application number: US16897057
Filing date: 2020-06-09
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Jan Kautz
Abstract: Learning to estimate a 3D body pose, and likewise the pose of any type of object, from a single 2D image is of great interest for many practical graphics applications and generally relies on neural networks that have been trained with sample data which annotates (labels) each sample 2D image with a known 3D pose. Requiring this labeled training data however has various drawbacks, including for example that traditionally used training data sets lack diversity and therefore limit the extent to which neural networks are able to estimate 3D pose. Expanding these training data sets is also difficult since it requires manually provided annotations for 2D images, which is time consuming and prone to errors. The present disclosure overcomes these and other limitations of existing techniques by providing a model that is trained from unlabeled multi-view data for use in 3D pose estimation.
-
Publication number: US20250022290A1
Publication date: 2025-01-16
Application number: US18349853
Filing date: 2023-07-10
Applicant: NVIDIA Corporation
Inventor: Sakthivel SIVARAMAN, Arjun Guru, Rajath Shetty, Umar Iqbal, Orazio Gallo, Hang Su, Abhishek Badki, Varsha Hedau
Abstract: In various examples, image-based three-dimensional occupant assessment for in-cabin monitoring systems and applications is provided. An evaluation function may determine a 3D representation of an occupant of a machine by evaluating sensor data comprising an image frame from an optical image sensor. The 3D representation may comprise at least one characteristic representative of a size of the occupant (e.g., a 3D pose and/or 3D shape), which may be used to derive other characteristics such as, but not limited to, weight, height, and/or age. A first processing path may generate a representation of one or more features corresponding to at least a portion of the occupant based on optical image data, and a second processing path may determine a depth corresponding to the one or more features based on depth data derived from the optical image data and ground truth depth data corresponding to the interior of the machine.
-
Publication number: US20240404174A1
Publication date: 2024-12-05
Application number: US18653723
Filing date: 2024-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz
Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing coarse 3D geometry for the head avatar, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression and pose is generated.
-
Publication number: US20240070874A1
Publication date: 2024-02-29
Application number: US18135654
Filing date: 2023-04-17
Applicant: NVIDIA Corporation
Inventor: Muhammed Kocabas, Ye Yuan, Umar Iqbal, Pavlo Molchanov, Jan Kautz
CPC classification number: G06T7/20, G06T7/70, G06T2207/20084, G06T2207/30196, G06T2207/30252, G06T2210/12
Abstract: Estimating motion of a human or other object in video is a common computer task with applications in robotics, sports, mixed reality, etc. However, motion estimation becomes difficult when the camera capturing the video is moving, because the observed object and camera motions are entangled. The present disclosure provides for joint estimation of the motion of a camera and the motion of articulated objects captured in video by the camera.