-
Publication No.: US12266144B2
Publication date: 2025-04-01
Application No.: US16690015
Filing date: 2019-11-20
Applicant: NVIDIA Corporation
Inventor: Siva Karthik Mustikovela, Varun Jampani, Shalini De Mello, Sifei Liu, Umar Iqbal, Jan Kautz
IPC: G06V10/24, G06F18/21, G06F18/214, G06N3/045, G06N3/08, G06T7/73, G06V10/44, G06V10/764, G06V10/778, G06V10/82, G06V20/56
Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify an orientation of one or more objects based, at least in part, on one or more characteristics of the object other than the object's orientation.
-
Publication No.: US12182940B2
Publication date: 2024-12-31
Application No.: US17578051
Filing date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
IPC: G06T15/00, G06F18/21, G06T7/40, G06T7/73, G06T17/20, G06V10/26, G06V10/776, G06V10/82, G06V20/64
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication No.: US20240127041A1
Publication date: 2024-04-18
Application No.: US18452714
Filing date: 2023-08-21
Applicant: NVIDIA Corporation
Inventor: Jimmy Smith, Wonmin Byeon, Shalini De Mello
IPC: G06N3/0464, G06F17/16, G06N3/049
CPC classification number: G06N3/0464, G06F17/16, G06N3/049
Abstract: Systems and methods are disclosed related to a convolutional structured state space model (ConvSSM), which has a tensor-structured state but a continuous-time parameterization and linear state updates. The linearity may be exploited to use parallel scans for subquadratic parallelization across the spatiotemporal sequence. The ConvSSM effectively models long-range dependencies and, when followed by a nonlinear operation, forms a spatiotemporal layer (ConvS5) that does not require compressing frames into tokens, can be efficiently parallelized across the sequence, provides an unbounded context, and enables fast autoregressive generation.
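The abstract's point that linear state updates allow parallel scans can be illustrated with a small sketch. It is not the patented ConvSSM: the state here is a scalar rather than a tensor, the update is a plain multiply instead of a convolution, and the names `combine`, `sequential_scan`, and `scan_in_log_rounds` are hypothetical. It only shows why a recurrence of the form x_t = a_t * x_{t-1} + b_t can be evaluated with an associative scan.

```python
from typing import List, Tuple

# A pair (a, b) stands for the affine map x -> a * x + b.
Pair = Tuple[float, float]


def combine(first: Pair, second: Pair) -> Pair:
    """Compose two affine maps: apply `first`, then `second`.

    Composition of affine maps is associative, which is what
    makes a scan over the sequence valid.
    """
    a1, b1 = first
    a2, b2 = second
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a: List[float], b: List[float], x0: float = 0.0) -> List[float]:
    """Reference: evaluate x_t = a_t * x_{t-1} + b_t one step at a time."""
    xs = []
    x = x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        xs.append(x)
    return xs


def scan_in_log_rounds(a: List[float], b: List[float], x0: float = 0.0) -> List[float]:
    """Hillis-Steele style inclusive scan over the affine maps.

    The outer loop runs about log2(T) times. Within one round every
    update reads only the previous round's values, so on parallel
    hardware the inner loop could run concurrently (here it is serial).
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        nxt = list(elems)
        for i in range(step, n):
            nxt[i] = combine(elems[i - step], elems[i])
        elems = nxt
        step *= 2
    # elems[t] now maps the initial state x0 directly to x_t.
    return [a_t * x0 + b_t for a_t, b_t in elems]


if __name__ == "__main__":
    a = [0.5, 0.5, 2.0, 1.0, 0.25]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(sequential_scan(a, b, x0=1.0))
    print(scan_in_log_rounds(a, b, x0=1.0))
```

Both functions should print the same sequence. This scan does more total work than the sequential loop, but its depth grows logarithmically with sequence length; work-efficient variants exist and are not shown here.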
-
Publication No.: US11704857B2
Publication date: 2023-07-18
Application No.: US17734244
Filing date: 2022-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
CPC classification number: G06T15/04, G06T7/579, G06T7/70, G06T15/20, G06T17/20, G06T2207/10016, G06T2207/20084, G06T2207/30244
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
-
Publication No.: US20230144458A1
Publication date: 2023-05-11
Application No.: US18051209
Filing date: 2022-10-31
Applicant: NVIDIA Corporation
Inventor: Alexander Malafeev, Shalini De Mello, Jaewoo Seo, Umar Iqbal, Koki Nagano, Jan Kautz, Simon Yuen
CPC classification number: G06V40/174, G06V40/171, G06V40/165, G06V10/82, G06T13/40
Abstract: In examples, locations of facial landmarks may be applied to one or more machine learning models (MLMs) to generate output data indicating profiles corresponding to facial expressions, such as facial action coding system (FACS) values. The output data may be used to determine geometry of a model. For example, video frames depicting one or more faces may be analyzed to determine the locations. The facial landmarks may be normalized, then applied to the MLM(s) to infer the profile(s), which may then be used to animate the model for expression retargeting from the video. The MLM(s) may include sub-networks that each analyze a set of input data corresponding to a region of the face to determine profiles that correspond to the region. The profiles from the sub-networks, along with global locations of facial landmarks, may be used by a subsequent network to infer the profiles for the overall face.
-
Publication No.: US20230081641A1
Publication date: 2023-03-16
Application No.: US17551046
Filing date: 2021-12-14
Applicant: NVIDIA Corporation
Inventor: Koki Nagano, Eric Ryan Chan, Sameh Khamis, Shalini De Mello, Tero Tapani Karras, Orazio Gallo, Jonathan Tremblay
Abstract: A single two-dimensional (2D) image can be used as input to obtain a three-dimensional (3D) representation of the 2D image. This is done by extracting features from the 2D image by an encoder and determining a 3D representation of the 2D image utilizing a trained 2D convolutional neural network (CNN). Volumetric rendering is then run on the 3D representation to combine features within one or more viewing directions, and the combined features are provided as input to a multilayer perceptron (MLP) that predicts and outputs color (or multi-dimensional neural features) and density values for each point within the 3D representation. As a result, single-image inverse rendering may be performed using only a single 2D image as input to create a corresponding 3D representation of the scene in the single 2D image.
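The abstract mentions volumetric rendering over color and density values but does not give a formula. As a hedged illustration, the sketch below uses the standard emission-absorption quadrature common in neural rendering work; whether the patent uses exactly this form is an assumption. The function name `composite_along_ray` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def composite_along_ray(
    densities: Sequence[float],
    colors: Sequence[Sequence[float]],
    deltas: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Blend per-sample colors (or feature vectors) along one ray.

    densities[i]: non-negative density at sample i
    colors[i]:    color or feature vector at sample i
    deltas[i]:    distance between sample i and the next sample

    Returns the composited vector and the per-sample weights.
    """
    channels = len(colors[0]) if colors else 0
    out = [0.0] * channels
    weights = []
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for c in range(channels):
            out[c] += weight * color[c]
        # Light surviving past this sample.
        transmittance *= 1.0 - alpha
    return out, weights


if __name__ == "__main__":
    densities = [0.0, 2.0, 5.0]
    colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    deltas = [0.5, 0.5, 0.5]
    rgb, w = composite_along_ray(densities, colors, deltas)
    print(rgb, w)
```

The weights never sum to more than 1, so an empty ray (all densities zero) yields a zero vector and a dense first sample hides everything behind it.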
-
Publication No.: US11321865B1
Publication date: 2022-05-03
Application No.: US16355481
Filing date: 2019-03-15
Applicant: NVIDIA Corporation
Inventor: Joohwan Kim, Michael Stengel, Zander Majercik, Shalini De Mello, Samuli Laine, Morgan McGuire, David Luebke
Abstract: One embodiment of a method includes calculating one or more activation values of one or more neural networks trained to infer eye gaze information based, at least in part, on eye position of one or more images of one or more faces indicated by an infrared light reflection from the one or more images.
-
Publication No.: US10762425B2
Publication date: 2020-09-01
Application No.: US16134716
Filing date: 2018-09-18
Applicant: NVIDIA Corporation
Inventor: Sifei Liu, Shalini De Mello, Jinwei Gu, Ming-Hsuan Yang, Jan Kautz
Abstract: A spatial linear propagation network (SLPN) system learns the affinity matrix for vision tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The SLPN system is trained for a particular computer vision task and refines an input map (i.e., affinity matrix) that indicates pixels that share a particular property (e.g., color, object, texture, shape, etc.). Inputs to the SLPN system are input data (e.g., pixel values for an image) and the input map corresponding to the input data to be propagated. The input data is processed to produce task-specific affinity values (guidance data). The task-specific affinity values are applied to values in the input map, with at least two weighted values from each column contributing to a value in the refined map data for the adjacent column.
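One possible reading of the column-to-column propagation described in the abstract is sketched below. It assumes a left-to-right pass in which each output value mixes its own input with up to three neighbors (up, same row, down) from the previously refined column; the three-neighbor pattern, the `1 - sum(weights)` mixing term, and the name `propagate_left_to_right` are assumptions for illustration, not details taken from the patent. In the described system the weights would come from a network; here they are passed in directly.

```python
from typing import List, Sequence, Tuple

Weights = Tuple[float, float, float]  # (up, same row, down) in previous column


def propagate_left_to_right(
    x: Sequence[Sequence[float]],
    w: Sequence[Sequence[Weights]],
) -> List[List[float]]:
    """Refine a 2D map column by column with a linear recurrence.

    x[i][t]: input map value at row i, column t
    w[i][t]: weights applied to the refined values at rows i-1, i, i+1
             of column t-1 (entries for column 0 are ignored)
    """
    rows, cols = len(x), len(x[0])
    h = [list(row) for row in x]  # column 0 is passed through unchanged
    for t in range(1, cols):
        for i in range(rows):
            acc = 0.0
            wsum = 0.0
            for offset, weight in zip((-1, 0, 1), w[i][t]):
                j = i + offset
                if 0 <= j < rows:  # skip neighbors outside the map
                    acc += weight * h[j][t - 1]
                    wsum += weight
            # Blend the pixel's own input with its propagated neighbors.
            h[i][t] = (1.0 - wsum) * x[i][t] + acc
    return h


if __name__ == "__main__":
    x = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    w = [[(0.25, 0.25, 0.25)] * 3 for _ in range(3)]
    for row in propagate_left_to_right(x, w):
        print(row)
```

With all weights set to zero the map is returned unchanged, and a constant map stays constant for any weights, which are two quick sanity checks on the recurrence.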
-
Publication No.: US20240404174A1
Publication date: 2024-12-05
Application No.: US18653723
Filing date: 2024-05-02
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Shalini De Mello, Sifei Liu, Koki Nagano, Umar Iqbal, Jan Kautz
Abstract: Systems and methods are disclosed that animate a source portrait image with motion (i.e., pose and expression) from a target image. In contrast to conventional systems, given an unseen single-view portrait image, an implicit three-dimensional (3D) head avatar is constructed that not only captures photo-realistic details within and beyond the face region, but also is readily available for animation without requiring further optimization during inference. In an embodiment, three processing branches of a system produce three tri-planes representing coarse 3D geometry for the head avatar, detailed appearance of a source image, as well as the expression of a target image. By applying volumetric rendering to a combination of the three tri-planes, an image of the desired identity, expression and pose is generated.
-
Publication No.: US20240127075A1
Publication date: 2024-04-18
Application No.: US18212629
Filing date: 2023-06-21
Applicant: NVIDIA Corporation
Inventor: Shalini De Mello, Christian Jacobsen, Xunlei Wu, Stephen Tyree, Alice Li, Wonmin Byeon, Shangru Li
IPC: G06N3/0985
CPC classification number: G06N3/0985
Abstract: Machine learning is a process that learns a model from a given dataset, where the model can then be used to make a prediction about new data. In order to reduce the costs associated with collecting and labeling real world datasets for use in training the model, computer processes can synthetically generate datasets which simulate real world data. The present disclosure improves the effectiveness of such synthetic datasets for training machine learning models used in real world applications, in particular by generating a synthetic dataset that is specifically targeted to a specified downstream task (e.g. a particular computer vision task, a particular natural language processing task, etc.).