-
Publication No.: US20250045892A1
Publication Date: 2025-02-06
Application No.: US18593742
Filing Date: 2024-03-01
Applicant: NVIDIA Corporation
Inventor: Morteza Mardani , Jiaming Song , Jan Kautz , Arash Vahdat
Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. For example, they can be trained in the image domain to perform specific image restoration tasks, such as inpainting (e.g., completing an incomplete image), deblurring (e.g., removing blur from an image), and super-resolution (e.g., increasing the resolution of an image), or they can be trained to perform image rendering tasks, including 2D-to-3D image generation tasks. However, current approaches to training diffusion models only allow the models to be optimized for a specific task, so they will not achieve high-quality results when used for other tasks. The present disclosure provides a diffusion model that uses variational inference to approximate a distribution of data, which allows the diffusion model to universally solve different tasks without having to be re-trained for each task.
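The variational-inference idea in the abstract can be illustrated with a toy Gaussian restoration problem. The sketch below is not the patented method (which uses a diffusion prior); it substitutes a simple Gaussian prior so that gradient ascent on the ELBO can be checked against a closed-form posterior mean. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy restoration problem: observe y = x_true + noise, recover x.
x_true = rng.normal(size=16)
noise_std, prior_std = 0.5, 1.0
y = x_true + noise_std * rng.normal(size=16)

# Variational posterior q(x) = N(mu, s^2); ascend the ELBO w.r.t. mu.
# With a Gaussian likelihood and Gaussian prior, the gradient of the
# ELBO with respect to mu reduces to two simple quadratic terms.
mu = np.zeros_like(y)
lr = 0.05
for _ in range(500):
    grad = (y - mu) / noise_std**2 - mu / prior_std**2
    mu += lr * grad

# Closed-form posterior mean for this conjugate Gaussian model,
# used only to verify that the variational iteration converged.
post_mean = y * prior_std**2 / (prior_std**2 + noise_std**2)
print(np.max(np.abs(mu - post_mean)))
```

In the disclosure the prior is a learned diffusion model rather than a Gaussian, but the structure is the same: a variational distribution is fit to approximate the posterior over clean data, so one trained prior can serve many degradation tasks.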
-
Publication No.: US20240371096A1
Publication Date: 2024-11-07
Application No.: US18312102
Filing Date: 2023-05-04
Applicant: Nvidia Corporation
Inventor: Sameh Khamis , Koki Nagano , Jan Kautz , Sanja Fidler
Abstract: Approaches presented herein provide systems and methods for disentangling identity from expression in input models. One or more machine learning systems may be trained directly from three-dimensional (3D) points to develop unique latent codes for expressions associated with different identities. These codes may then be mapped to different identities to independently model an object, such as a face, to generate a new mesh including an expression for an independent identity. A pipeline may include a set of machine learning systems to determine model parameters and also adjust input expression codes using gradient backpropagation in order to train models for incorporation into a content development pipeline.
-
Publication No.: US20240169563A1
Publication Date: 2024-05-23
Application No.: US18509627
Filing Date: 2023-11-15
Applicant: NVIDIA Corporation
Inventor: Bowen Wen , Jonathan Tremblay , Valts Blukis , Jan Kautz , Stanley Thomas Birchfield
CPC classification number: G06T7/248 , G06T7/11 , G06T7/70 , G06T17/00 , G06T19/006 , G06T2207/10016 , G06T2207/10024 , G06T2207/10028 , G06T2207/20072 , G06T2207/20084 , G06T2207/30252
Abstract: Apparatuses, systems, and techniques for constructing a data structure to store a shape of an object based at least in part on a portion of multiple images, and obtaining poses of the object by tracking a pose of the object through the multiple images based at least in part on the data structure. Optionally, the poses may be used to generate a plan for a path of a device to travel, generate a rendering of at least a portion of a Mixed Reality (“MR”) display to be viewed by a user, and/or the like.
-
Publication No.: US11948078B2
Publication Date: 2024-04-02
Application No.: US17000048
Filing Date: 2020-08-21
Applicant: Nvidia Corporation
Inventor: Arash Vahdat , Tanmay Gupta , Xiaodong Yang , Jan Kautz
IPC: G06N3/08 , G06F18/214 , G06F18/22 , G06V10/74 , G06V10/82 , G06V30/19 , G06V30/262
CPC classification number: G06N3/08 , G06F18/2148 , G06F18/22 , G06V10/761 , G06V10/82 , G06V30/1916 , G06V30/19173 , G06V30/274
Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
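A minimal sketch of training signal for such a critic, assuming an InfoNCE-style objective (a common estimator of a mutual-information lower bound between paired embeddings); the embedding dimensions, batch size, and temperature here are hypothetical stand-ins, not the disclosed models.

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.1):
    """Contrastive (InfoNCE) loss: matched image/text pairs sit on the
    diagonal of the similarity matrix and are pulled together, while
    mismatched pairs in the batch act as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (N, N) critic scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on the diagonal

rng = np.random.default_rng(0)
txt = rng.normal(size=(8, 32))                    # 8 text embeddings
aligned = txt + 0.01 * rng.normal(size=(8, 32))   # well-paired image embeddings
random_ = rng.normal(size=(8, 32))                # unrelated image embeddings
print(info_nce_loss(aligned, txt), info_nce_loss(random_, txt))
```

Minimizing this loss over both encoders drives the critic to score true image/text pairs above the in-batch negatives, which is one standard way to realize the mutual-information training the abstract describes.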
-
Publication No.: US20230394781A1
Publication Date: 2023-12-07
Application No.: US18083397
Filing Date: 2022-12-16
Applicant: NVIDIA Corporation
Inventor: Ali Hatamizadeh , Hongxu Yin , Jan Kautz , Pavlo Molchanov
CPC classification number: G06V10/42 , G06V10/44 , G06V10/82 , G06T3/40 , G06V10/7715
Abstract: Vision transformers are deep learning models that employ a self-attention mechanism to obtain feature representations for an input image. To date, the configuration of vision transformers has limited the self-attention computation to a local window of the input image, such that only short-range dependencies are modeled in the output. The present disclosure provides a vision transformer that captures global context and is therefore able to model long-range dependencies in its output.
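The contrast between windowed and global self-attention can be sketched in a few lines. This is a generic single-head illustration with hypothetical token counts and window sizes, not the disclosed architecture (which the patent family describes at the level of global context capture, not these exact shapes).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    """Scaled dot-product attention over whatever tokens are passed in."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))  # 16 patch tokens, feature dim 8

# Local (windowed) attention: each window of 4 tokens attends only
# within itself, so no information crosses window boundaries.
local = np.vstack([self_attention(w, w, w) for w in np.split(tokens, 4)])

# Global attention: every token attends to all 16 tokens, so
# long-range dependencies across windows influence the output.
global_ = self_attention(tokens, tokens, tokens)

print(local.shape, global_.shape)
```

Both variants produce the same output shape; the difference is purely in which key/value tokens each query can see, which is exactly the short-range vs. long-range distinction the abstract draws.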
-
Publication No.: US20230290038A1
Publication Date: 2023-09-14
Application No.: US18320446
Filing Date: 2023-05-19
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
CPC classification number: G06T15/04 , G06T7/579 , G06T7/70 , G06T17/20 , G06T15/20 , G06T2207/30244 , G06T2207/20084 , G06T2207/10016
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
-
Publication No.: US20230252692A1
Publication Date: 2023-08-10
Application No.: US17929182
Filing Date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
CPC classification number: G06T11/001 , G06T3/0093
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
-
Publication No.: US20230177810A1
Publication Date: 2023-06-08
Application No.: US17853631
Filing Date: 2022-06-29
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Wonmin Byeon , Thomas Breuel , Jan Kautz
IPC: G06V10/774 , G06V10/26
CPC classification number: G06V10/774 , G06V10/26
Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.
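The final matching step described above, pairing each caption noun with its closest pixel grouping, might look like the following; the embeddings are hand-constructed stand-ins for illustration, not outputs of the disclosed image and text encoders.

```python
import numpy as np

def match_nouns_to_groups(noun_emb, group_emb):
    """For each noun embedding, pick the index of the pixel-group
    embedding with the highest cosine similarity."""
    n = noun_emb / np.linalg.norm(noun_emb, axis=1, keepdims=True)
    g = group_emb / np.linalg.norm(group_emb, axis=1, keepdims=True)
    return np.argmax(n @ g.T, axis=1)

# Hypothetical features: 3 pixel groupings from the image encoder,
# 2 noun prompts from the caption's text encoder.
groups = np.array([[1.0, 0.0, 0.0],   # e.g. a "sky" segment
                   [0.0, 1.0, 0.0],   # e.g. a "dog" segment
                   [0.0, 0.0, 1.0]])  # e.g. a "grass" segment
nouns = np.array([[0.1, 0.9, 0.0],    # "dog" prompt embedding
                  [0.0, 0.2, 0.8]])   # "grass" prompt embedding
print(match_nouns_to_groups(nouns, groups))  # → [1 2]
```

During training, the contrastive loss pushes each noun's embedding toward its true segment and away from the others, so this nearest-neighbor match becomes a pixel-wise label assignment without manual annotation.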
-
Publication No.: US20230088912A1
Publication Date: 2023-03-23
Application No.: US17952866
Filing Date: 2022-09-26
Applicant: NVIDIA Corporation
Inventor: Ruben Villegas , Alejandro Troccoli , Iuri Frosio , Stephen Tyree , Wonmin Byeon , Jan Kautz
Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
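Encoding an observed trajectory into a state feature with an LSTM can be sketched as below. This hand-rolled single-layer cell with random weights illustrates only the encoding step; it is not the disclosed bi-directional network or the maneuver predictor, and the trajectory values are hypothetical.

```python
import numpy as np

def lstm_encode(xs, Wx, Wh, b, hidden):
    """Run a single-layer LSTM over a trajectory and return the final
    hidden state as the encoded 'state feature'."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        z = Wx @ x + Wh @ h + b              # all four gate pre-activations
        i, f, g, o = np.split(z, 4)          # input, forget, cell, output
        sig = lambda t: 1.0 / (1.0 + np.exp(-t))
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
    return h

rng = np.random.default_rng(0)
hidden, feat = 8, 2                           # 2-D (x, y) positions
Wx = rng.normal(scale=0.5, size=(4 * hidden, feat))
Wh = rng.normal(scale=0.5, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

# A short observed trajectory of another vehicle (x, y per timestep).
trajectory = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3], [3.0, 0.6]])
state_feature = lstm_encode(trajectory, Wx, Wh, b, hidden)
print(state_feature.shape)
```

In the abstract's pipeline, one such state feature per observed object would then feed a bi-directional LSTM over neighboring objects to produce the spatial feature used for maneuver and location prediction.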
-
Publication No.: US20230080247A1
Publication Date: 2023-03-16
Application No.: US17551005
Filing Date: 2021-12-14
Applicant: NVIDIA Corporation
Inventor: Hongxu Yin , Huanrui Yang , Pavlo Molchanov , Jan Kautz
Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but also use more memory than required. To address this, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces the size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
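The score-and-threshold step described above can be sketched with magnitude scores, one plausible scoring choice (the abstract does not fix the scoring function, so treat this as an assumption):

```python
import numpy as np

def prune_by_score(weights, keep_ratio=0.5):
    """Score each parameter (here by magnitude) and zero out those
    below the threshold implied by keep_ratio."""
    scores = np.abs(weights)
    threshold = np.quantile(scores, 1.0 - keep_ratio)
    mask = scores >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
block_weights = rng.normal(size=(4, 4))       # one hypothetical block
pruned, mask = prune_by_score(block_weights, keep_ratio=0.25)
print(mask.sum())                              # number of parameters kept
```

Zeroed parameters can then be dropped from the block's storage and compute, which is the memory and speed gain the abstract claims.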
-