-
Publication No.: US20250045892A1
Publication Date: 2025-02-06
Application No.: US18593742
Filing Date: 2024-03-01
Applicant: NVIDIA Corporation
Inventor: Morteza Mardani , Jiaming Song , Jan Kautz , Arash Vahdat
Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. For example, they can be trained in the image domain to perform specific image restoration tasks, such as inpainting (e.g., completing an incomplete image), deblurring (e.g., removing blur from an image), and super-resolution (e.g., increasing the resolution of an image), or they can be trained to perform image rendering tasks, including 2D-to-3D image generation tasks. However, current approaches to training diffusion models only allow the models to be optimized for a specific task, so they will not achieve high-quality results when used for other tasks. The present disclosure provides a diffusion model that uses variational inference to approximate a distribution of data, which allows the diffusion model to universally solve different tasks without having to be re-trained for each task.
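The variational-inference idea in the abstract can be illustrated with a toy Gaussian restoration problem. The sketch below is not the patented method (which uses a diffusion prior); it substitutes a simple Gaussian prior so that gradient ascent on the ELBO can be checked against a closed-form posterior mean. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy restoration problem: observe y = x_true + noise, recover x.
x_true = rng.normal(size=16)
noise_std, prior_std = 0.5, 1.0
y = x_true + noise_std * rng.normal(size=16)

# Variational posterior q(x) = N(mu, s^2); ascend the ELBO w.r.t. mu.
# With a Gaussian likelihood and Gaussian prior, the gradient of the
# ELBO with respect to mu reduces to two simple quadratic terms.
mu = np.zeros_like(y)
lr = 0.05
for _ in range(500):
    grad = (y - mu) / noise_std**2 - mu / prior_std**2
    mu += lr * grad

# Closed-form posterior mean for this conjugate Gaussian model,
# used only to verify that the variational iteration converged.
post_mean = y * prior_std**2 / (prior_std**2 + noise_std**2)
print(np.max(np.abs(mu - post_mean)))
```

In the disclosure the prior is a learned diffusion model rather than a Gaussian, but the structure is the same: a variational distribution is fit to approximate the posterior over clean data, so one trained prior can serve many degradation tasks.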
-
Publication No.: US20240371096A1
Publication Date: 2024-11-07
Application No.: US18312102
Filing Date: 2023-05-04
Applicant: Nvidia Corporation
Inventor: Sameh Khamis , Koki Nagano , Jan Kautz , Sanja Fidler
Abstract: Approaches presented herein provide systems and methods for disentangling identity from expression in input models. One or more machine learning systems may be trained directly from three-dimensional (3D) points to develop unique latent codes for expressions associated with different identities. These codes may then be mapped to different identities to independently model an object, such as a face, to generate a new mesh including an expression for an independent identity. A pipeline may include a set of machine learning systems to determine model parameters and also adjust input expression codes using gradient backpropagation in order to train models for incorporation into a content development pipeline.
-
Publication No.: US20240169563A1
Publication Date: 2024-05-23
Application No.: US18509627
Filing Date: 2023-11-15
Applicant: NVIDIA Corporation
Inventor: Bowen Wen , Jonathan Tremblay , Valts Blukis , Jan Kautz , Stanley Thomas Birchfield
CPC classification number: G06T7/248 , G06T7/11 , G06T7/70 , G06T17/00 , G06T19/006 , G06T2207/10016 , G06T2207/10024 , G06T2207/10028 , G06T2207/20072 , G06T2207/20084 , G06T2207/30252
Abstract: Apparatuses, systems, and techniques for constructing a data structure to store a shape of an object based at least in part on a portion of multiple images, and obtaining poses of the object by tracking a pose of the object through the multiple images based at least in part on the data structure. Optionally, the poses may be used to generate a plan for a path of a device to travel, generate a rendering of at least a portion of a Mixed Reality (“MR”) display to be viewed by a user, and/or the like.
-
Publication No.: US11948078B2
Publication Date: 2024-04-02
Application No.: US17000048
Filing Date: 2020-08-21
Applicant: Nvidia Corporation
Inventor: Arash Vahdat , Tanmay Gupta , Xiaodong Yang , Jan Kautz
IPC: G06N3/08 , G06F18/214 , G06F18/22 , G06V10/74 , G06V10/82 , G06V30/19 , G06V30/262
CPC classification number: G06N3/08 , G06F18/2148 , G06F18/22 , G06V10/761 , G06V10/82 , G06V30/1916 , G06V30/19173 , G06V30/274
Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
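A minimal sketch of training signal for such a critic, assuming an InfoNCE-style objective (a common estimator of a mutual-information lower bound between paired embeddings); the embedding dimensions, batch size, and temperature here are hypothetical stand-ins, not the disclosed models.

```python
import numpy as np

def info_nce_loss(img_emb, txt_emb, temperature=0.1):
    """Contrastive (InfoNCE) loss: matched image/text pairs sit on the
    diagonal of the similarity matrix and are pulled together, while
    mismatched pairs in the batch act as negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (N, N) critic scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy on the diagonal

rng = np.random.default_rng(0)
txt = rng.normal(size=(8, 32))                    # 8 text embeddings
aligned = txt + 0.01 * rng.normal(size=(8, 32))   # well-paired image embeddings
random_ = rng.normal(size=(8, 32))                # unrelated image embeddings
print(info_nce_loss(aligned, txt), info_nce_loss(random_, txt))
```

Minimizing this loss over both encoders drives the critic to score true image/text pairs above the in-batch negatives, which is one standard way to realize the mutual-information training the abstract describes.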
-
Publication No.: US20230394781A1
Publication Date: 2023-12-07
Application No.: US18083397
Filing Date: 2022-12-16
Applicant: NVIDIA Corporation
Inventor: Ali Hatamizadeh , Hongxu Yin , Jan Kautz , Pavlo Molchanov
CPC classification number: G06V10/42 , G06V10/44 , G06V10/82 , G06T3/40 , G06V10/7715
Abstract: Vision transformers are deep learning models that employ a self-attention mechanism to obtain feature representations for an input image. To date, the configuration of vision transformers has limited the self-attention computation to a local window of the input image, such that only short-range dependencies are modeled in the output. The present disclosure provides a vision transformer that captures global context and is therefore able to model long-range dependencies in its output.
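The contrast between windowed and global self-attention can be sketched in a few lines. This is a generic single-head illustration with hypothetical token counts and window sizes, not the disclosed architecture (which the patent family describes at the level of global context capture, not these exact shapes).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    """Scaled dot-product attention over whatever tokens are passed in."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.T * scale) @ v

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))  # 16 patch tokens, feature dim 8

# Local (windowed) attention: each window of 4 tokens attends only
# within itself, so no information crosses window boundaries.
local = np.vstack([self_attention(w, w, w) for w in np.split(tokens, 4)])

# Global attention: every token attends to all 16 tokens, so
# long-range dependencies across windows influence the output.
global_ = self_attention(tokens, tokens, tokens)

print(local.shape, global_.shape)
```

Both variants produce the same output shape; the difference is purely in which key/value tokens each query can see, which is exactly the short-range vs. long-range distinction the abstract draws.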
-
Publication No.: US20230290038A1
Publication Date: 2023-09-14
Application No.: US18320446
Filing Date: 2023-05-19
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
CPC classification number: G06T15/04 , G06T7/579 , G06T7/70 , G06T17/20 , G06T15/20 , G06T2207/30244 , G06T2207/20084 , G06T2207/10016
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
-
Publication No.: US20230252692A1
Publication Date: 2023-08-10
Application No.: US17929182
Filing Date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
CPC classification number: G06T11/001 , G06T3/0093
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
-
Publication No.: US20230177810A1
Publication Date: 2023-06-08
Application No.: US17853631
Filing Date: 2022-06-29
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Wonmin Byeon , Thomas Breuel , Jan Kautz
IPC: G06V10/774 , G06V10/26
CPC classification number: G06V10/774 , G06V10/26
Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.
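The final matching step described above, pairing each caption noun with its closest pixel grouping, might look like the following; the embeddings are hand-constructed stand-ins for illustration, not outputs of the disclosed image and text encoders.

```python
import numpy as np

def match_nouns_to_groups(noun_emb, group_emb):
    """For each noun embedding, pick the index of the pixel-group
    embedding with the highest cosine similarity."""
    n = noun_emb / np.linalg.norm(noun_emb, axis=1, keepdims=True)
    g = group_emb / np.linalg.norm(group_emb, axis=1, keepdims=True)
    return np.argmax(n @ g.T, axis=1)

# Hypothetical features: 3 pixel groupings from the image encoder,
# 2 noun prompts from the caption's text encoder.
groups = np.array([[1.0, 0.0, 0.0],   # e.g. a "sky" segment
                   [0.0, 1.0, 0.0],   # e.g. a "dog" segment
                   [0.0, 0.0, 1.0]])  # e.g. a "grass" segment
nouns = np.array([[0.1, 0.9, 0.0],    # "dog" prompt embedding
                  [0.0, 0.2, 0.8]])   # "grass" prompt embedding
print(match_nouns_to_groups(nouns, groups))  # → [1 2]
```

During training, the contrastive loss pushes each noun's embedding toward its true segment and away from the others, so this nearest-neighbor match becomes a pixel-wise label assignment without manual annotation.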
-
Publication No.: US20230088912A1
Publication Date: 2023-03-23
Application No.: US17952866
Filing Date: 2022-09-26
Applicant: NVIDIA Corporation
Inventor: Ruben Villegas , Alejandro Troccoli , Iuri Frosio , Stephen Tyree , Wonmin Byeon , Jan Kautz
Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
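Encoding an observed trajectory into a state feature with an LSTM can be sketched as below. This hand-rolled single-layer cell with random weights illustrates only the encoding step; it is not the disclosed bi-directional network or the maneuver predictor, and the trajectory values are hypothetical.

```python
import numpy as np

def lstm_encode(xs, Wx, Wh, b, hidden):
    """Run a single-layer LSTM over a trajectory and return the final
    hidden state as the encoded 'state feature'."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        z = Wx @ x + Wh @ h + b              # all four gate pre-activations
        i, f, g, o = np.split(z, 4)          # input, forget, cell, output
        sig = lambda t: 1.0 / (1.0 + np.exp(-t))
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
    return h

rng = np.random.default_rng(0)
hidden, feat = 8, 2                           # 2-D (x, y) positions
Wx = rng.normal(scale=0.5, size=(4 * hidden, feat))
Wh = rng.normal(scale=0.5, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

# A short observed trajectory of another vehicle (x, y per timestep).
trajectory = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.3], [3.0, 0.6]])
state_feature = lstm_encode(trajectory, Wx, Wh, b, hidden)
print(state_feature.shape)
```

In the abstract's pipeline, one such state feature per observed object would then feed a bi-directional LSTM over neighboring objects to produce the spatial feature used for maneuver and location prediction.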
-
Publication No.: US20230080247A1
Publication Date: 2023-03-16
Application No.: US17551005
Filing Date: 2021-12-14
Applicant: NVIDIA Corporation
Inventor: Hongxu Yin , Huanrui Yang , Pavlo Molchanov , Jan Kautz
Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but also use more memory than required. To address this, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces the size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
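The score-and-threshold step described above can be sketched with magnitude scores, one plausible scoring choice (the abstract does not fix the scoring function, so treat this as an assumption):

```python
import numpy as np

def prune_by_score(weights, keep_ratio=0.5):
    """Score each parameter (here by magnitude) and zero out those
    below the threshold implied by keep_ratio."""
    scores = np.abs(weights)
    threshold = np.quantile(scores, 1.0 - keep_ratio)
    mask = scores >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
block_weights = rng.normal(size=(4, 4))       # one hypothetical block
pruned, mask = prune_by_score(block_weights, keep_ratio=0.25)
print(mask.sum())                              # number of parameters kept
```

Zeroed parameters can then be dropped from the block's storage and compute, which is the memory and speed gain the abstract claims.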
-