PERFORMING SEMANTIC SEGMENTATION TRAINING WITH IMAGE/TEXT PAIRS

    Publication No.: US20230177810A1

    Publication Date: 2023-06-08

    Application No.: US17853631

    Application Date: 2022-06-29

    CPC classification number: G06V10/774 G06V10/26

    Abstract: Semantic segmentation is the task of providing pixel-wise annotations for a given image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. Each of these image/caption pairs includes an image and an associated textual caption. The image portion of each pair is passed to an image encoder of the machine learning environment, which outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and converted into text prompts, which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine, for each noun of each caption, the extracted feature that most closely matches the extracted features of the associated image.
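
    A minimal sketch of the noun-to-grouping contrastive step described above, assuming pooled per-grouping features from the image encoder and one embedding per noun prompt from the text encoder; the function name, tensor shapes, and temperature are illustrative assumptions rather than the patented implementation:

```python
import torch.nn.functional as F

def noun_grouping_contrastive_loss(group_feats, text_feats, temperature=0.07):
    # group_feats: (G, D) pooled features, one per candidate pixel grouping (hypothetical interface).
    # text_feats:  (N, D) embeddings, one per noun prompt extracted from the caption.
    group_feats = F.normalize(group_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sim = text_feats @ group_feats.t() / temperature   # (N, G) noun-to-grouping similarities
    # Treat each noun's best-matching grouping as its positive and contrast it against the rest.
    targets = sim.argmax(dim=-1)
    return F.cross_entropy(sim, targets)
```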

    FUTURE OBJECT TRAJECTORY PREDICTIONS FOR AUTONOMOUS MACHINE APPLICATIONS

    Publication No.: US20230088912A1

    Publication Date: 2023-03-23

    Application No.: US17952866

    Application Date: 2022-09-26

    Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
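
    A minimal sketch of the described flow, assuming a per-object LSTM state encoder, a bi-directional LSTM over the set of observed objects as the spatial encoder, and simple lateral/longitudinal maneuver heads; layer sizes and maneuver class counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    def __init__(self, state_dim=4, hidden=64, lateral_classes=3, longitudinal_classes=2):
        super().__init__()
        self.state_encoder = nn.LSTM(state_dim, hidden, batch_first=True)
        # Bi-directional LSTM over the sequence of observed objects encodes a spatial feature.
        self.spatial_encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.lateral_head = nn.Linear(hidden * 3, lateral_classes)
        self.longitudinal_head = nn.Linear(hidden * 3, longitudinal_classes)

    def forward(self, histories):
        # histories: (num_objects, time_steps, state_dim) past states tracked by the ego-vehicle.
        _, (state_feat, _) = self.state_encoder(histories)              # (1, num_objects, hidden)
        state_feat = state_feat.squeeze(0)                               # (num_objects, hidden)
        spatial, _ = self.spatial_encoder(state_feat.unsqueeze(0))       # (1, num_objects, 2*hidden)
        combined = torch.cat([state_feat, spatial.squeeze(0)], dim=-1)   # (num_objects, 3*hidden)
        # Maneuver predictions; future locations would be decoded from these plus the state feature.
        return self.lateral_head(combined), self.longitudinal_head(combined)
```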

    PRUNING A VISION TRANSFORMER
    Invention Application

    Publication No.: US20230080247A1

    Publication Date: 2023-03-16

    Application No.: US17551005

    Application Date: 2021-12-14

    Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but also use more memory than required. In response, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces the size of the resulting vision transformer, which cuts unnecessary memory usage and increases performance.
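
    A minimal sketch of the pruning step, assuming weight magnitude as the per-parameter score (the abstract does not specify the scoring function); parameters scoring below the threshold are masked out of a block:

```python
import torch

@torch.no_grad()
def prune_block_parameters(block, threshold=1e-2):
    # block: one transformer block (an nn.Module); the magnitude score is an illustrative choice.
    removed = 0
    for param in block.parameters():
        mask = param.abs() >= threshold        # keep parameters whose score clears the threshold
        param.mul_(mask.to(param.dtype))       # zero out (effectively remove) the rest
        removed += (~mask).sum().item()
    return removed
```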

    LEARNING CONTRASTIVE REPRESENTATION FOR SEMANTIC CORRESPONDENCE

    Publication No.: US20230074706A1

    Publication Date: 2023-03-09

    Application No.: US17412091

    Application Date: 2021-08-25

    Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
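
    A minimal sketch of the image-level contrastive loss, assuming a batch in which row i of `anchors` and row i of `positives` come from corresponding views of the same object and all other pairings act as contrasting pairs; the temperature and function name are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def image_level_contrastive_loss(anchors, positives, temperature=0.1):
    anchors = F.normalize(anchors, dim=-1)              # (B, D) image-level features
    positives = F.normalize(positives, dim=-1)          # (B, D) features of corresponding views
    logits = anchors @ positives.t() / temperature      # (B, B) all pairwise similarities
    labels = torch.arange(anchors.size(0), device=anchors.device)
    # Pulls corresponding pairs (the diagonal) closer and pushes contrasting pairs apart.
    return F.cross_entropy(logits, labels)
```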

    View synthesis for dynamic scenes
    Invention Grant

    Publication No.: US11546568B1

    Publication Date: 2023-01-03

    Application No.: US16811356

    Application Date: 2020-03-06

    Abstract: Apparatuses, systems, and techniques are presented to perform monocular view synthesis of a dynamic scene. Single- and multi-view depth information can be determined for a collection of images of a dynamic scene, and a blender network can be used to combine image features for foreground, background, and missing image regions using fused depth maps inferred from the single- and multi-view depth information.
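
    A minimal sketch of one plausible depth-fusion step, assuming a per-pixel confidence map for the multi-view estimate; the fused map is the kind of input a blender network could consume when combining foreground, background, and missing regions. The confidence map and its provenance are illustrative assumptions, not the published method:

```python
import torch

def fuse_depth(single_view_depth, multi_view_depth, multi_view_confidence):
    # All tensors: (H, W). Trust the multi-view estimate where its confidence is high,
    # falling back to the single-view estimate elsewhere (e.g., dynamic or occluded regions).
    w = multi_view_confidence.clamp(0.0, 1.0)
    return w * multi_view_depth + (1.0 - w) * single_view_depth
```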

    Cross-domain image processing for object re-identification

    Publication No.: US11367268B2

    Publication Date: 2022-06-21

    Application No.: US16998890

    Application Date: 2020-08-20

    Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g., person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g., environment), but do not generalize well to other domains. The present disclosure provides cross-domain disentanglement of identification-related and identification-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for the same object class. The identification-related features may then be used to train a neural network to perform re-identification of objects in that object class from images captured from the second domain.
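
    A minimal sketch of the disentanglement idea, assuming separate encoders for identification-related and identification-unrelated factors, a toy decoder for reconstruction, and an identity classifier supervised only on labeled (source-domain) batches; all layer shapes, the 64x32 image size, and the number of identities are illustrative assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Disentangler(nn.Module):
    def __init__(self, feat_dim=128, num_ids=751):
        super().__init__()
        def encoder():
            return nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.id_encoder = encoder()      # id-related factors (appearance tied to identity)
        self.style_encoder = encoder()   # id-unrelated factors (pose, background, lighting)
        self.decoder = nn.Linear(feat_dim * 2, 3 * 64 * 32)   # toy decoder back to a 64x32 image
        self.id_classifier = nn.Linear(feat_dim, num_ids)

    def forward(self, images, labels=None):
        f_id, f_style = self.id_encoder(images), self.style_encoder(images)
        recon = self.decoder(torch.cat([f_id, f_style], dim=-1)).view(-1, 3, 64, 32)
        # Reconstruction requires both factor sets; it applies to labeled and unlabeled domains alike.
        loss = F.mse_loss(recon, F.interpolate(images, size=(64, 32)))
        if labels is not None:           # identity supervision only where labels exist (source domain)
            loss = loss + F.cross_entropy(self.id_classifier(f_id), labels)
        return loss
```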

    Articulated body mesh estimation using three-dimensional (3D) body keypoints

    Publication No.: US11361507B1

    Publication Date: 2022-06-14

    Application No.: US17315060

    Application Date: 2021-05-07

    Abstract: Estimating a three-dimensional (3D) pose and shape of an articulated body mesh is useful for many different applications including health and fitness, entertainment, and computer graphics. A set of estimated 3D keypoint positions for a human body structure are processed to compute parameters defining the pose and shape of a parametric human body mesh using a set of geometric operations. During processing, 3D keypoints are extracted from the parametric human body mesh and a set of rotations are computed to align the extracted 3D keypoints with the estimated 3D keypoints. The set of rotations may correctly position a particular 3D keypoint location at a “joint”, but an arbitrary number of rotations of the “joint” keypoint may produce a twist in a connection to a child keypoint. Rules are applied to the set of rotations to resolve ambiguous twists and articulate the parametric human body mesh according to the computed parameters.
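
    A minimal sketch of the swing-alignment step implied above, assuming each bone is represented by the vector from a joint keypoint to its child keypoint; the shortest rotation that aligns the mesh bone with the estimated bone leaves the twist about the bone axis undetermined, which is why the described rules are needed. The function name and axis-angle output format are illustrative assumptions:

```python
import torch

def swing_rotation(mesh_bone, target_bone, eps=1e-8):
    # mesh_bone, target_bone: (3,) vectors from a joint keypoint to its child keypoint.
    a = mesh_bone / (mesh_bone.norm() + eps)
    b = target_bone / (target_bone.norm() + eps)
    axis = torch.linalg.cross(a, b)                      # rotation axis perpendicular to both bones
    angle = torch.atan2(axis.norm(), torch.dot(a, b))    # robust angle between the two bones
    # Axis-angle rotation aligning the bones; twist about the bone axis remains ambiguous.
    return axis / (axis.norm() + eps) * angle
```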

    Three-dimensional object reconstruction from a video

    Publication No.: US11354847B2

    Publication Date: 2022-06-07

    Application No.: US16945455

    Application Date: 2020-07-31

    Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
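
    A minimal sketch of the temporal-consistency constraint, assuming per-frame predictions that expose a base shape and a texture map for the same object in adjacent, unlabeled frames; the dictionary keys and loss weighting are illustrative assumptions:

```python
import torch.nn.functional as F

def temporal_consistency_loss(pred_t, pred_t1):
    # pred_t / pred_t1: per-frame predictions, e.g. {"base_shape": (V, 3), "texture": (C, H, W)}.
    # Encourages the reconstructed object to keep a consistent base shape and texture over time.
    shape_loss = F.mse_loss(pred_t["base_shape"], pred_t1["base_shape"])
    texture_loss = F.l1_loss(pred_t["texture"], pred_t1["texture"])
    return shape_loss + texture_loss
```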
