-
Publication No.: US20230177810A1
Publication Date: 2023-06-08
Application No.: US17853631
Filing Date: 2022-06-29
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz
IPC: G06V10/774, G06V10/26
CPC classification number: G06V10/774, G06V10/26
Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.
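Illustrative sketch (not part of the patent record): the abstract names only "contrastive loss operations" between pixel-grouping features and text representations. The code below assumes a symmetric InfoNCE-style loss over cosine similarities, a common choice for image/text contrastive training. The function names, the temperature value, and the `best_match` helper are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def cross_entropy_row(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each list is assumed to be a matched pair."""
    img = [normalize(v) for v in image_feats]
    txt = [normalize(v) for v in text_feats]
    n = len(img)
    sim = [[dot(img[i], txt[j]) / temperature for j in range(n)] for i in range(n)]
    image_to_text = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    text_to_image = sum(
        cross_entropy_row([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)

def best_match(text_feat, segment_feats):
    """Index of the segment feature with the highest cosine similarity to a noun's text feature."""
    t = normalize(text_feat)
    scores = [dot(t, normalize(s)) for s in segment_feats]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions, a batch whose image and text features line up row by row yields a loss near zero, while a mismatched batch yields a large loss. `best_match` mirrors the final step of picking, for each noun, the grouping whose feature is closest.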
-
Publication No.: US20230088912A1
Publication Date: 2023-03-23
Application No.: US17952866
Filing Date: 2022-09-26
Applicant: NVIDIA Corporation
Inventor: Ruben Villegas, Alejandro Troccoli, Iuri Frosio, Stephen Tyree, Wonmin Byeon, Jan Kautz
Abstract: In various examples, historical trajectory information of objects in an environment may be tracked by an ego-vehicle and encoded into a state feature. The encoded state features for each of the objects observed by the ego-vehicle may be used—e.g., by a bi-directional long short-term memory (LSTM) network—to encode a spatial feature. The encoded spatial feature and the encoded state feature for an object may be used to predict lateral and/or longitudinal maneuvers for the object, and the combination of this information may be used to determine future locations of the object. The future locations may be used by the ego-vehicle to determine a path through the environment, or may be used by a simulation system to control virtual objects—according to trajectories determined from the future locations—through a simulation environment.
-
Publication No.: US20230080247A1
Publication Date: 2023-03-16
Application No.: US17551005
Filing Date: 2021-12-14
Applicant: NVIDIA Corporation
Inventor: Hongxu Yin, Huanrui Yang, Pavlo Molchanov, Jan Kautz
Abstract: A vision transformer is a deep learning model used to perform vision processing tasks such as image recognition. Vision transformers are currently designed with a plurality of same-size blocks that perform the vision processing tasks. However, some portions of these blocks are unnecessary and not only slow down the vision transformer but use more memory than required. In response, parameters of these blocks are analyzed to determine a score for each parameter, and if the score falls below a threshold, the parameter is removed from the associated block. This reduces a size of the resulting vision transformer, which reduces unnecessary memory usage and increases performance.
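Illustrative sketch (not part of the patent record): the abstract describes scoring each parameter and removing those whose score falls below a threshold, but it does not say how the score is computed. The code below assumes a mean-absolute-value score purely for illustration. The parameter-group names and the threshold are hypothetical.

```python
def magnitude_scores(params):
    # Assumed scoring rule: mean absolute value of each parameter group.
    return {
        name: sum(abs(x) for x in values) / len(values)
        for name, values in params.items()
    }

def prune_block(params, scores, threshold):
    """Keep only the parameter groups whose score is at or above the threshold."""
    return {
        name: values
        for name, values in params.items()
        if scores[name] >= threshold
    }

# Hypothetical block with three parameter groups.
block = {
    "head_0": [0.9, -1.1],
    "head_1": [0.01, -0.02],
    "mlp_unit_3": [0.5, 0.4],
}
pruned = prune_block(block, magnitude_scores(block), threshold=0.1)
kept = sorted(pruned)  # group names that survive pruning
```

Here `head_1` scores 0.015 and is dropped, so the pruned block is smaller than the original, which is the effect the abstract attributes to the method.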
-
Publication No.: US20230074706A1
Publication Date: 2023-03-09
Application No.: US17412091
Filing Date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
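Illustrative sketch (not part of the patent record): the "pull corresponding pairs closer, push contrasting pairs apart" idea can be shown with the classic margin-based pairwise contrastive loss. This formulation is an assumption, since the abstract does not specify the loss, and the function names and margin value are hypothetical.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_contrastive_loss(emb_a, emb_b, corresponding, margin=1.0):
    """Margin-based contrastive loss for one pair of image embeddings.

    Corresponding pairs are penalized by their squared distance (pulled together).
    Contrasting pairs are penalized only while closer than the margin (pushed apart).
    """
    d = euclidean(emb_a, emb_b)
    if corresponding:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

With this form, a contrasting pair that is already farther apart than the margin contributes no loss, so training effort goes to pairs that are still confusable.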
-
Publication No.: US11546568B1
Publication Date: 2023-01-03
Application No.: US16811356
Filing Date: 2020-03-06
Applicant: Nvidia Corporation
Inventor: Jae Shin Yoon, Jan Kautz, Kihwan Kim
IPC: H04N13/128, H04N13/00
Abstract: Apparatuses, systems, and techniques are presented to perform monocular view synthesis of a dynamic scene. Single and multi-view depth information can be determined for a collection of images of a dynamic scene, and a blender network can be used to combine image features for foreground, background, and missing image regions using fused depth maps inferred from the single and multi-view depth information.
-
Publication No.: US20220396289A1
Publication Date: 2022-12-15
Application No.: US17348604
Filing Date: 2021-06-15
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Shalini De Mello, Jan Kautz
Abstract: Apparatuses, systems, and techniques to calculate a plurality of paths through which an autonomous device is to traverse. In at least one embodiment, a plurality of paths are calculated using one or more neural networks based, at least in part, on one or more distance values output by the one or more neural networks.
-
Publication No.: US11367268B2
Publication Date: 2022-06-21
Application No.: US16998890
Filing Date: 2020-08-20
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz
Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class. The identification-related features may then be used to train a neural network to perform re-identification of objects in that object class from images captured from the second domain.
-
Publication No.: US20220191638A1
Publication Date: 2022-06-16
Application No.: US17191313
Filing Date: 2021-03-03
Applicant: NVIDIA Corporation
Inventor: Michael Stengel, Jan Kautz, David Patrick Luebke, Morgan Samuel McGuire
Abstract: Apparatuses, systems, and techniques to determine head poses of users and provide audio for the users. In at least one embodiment, a head pose is determined based, at least in part, on camera frame information, and an audio signal is generated based, at least in part, on the determined head pose.
-
Publication No.: US11361507B1
Publication Date: 2022-06-14
Application No.: US17315060
Filing Date: 2021-05-07
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal, Pavlo Molchanov, Jan Kautz, Yun Rong Guo, Cheng Xie
Abstract: Estimating a three-dimensional (3D) pose and shape of an articulated body mesh is useful for many different applications including health and fitness, entertainment, and computer graphics. A set of estimated 3D keypoint positions for a human body structure are processed to compute parameters defining the pose and shape of a parametric human body mesh using a set of geometric operations. During processing, 3D keypoints are extracted from the parametric human body mesh and a set of rotations are computed to align the extracted 3D keypoints with the estimated 3D keypoints. The set of rotations may correctly position a particular 3D keypoint location at a “joint”, but an arbitrary number of rotations of the “joint” keypoint may produce a twist in a connection to a child keypoint. Rules are applied to the set of rotations to resolve ambiguous twists and articulate the parametric human body mesh according to the computed parameters.
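Illustrative sketch (not part of the patent record): the twist ambiguity mentioned in the abstract can be seen with Rodrigues' rotation formula. Rotating a joint about its own bone axis leaves the child keypoint where it was, yet it moves anything attached further down the chain. The vectors `bone` and `offset` are hypothetical, and this is not the rule set the patent applies to resolve the ambiguity.

```python
import math

def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def rotate(v, axis, angle):
    """Rotate v about axis by angle (radians) using Rodrigues' formula."""
    norm = math.sqrt(sum(x * x for x in axis))
    k = [x / norm for x in axis]
    c, s = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = cross(k, v)
    return [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c) for i in range(3)]

bone = [0.0, 1.0, 0.0]    # hypothetical parent-to-child direction
offset = [1.0, 0.0, 0.0]  # hypothetical offset from the child to a grandchild keypoint

# A quarter-turn twist about the bone axis:
twisted_bone = rotate(bone, bone, math.pi / 2)      # child keypoint direction is unchanged
twisted_offset = rotate(offset, bone, math.pi / 2)  # grandchild offset swings to a new direction
```

Because every twist angle satisfies the keypoint-alignment constraint equally well, extra rules are needed to pick one, which is the role the abstract assigns to its rule-based step.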
-
Publication No.: US11354847B2
Publication Date: 2022-06-07
Application No.: US16945455
Filing Date: 2020-07-31
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Jan Kautz
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object construction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well—particularly for non-rigid objects.
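Illustrative sketch (not part of the patent record): one simple way to express "consistent shape and texture across frames" as a training signal is to penalize how far each frame's predicted code strays from the across-frame mean. This variance-style penalty is an assumption for illustration only. The abstract does not give the form of its invariance constraints, and `per_frame_codes` is a hypothetical stand-in for per-frame shape or texture predictions.

```python
def temporal_consistency_penalty(per_frame_codes):
    """Mean squared deviation of each frame's code from the across-frame mean."""
    n_frames = len(per_frame_codes)
    dim = len(per_frame_codes[0])
    mean = [sum(code[k] for code in per_frame_codes) / n_frames for k in range(dim)]
    total = sum(
        (code[k] - mean[k]) ** 2 for code in per_frame_codes for k in range(dim)
    )
    return total / (n_frames * dim)
```

The penalty is zero when every frame predicts the same code and grows as the per-frame predictions drift apart, so no per-frame labels are needed to compute it.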
-