-
Publication No.: US12169882B2
Publication date: 2024-12-17
Application No.: US17929182
Filing date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
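The label propagation mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes each pixel already has a 2D coordinate in the shared canonical space and transfers labels by nearest-neighbor lookup in that space. The function name `propagate_labels`, the nearest-neighbor rule, and the toy data are illustrative assumptions, not details taken from the patent.

```python
def propagate_labels(ref_coords, ref_labels, target_coords):
    """Copy to each target pixel the label of the closest reference pixel.

    All coordinates are (u, v) pairs in the shared canonical space.
    Nearest-neighbor matching is an assumption made for this sketch.
    """
    labels = []
    for u, v in target_coords:
        nearest = min(
            range(len(ref_coords)),
            key=lambda k: (ref_coords[k][0] - u) ** 2 + (ref_coords[k][1] - v) ** 2,
        )
        labels.append(ref_labels[nearest])
    return labels


# Hypothetical toy data: two labeled reference pixels, two synthesized pixels.
ref_coords = [(0.1, 0.1), (0.9, 0.9)]
ref_labels = ["eye", "mouth"]
target_coords = [(0.2, 0.15), (0.8, 0.95)]
print(propagate_labels(ref_coords, ref_labels, target_coords))
```

Because both images are expressed in the same coordinate space, no per-image-pair matching step is needed in this sketch; the lookup alone establishes the correspondence.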
-
Publication No.: US20240153093A1
Publication date: 2024-05-09
Application No.: US18310414
Filing date: 2023-05-01
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Arash Vahdat , Wonmin Byeon
CPC classification number: G06T7/10 , G06V10/40 , G06T2207/20081 , G06T2207/20084
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to segmenting only object categories seen during training; it can also successfully segment object categories not seen during training and seen only during testing and inferencing. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
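The classification step described in this abstract, associating text representations of category labels with per-mask visual representations, can be sketched as a similarity lookup. The abstract does not name the association rule; cosine similarity with an argmax is an assumption here, and the function names and toy embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_masks(mask_embeddings, label_embeddings):
    """Give each mask the label whose text embedding is most similar.

    mask_embeddings: one visual embedding per predicted object mask.
    label_embeddings: dict mapping a category name to its text embedding.
    """
    return [
        max(label_embeddings, key=lambda name: cosine(emb, label_embeddings[name]))
        for emb in mask_embeddings
    ]


# Hypothetical toy embeddings; real ones would come from the pretrained models.
label_embeddings = {"cat": [1.0, 0.0, 0.0], "tree": [0.0, 1.0, 0.0]}
mask_embeddings = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
print(classify_masks(mask_embeddings, label_embeddings))
```

Because the label set is just a dictionary of text embeddings, categories unseen during training can be added at inference time by adding entries, which is the open-vocabulary property the abstract describes.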
-
Publication No.: US11960570B2
Publication date: 2024-04-16
Application No.: US17412091
Filing date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao , Sifei Liu , Shalini De Mello , Zhiding Yu , Jan Kautz
IPC: G06F18/00 , G06F18/213 , G06F18/214 , G06N3/08 , G06V10/22 , G06V30/14
CPC classification number: G06F18/2155 , G06F18/213 , G06N3/08 , G06V10/22 , G06V30/1444
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
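The "pull corresponding pairs closer, push contrasting pairs apart" idea can be made concrete with a small sketch of an image-level contrastive loss. The abstract does not specify the loss form; the InfoNCE-style formulation below, the temperature value, and all names are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the patent).

    The loss is low when the anchor is similar to its corresponding image
    (positive) and dissimilar to the contrasting images (negatives).
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical image-level embeddings.
anchor = [1.0, 0.0]
same_object_other_view = [0.9, 0.1]
different_object = [0.0, 1.0]

aligned = contrastive_loss(anchor, same_object_other_view, [different_object])
swapped = contrastive_loss(anchor, different_object, [same_object_other_view])
print(aligned < swapped)
```

In training, a gradient of this loss with respect to the network weights would drive the update via backpropagation; the sketch only evaluates the loss on fixed embeddings.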
-
Publication No.: US11907846B2
Publication date: 2024-02-20
Application No.: US17017597
Filing date: 2020-09-10
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Shalini De Mello , Varun Jampani , Jan Kautz , Xueting Li
IPC: G06K9/36 , G06N3/084 , G06F18/22 , G06F18/20 , G06F18/214 , G06F18/21 , G06N3/045 , G06T17/00 , G06V10/82
CPC classification number: G06N3/084 , G06F18/214 , G06F18/2163 , G06F18/22 , G06F18/29 , G06N3/045 , G06T17/00 , G06V10/82
Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities.
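The two steps named in this abstract, building a DAG over unstructured points along one direction and propagating features across it using pairwise affinities, can be sketched as follows. Several details are assumptions made for illustration: adjacency is defined by a distance radius, the direction is the +x axis, features are scalars, affinities are supplied as a fixed dictionary rather than predicted by a network, and the linear update rule is one common choice rather than the patent's stated formula.

```python
def build_dag(points, radius):
    """Connect spatially adjacent points with edges directed along +x.

    Returns the topological order and, for each point, its parent list.
    Edges only go from earlier to later positions in the sorted order,
    so the resulting graph cannot contain a cycle.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    parents = {i: [] for i in order}
    for pos, a in enumerate(order):
        for b in order[pos + 1:]:
            dx = points[b][0] - points[a][0]
            dy = points[b][1] - points[a][1]
            if dx * dx + dy * dy <= radius * radius:
                parents[b].append(a)
    return order, parents


def propagate(features, order, parents, affinity):
    """Linear propagation: mix each point's own feature with its parents'.

    affinity[(p, i)] is the weight on edge p -> i (assumed given here).
    """
    hidden = {}
    for i in order:
        total = sum(affinity[(p, i)] for p in parents[i])
        incoming = sum(affinity[(p, i)] * hidden[p] for p in parents[i])
        hidden[i] = (1.0 - total) * features[i] + incoming
    return hidden


# Hypothetical toy input: three points on a line, feature mass on the first.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
features = [1.0, 0.0, 0.0]
order, parents = build_dag(points, radius=1.1)
affinity = {(0, 1): 0.5, (1, 2): 0.5}
print(propagate(features, order, parents, affinity))
```

The sketch stops at propagated features; turning them into per-point labels (for example with a classifier over the propagated values) is left out.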
-
Publication No.: US11375176B2
Publication date: 2022-06-28
Application No.: US16780738
Filing date: 2020-02-03
Applicant: NVIDIA Corporation
Inventor: Hung-Yu Tseng , Shalini De Mello , Jonathan Tremblay , Sifei Liu , Jan Kautz , Stanley Thomas Birchfield
IPC: H04N13/282 , H04N13/268 , G06K9/62 , G06N3/08
Abstract: When an image is projected from 3D, the viewpoint of objects in the image, relative to the camera, must be determined. Since the image itself will not have sufficient information to determine the viewpoint of the various objects in the image, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on an object category basis, but must first be trained with numerous examples that have been manually created. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimations for a new object category.
-
Publication No.: US20220139037A1
Publication date: 2022-05-05
Application No.: US17578051
Filing date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Varun Jampani , Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication No.: US11238650B2
Publication date: 2022-02-01
Application No.: US16849962
Filing date: 2020-04-15
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Varun Jampani , Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication No.: US20210287430A1
Publication date: 2021-09-16
Application No.: US16849962
Filing date: 2020-04-15
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Varun Jampani , Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication No.: US20210064931A1
Publication date: 2021-03-04
Application No.: US16998914
Filing date: 2020-08-20
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang , Xitong Yang , Sifei Liu , Jan Kautz
Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
-
Publication No.: US20200320401A1
Publication date: 2020-10-08
Application No.: US16378464
Filing date: 2019-04-08
Applicant: NVIDIA Corporation
Inventor: Varun Jampani , Wei-Chih Hung , Sifei Liu , Pavlo Molchanov , Jan Kautz
Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.