-
Publication number: US20240153093A1
Publication date: 2024-05-09
Application number: US18310414
Filing date: 2023-05-01
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon
CPC classification number: G06T7/10, G06V10/40, G06T2207/20081, G06T2207/20084
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training, and instead can also successfully perform segmentation of object categories not seen during training and only seen during testing and inferencing. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
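The classification step in this abstract, which associates text representations of category labels with per-mask visual representations, can be illustrated with a minimal sketch. It assumes that each mask already has a pooled visual embedding and that each label has a text embedding in the same space; the use of cosine similarity, the function names, and the toy vectors are all hypothetical and are not taken from the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_masks(mask_embeddings, label_embeddings):
    """Assign each mask the label whose text embedding is most similar.

    mask_embeddings: list of visual embeddings, one per predicted mask (assumed input).
    label_embeddings: dict mapping a category name to its text embedding (assumed input).
    """
    labels = []
    for emb in mask_embeddings:
        best = max(label_embeddings, key=lambda name: cosine(emb, label_embeddings[name]))
        labels.append(best)
    return labels

# Toy data: two masks and two category labels in a 3-dimensional embedding space.
mask_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
label_embeddings = {"cat": [1.0, 0.0, 0.0], "sky": [0.0, 0.0, 1.0]}
labels = classify_masks(mask_embeddings, label_embeddings)
```

Because the label set is just a dictionary supplied at call time, categories that were never seen during training can be added at inference by providing their text embeddings, which is the open-vocabulary property the abstract describes.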
-
Publication number: US20240135630A1
Publication date: 2024-04-25
Application number: US18485225
Filing date: 2023-10-11
Applicant: NVIDIA Corporation
Inventor: Koki Nagano, Eric Ryan Wong Chan, Tero Tapani Karras, Shalini De Mello, Miika Samuli Aittala, Matthew Aaron Wong Chan
IPC: G06T15/06, G06T5/00, G06T5/50, G06V10/44, G06V10/771
CPC classification number: G06T15/06, G06T5/002, G06T5/50, G06V10/44, G06V10/771, G06T2207/20084, G06T2207/20221
Abstract: A method and system for performing novel image synthesis using generative networks are provided. An encoder-based model is trained to infer a 3D representation of an input image. A feature image is then generated using volume rendering techniques in accordance with the 3D representation. The feature image is then concatenated with a noisy image and processed by a denoiser network to predict an output image from a novel viewpoint that is consistent with the input image. The denoiser network can be a modified Noise Conditional Score Network (NCSN). In some embodiments, multiple input images or keyframes can be provided as input, and a different 3D representation is generated for each input image. The feature image is then generated, during volume rendering, by sampling each of the 3D representations and applying a mean-pooling operation to generate an aggregate feature image.
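The mean-pooling step for multiple keyframes can be sketched as an element-wise average. This is only an illustration: it assumes that one single-channel feature image has already been sampled from each 3D representation at the same resolution, and the function name and toy values are hypothetical.

```python
def mean_pool(feature_images):
    """Average equally sized feature images element-wise.

    feature_images: list of H x W nested lists, one per input keyframe (assumed input).
    """
    count = len(feature_images)
    height = len(feature_images[0])
    width = len(feature_images[0][0])
    return [
        [sum(img[r][c] for img in feature_images) / count for c in range(width)]
        for r in range(height)
    ]

# Toy data: two 1 x 2 feature images sampled from two different 3D representations.
aggregate = mean_pool([[[1.0, 2.0]], [[3.0, 6.0]]])
```

In the method described above, the aggregate feature image would then be concatenated with a noisy image and passed to the denoiser network; that part is not sketched here.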
-
Publication number: US11960570B2
Publication date: 2024-04-16
Application number: US17412091
Filing date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz
IPC: G06F18/00, G06F18/213, G06F18/214, G06N3/08, G06V10/22, G06V30/14
CPC classification number: G06F18/2155, G06F18/213, G06N3/08, G06V10/22, G06V30/1444
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
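The idea of pulling corresponding pairs closer and pushing contrasting pairs apart can be illustrated with a common contrastive objective. The abstract does not name a specific loss, so the InfoNCE-style formula below is an assumption, as are the temperature value, the function names, and the unit-length toy embeddings.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not confirmed by the abstract).

    anchor, positive: embeddings of a corresponding image pair.
    negatives: embeddings of images that contrast with the anchor.
    All embeddings are assumed to be unit-normalized.
    """
    pos = math.exp(dot(anchor, positive) / temperature)
    neg = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# The loss is small when the positive is aligned with the anchor...
low = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# ...and large when a contrasting image is closer to the anchor than the positive.
high = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing such a loss via backpropagation increases the similarity of corresponding pairs relative to contrasting pairs, which matches the behavior the abstract describes at the image level.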
-
Publication number: US11907846B2
Publication date: 2024-02-20
Application number: US17017597
Filing date: 2020-09-10
Applicant: NVIDIA CORPORATION
Inventor: Sifei Liu, Shalini De Mello, Varun Jampani, Jan Kautz, Xueting Li
IPC: G06K9/36, G06N3/084, G06F18/22, G06F18/20, G06F18/214, G06F18/21, G06N3/045, G06T17/00, G06V10/82
CPC classification number: G06N3/084, G06F18/214, G06F18/2163, G06F18/22, G06F18/29, G06N3/045, G06T17/00, G06V10/82
Abstract: One embodiment of the present invention sets forth a technique for performing spatial propagation. The technique includes generating a first directed acyclic graph (DAG) by connecting spatially adjacent points included in a set of unstructured points via directed edges along a first direction. The technique also includes applying a first set of neural network layers to one or more images associated with the set of unstructured points to generate (i) a set of features for the set of unstructured points and (ii) a set of pairwise affinities between the spatially adjacent points connected by the directed edges. The technique further includes generating a set of labels for the set of unstructured points by propagating the set of features across the first DAG based on the set of pairwise affinities.
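The two steps in this abstract, building a DAG over unstructured points along one direction and propagating features across it using pairwise affinities, can be sketched as follows. This is a simplified illustration under several assumptions: the direction is increasing x, adjacency is a fixed radius, the affinities are given by hand rather than produced by neural network layers, and the linear update rule is a hypothetical choice.

```python
def build_dag(points, radius):
    """Connect spatially adjacent points with edges pointing toward larger x.

    Returns a dict mapping each point index to the list of its parent indices.
    Edges only go from strictly smaller x to larger x, so the graph is acyclic.
    """
    parents = {i: [] for i in range(len(points))}
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if xj < xi and (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2:
                parents[i].append(j)
    return parents

def propagate(points, features, parents, affinity):
    """Propagate scalar features across the DAG in topological (x) order.

    affinity[(j, i)] weights the edge from parent j to child i; the weights
    into each child are assumed to sum to at most 1.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    hidden = list(features)
    for i in order:
        if not parents[i]:
            continue
        total = sum(affinity[(j, i)] for j in parents[i])
        incoming = sum(affinity[(j, i)] * hidden[j] for j in parents[i])
        hidden[i] = (1 - total) * features[i] + incoming
    return hidden

# Toy data: three collinear points; only the first carries a nonzero feature.
points = [(0, 0), (1, 0), (2, 0)]
parents = build_dag(points, radius=1.0)
hidden = propagate(points, [1.0, 0.0, 0.0], parents, {(0, 1): 0.5, (1, 2): 0.5})
```

In the toy run the feature of the first point decays as it travels along the chain, which shows how affinities control how far information spreads. Producing labels from the propagated features is left out of the sketch.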
-
Publication number: US11830145B2
Publication date: 2023-11-28
Application number: US17479866
Filing date: 2021-09-20
Applicant: NVIDIA Corporation
Inventor: Kunal Gupta, Shalini De Mello, Charles Loop, Jonathan Tremblay, Stanley Thomas Birchfield
IPC: G06T17/20, G06N3/08, G06F18/214
CPC classification number: G06T17/205, G06F18/214, G06N3/08
Abstract: A manifold voxel mesh or surface mesh is manufacturable by carving a single block of material and a non-manifold mesh is not manufacturable. Conventional techniques for constructing or extracting a surface mesh from an input point cloud often produce a non-manifold voxel mesh. Similarly, extracting a surface mesh from a voxel mesh that includes non-manifold geometry produces a surface mesh that includes non-manifold geometry. To ensure that the surface mesh includes only manifold geometry, locations of the non-manifold geometry in the voxel mesh are detected and converted into manifold geometry. The result is a manifold voxel mesh from which a manifold surface mesh of the object may be extracted.
-
Publication number: US20220254029A1
Publication date: 2022-08-11
Application number: US17500338
Filing date: 2021-10-13
Applicant: NVIDIA Corporation
Inventor: Eugene Vorontsov, Wonmin Byeon, Shalini De Mello, Varun Jampani, Ming-Yu Liu, Pavlo Molchanov
Abstract: A neural network includes an encoder, a common decoder, and a residual decoder. The encoder encodes input images into a latent space. The latent space disentangles unique features from other common features. The common decoder decodes common features resident in the latent space to generate translated images which lack the unique features. The residual decoder decodes unique features resident in the latent space to generate image deltas corresponding to the unique features. The neural network combines the translated images with the image deltas to generate combined images that may include both common features and unique features. The combined images can be used to drive autoencoding. Once training is complete, the residual decoder can be modified to generate segmentation masks that indicate any regions of a given input image where a unique feature resides.
-
Publication number: US11375176B2
Publication date: 2022-06-28
Application number: US16780738
Filing date: 2020-02-03
Applicant: NVIDIA Corporation
Inventor: Hung-Yu Tseng, Shalini De Mello, Jonathan Tremblay, Sifei Liu, Jan Kautz, Stanley Thomas Birchfield
IPC: H04N13/282, H04N13/268, G06K9/62, G06N3/08
Abstract: When an image is projected from 3D, the viewpoint of objects in the image, relative to the camera, must be determined. Since the image itself will not have sufficient information to determine the viewpoint of the various objects in the image, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on an object category basis, but must first be trained with numerous examples that have been manually created. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimations for a new object category.
-
Publication number: US20220139037A1
Publication date: 2022-05-05
Application number: US17578051
Filing date: 2022-01-18
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication number: US11238650B2
Publication date: 2022-02-01
Application number: US16849962
Filing date: 2020-04-15
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.
-
Publication number: US20210287430A1
Publication date: 2021-09-16
Application number: US16849962
Filing date: 2020-04-15
Applicant: NVIDIA Corporation
Inventor: Xueting Li, Sifei Liu, Kihwan Kim, Shalini De Mello, Varun Jampani, Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object. In at least one embodiment, objects are identified in an image using one or more neural networks that have been trained on objects of a similar category and a three-dimensional mesh template.