-
Publication No.: US20210150757A1
Publication Date: 2021-05-20
Application No.: US16690015
Filing Date: 2019-11-20
Applicant: NVIDIA Corporation
Inventor: Siva Karthik Mustikovela , Varun Jampani , Shalini De Mello , Sifei Liu , Umar Iqbal , Jan Kautz
Abstract: Apparatuses, systems, and techniques to identify orientations of objects within images. In at least one embodiment, one or more neural networks are trained to identify an orientation of one or more objects based, at least in part, on one or more characteristics of the object other than the object's orientation.
-
Publication No.: US20190095791A1
Publication Date: 2019-03-28
Application No.: US16134716
Filing Date: 2018-09-18
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Shalini De Mello , Jinwei Gu , Ming-Hsuan Yang , Jan Kautz
Abstract: A spatial linear propagation network (SLPN) system learns the affinity matrix for vision tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The SLPN system is trained for a particular computer vision task and refines an input map (i.e., affinity matrix) that indicates pixels that share a particular property (e.g., color, object, texture, shape, etc.). Inputs to the SLPN system are input data (e.g., pixel values for an image) and the input map corresponding to the input data to be propagated. The input data is processed to produce task-specific affinity values (guidance data). The task-specific affinity values are applied to values in the input map, with at least two weighted values from each column contributing to a value in the refined map data for the adjacent column.
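The column-to-column linear propagation this abstract describes can be illustrated with a minimal one-dimensional sketch. This is a simplification under stated assumptions: scalar weights and a single left-to-right pass, whereas the SLPN uses multi-way connections swept in several directions, with weights predicted by a guidance network rather than fixed by hand.

```python
import numpy as np

def propagate_row(x, w):
    """One-directional linear propagation along a row.

    x : (T,) coarse input-map values to be refined
    w : (T,) affinity weights in [0, 1); in the SLPN these would be
        predicted per pixel by the guidance network (assumed fixed here)

    Recurrence: h[t] = (1 - w[t]) * x[t] + w[t] * h[t-1]
    """
    h = np.empty_like(x)
    h[0] = x[0]
    for t in range(1, len(x)):
        h[t] = (1.0 - w[t]) * x[t] + w[t] * h[t - 1]
    return h

# a single "seed" value diffuses rightward under uniform affinity 0.5
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
w = np.full(5, 0.5)
refined = propagate_row(x, w)  # → [0.0, 0.0, 0.5, 0.25, 0.125]
```

Higher affinity weights let a property (here, the seed value) spread further before decaying, which is how the learned affinities control how far labels or colors propagate.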
-
Publication No.: US20230290038A1
Publication Date: 2023-09-14
Application No.: US18320446
Filing Date: 2023-05-19
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
CPC classification number: G06T15/04 , G06T7/579 , G06T7/70 , G06T17/20 , G06T15/20 , G06T2207/30244 , G06T2207/20084 , G06T2207/10016
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
-
Publication No.: US20230252692A1
Publication Date: 2023-08-10
Application No.: US17929182
Filing Date: 2022-09-01
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Jiteng Mu , Shalini De Mello , Zhiding Yu , Jan Kautz
CPC classification number: G06T11/001 , G06T3/0093
Abstract: Embodiments of the present disclosure relate to learning dense correspondences for images. Systems and methods are disclosed that disentangle structure and texture (or style) representations of GAN synthesized images by learning a dense pixel-level correspondence map for each image during image synthesis. A canonical coordinate frame is defined and a structure latent code for each generated image is warped to align with the canonical coordinate frame. In sum, the structure associated with the latent code is mapped into a shared coordinate space (canonical coordinate space), thereby establishing correspondences in the shared coordinate space. A correspondence generation system receives the warped coordinate correspondences as an encoded image structure. The encoded image structure and a texture latent code are used to synthesize an image. The shared coordinate space enables propagation of semantic labels from reference images to synthesized images.
-
Publication No.: US20230177810A1
Publication Date: 2023-06-08
Application No.: US17853631
Filing Date: 2022-06-29
Applicant: NVIDIA Corporation
Inventor: Jiarui Xu , Shalini De Mello , Sifei Liu , Wonmin Byeon , Thomas Breuel , Jan Kautz
IPC: G06V10/774 , G06V10/26
CPC classification number: G06V10/774 , G06V10/26
Abstract: Semantic segmentation includes the task of providing pixel-wise annotations for a provided image. To train a machine learning environment to perform semantic segmentation, image/caption pairs are retrieved from one or more databases. These image/caption pairs each include an image and associated textual caption. The image portion of each image/caption pair is passed to an image encoder of the machine learning environment that outputs potential pixel groupings (e.g., potential segments of pixels) within each image, while nouns are extracted from the caption portion and are converted to text prompts which are then passed to a text encoder that outputs a corresponding text representation. Contrastive loss operations are then performed on features extracted from these pixel groupings and text representations to determine an extracted feature for each noun of each caption that most closely matches the extracted features for the associated image.
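The final matching step the abstract mentions (finding, for each noun, the pixel-group feature that most closely matches it) can be sketched with cosine similarity. The helper name and the toy features below are hypothetical; the contrastive training that produces such features is not reproduced here.

```python
import numpy as np

def match_nouns_to_segments(noun_feats, segment_feats):
    """For each noun embedding, return the index of the pixel-group
    embedding with the highest cosine similarity (hypothetical helper
    illustrating the matching step only)."""
    n = noun_feats / np.linalg.norm(noun_feats, axis=1, keepdims=True)
    s = segment_feats / np.linalg.norm(segment_feats, axis=1, keepdims=True)
    sim = n @ s.T                # (num_nouns, num_segments) cosine scores
    return sim.argmax(axis=1)

# toy embeddings: two nouns, three candidate pixel groupings
nouns = np.array([[1.0, 0.0], [0.0, 1.0]])
segments = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
matches = match_nouns_to_segments(nouns, segments)  # → [0, 1]
```

In training, the same similarity scores would feed a contrastive loss so that each noun's text representation is pulled toward the segment features of its own image and pushed away from those of other images.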
-
Publication No.: US20230074706A1
Publication Date: 2023-03-09
Application No.: US17412091
Filing Date: 2021-08-25
Applicant: NVIDIA Corporation
Inventor: Taihong Xiao , Sifei Liu , Shalini De Mello , Zhiding Yu , Jan Kautz
Abstract: A multi-level contrastive training strategy for training a neural network relies on image pairs (no other labels) to learn semantic correspondences at the image level and region or pixel level. The neural network is trained using contrasting image pairs including different objects and corresponding image pairs including different views of the same object. Conceptually, contrastive training pulls corresponding image pairs closer and pushes contrasting image pairs apart. An image-level contrastive loss is computed from the outputs (predictions) of the neural network and used to update parameters (weights) of the neural network via backpropagation. The neural network is also trained via pixel-level contrastive learning using only image pairs. Pixel-level contrastive learning receives an image pair, where each image includes an object in a particular category.
-
Publication No.: US20220396289A1
Publication Date: 2022-12-15
Application No.: US17348604
Filing Date: 2021-06-15
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Shalini De Mello , Jan Kautz
Abstract: Apparatuses, systems, and techniques to calculate a plurality of paths, through which an autonomous device is to traverse. In at least one embodiment, a plurality of paths are calculated using one or more neural networks based, at least in part, on one or more distance values output by the one or more neural networks.
-
Publication No.: US11354847B2
Publication Date: 2022-06-07
Application No.: US16945455
Filing Date: 2020-07-31
Applicant: NVIDIA Corporation
Inventor: Xueting Li , Sifei Liu , Kihwan Kim , Shalini De Mello , Jan Kautz
Abstract: A three-dimensional (3D) object reconstruction neural network system learns to predict a 3D shape representation of an object from a video that includes the object. The 3D reconstruction technique may be used for content creation, such as generation of 3D characters for games, movies, and 3D printing. When 3D characters are generated from video, the content may also include motion of the character, as predicted based on the video. The 3D object reconstruction technique exploits temporal consistency to reconstruct a dynamic 3D representation of the object from an unlabeled video. Specifically, an object in a video has a consistent shape and consistent texture across multiple frames. Texture, base shape, and part correspondence invariance constraints may be applied to fine-tune the neural network system. The reconstruction technique generalizes well, particularly for non-rigid objects.
-
Publication No.: US11328173B2
Publication Date: 2022-05-10
Application No.: US17081805
Filing Date: 2020-10-27
Applicant: NVIDIA Corporation
Inventor: Sifei Liu , Shalini De Mello , Jinwei Gu , Varun Jampani , Jan Kautz
Abstract: A temporal propagation network (TPN) system learns the affinity matrix for video image processing tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The TPN system includes a guidance neural network model and a temporal propagation module and is trained for a particular computer vision task to propagate visual properties from a key-frame represented by dense data (color), to another frame that is represented by coarse data (grey-scale). The guidance neural network model generates an affinity matrix referred to as a global transformation matrix from task-specific data for the key-frame and the other frame. The temporal propagation module applies the global transformation matrix to the key-frame property data to produce propagated property data (color) for the other frame. For example, the TPN system may be used to colorize several frames of greyscale video using a single manually colorized key-frame.
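The core propagation step (applying a global transformation matrix to the key-frame's property data) amounts to a matrix product. The sketch below models the guidance network's output as a fixed row-stochastic matrix, which is an assumption for illustration; in the TPN this matrix is predicted from the task-specific data of both frames.

```python
import numpy as np

rng = np.random.default_rng(0)

n_pixels = 4  # toy frame flattened to 4 pixels

# Stand-in for the guidance network's output: each row of G sums to 1,
# so every propagated pixel is a convex combination of key-frame pixels.
G = rng.random((n_pixels, n_pixels))
G /= G.sum(axis=1, keepdims=True)

key_colors = rng.random((n_pixels, 3))   # RGB of the colorized key-frame
propagated = G @ key_colors              # inferred colors for the
                                         # grey-scale target frame
```

Because each row of G is a convex combination, the propagated colors always stay within the range of the key-frame colors, which keeps the colorized output in gamut.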
-
Publication No.: US20200252600A1
Publication Date: 2020-08-06
Application No.: US16780738
Filing Date: 2020-02-03
Applicant: NVIDIA Corporation
Inventor: Hung-Yu Tseng , Shalini De Mello , Jonathan Tremblay , Sifei Liu , Jan Kautz , Stanley Thomas Birchfield
IPC: H04N13/282 , H04N13/268 , G06N3/08 , G06K9/62
Abstract: When an image is projected from 3D, the viewpoint of objects in the image, relative to the camera, must be determined. Since the image itself will not have sufficient information to determine the viewpoint of the various objects in the image, techniques to estimate the viewpoint must be employed. To date, neural networks have been used to infer such viewpoint estimates on an object category basis, but must first be trained with numerous examples that have been manually created. The present disclosure provides a neural network that is trained to learn, from just a few example images, a unique viewpoint estimation network capable of inferring viewpoint estimations for a new object category.