-
Publication No.: US20240273682A1
Publication date: 2024-08-15
Application No.: US18431527
Filing date: 2024-02-02
Applicant: NVIDIA Corporation
Inventor: Weili Nie , Guan-Horng Liu , Arash Vahdat , De-An Huang , Anima Anandkumar
Abstract: Image restoration generally involves recovering a target clean image from a given image having noise, blurring, or other degraded features. Current image restoration solutions typically include a diffusion model that is trained for image restoration by a forward process that progressively diffuses data to noise, and then by learning in a reverse process to generate the data from the noise. However, the forward process diffuses the original data with Gaussian noise, which carries little or no structural information about the original data, whereas the degraded image itself is far more structurally informative than random Gaussian noise. Similar problems also exist for other data-to-data translation tasks. The present disclosure trains a data translation conditional diffusion model from diffusion bridge(s) computed between a first version of the data and a second version of the data, which can yield a model that can provide interpretable generation, sampling efficiency, and reduced processing time.
-
Publication No.: US20240249538A1
Publication date: 2024-07-25
Application No.: US18223473
Filing date: 2023-07-18
Applicant: NVIDIA Corporation
Inventor: Zetong Yang , Zhiding Yu , Ren Hao Wang , Chris Choy , Anima Anandkumar , Jose M. Alvarez Lopez
CPC classification number: G06V20/64 , G06T7/50 , G06T7/70 , G06T7/80 , G06V10/225 , G06V10/82 , G06T2207/20081 , G06T2207/20084 , G06T2207/30252 , G06V2201/07
Abstract: 3D object detection is a computer vision task that generally detects (e.g. classifies and localizes) objects in 3D space from the 2D images or videos that capture the objects. Current techniques used for 3D object detection rely on machine learning processes that learn to detect 3D objects from existing images annotated with high-quality 3D information including depth information generally obtained using lidar technology. However, due to lidar's limited measurable range, current machine learning solutions to 3D object detection do not support detection of 3D objects beyond the lidar range, which is needed for numerous applications, including autonomous driving applications where existing close or midrange 3D object detection does not always meet the safety-critical requirement of autonomous driving. The present disclosure provides for 3D object detection using a technique that supports long-range detection (i.e. detection beyond the lidar range).
-
Publication No.: US20240104698A1
Publication date: 2024-03-28
Application No.: US17719091
Filing date: 2022-04-12
Applicant: Nvidia Corporation
Inventor: Weili Nie , Yujia Huang , Chaowei Xiao , Arash Vahdat , Anima Anandkumar
CPC classification number: G06T5/002 , G06N3/0445 , G06N3/0472 , G06T5/50 , G06T2207/20084
Abstract: Apparatuses, systems, and techniques are presented to remove unintended variations introduced into data. In at least one embodiment, a first image of an object can be generated based, at least in part, upon adding noise to, and removing the noise from, a second image of the object.
-
Publication No.: US11931909B2
Publication date: 2024-03-19
Application No.: US17331466
Filing date: 2021-05-26
Applicant: NVIDIA Corporation
Inventor: Jonathan Tremblay , Fabio Tozeto Ramos , Yuke Zhu , Anima Anandkumar , Guanya Shi
IPC: B25J9/16 , B25J13/08 , B25J19/02 , G05B13/02 , G06F18/214 , G06K9/00 , G06N3/04 , G06N3/045 , G06T7/73 , G06V10/75 , G06V20/20 , G06V20/64
CPC classification number: B25J9/1697 , B25J9/161 , B25J9/1612 , B25J13/08 , B25J19/023 , G05B13/027 , G06F18/2148 , G06N3/045 , G06T7/73 , G06V10/751 , G06V20/20 , G06V20/653 , G06T2207/20081 , G06T2207/20084
Abstract: Apparatuses, systems, and techniques generate poses of an object based on data of the object observed from a first viewpoint and a second viewpoint. The poses can be evaluated to determine a portion of the data usable by an estimator to generate a pose of the object.
-
Publication No.: US20240078423A1
Publication date: 2024-03-07
Application No.: US17893026
Filing date: 2022-08-22
Applicant: NVIDIA Corporation
Inventor: Xiaojian Ma , Weili Nie , Zhiding Yu , Huaizu Jiang , Chaowei Xiao , Yuke Zhu , Anima Anandkumar
Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.
-
Publication No.: US20240037756A1
Publication date: 2024-02-01
Application No.: US18144071
Filing date: 2023-05-05
Applicant: NVIDIA Corporation
Inventor: De-An Huang , Zhiding Yu , Anima Anandkumar
CPC classification number: G06T7/20 , G06T5/20 , G06T7/70 , G06V10/761 , G06V10/82 , G06V2201/07 , G06T2207/20081
Abstract: Apparatuses, systems, and techniques to track one or more objects in one or more frames of a video. In at least one embodiment, one or more objects in one or more frames of a video are tracked based on, for example, one or more sets of embeddings.
-
Publication No.: US11790633B2
Publication date: 2023-10-17
Application No.: US17365877
Filing date: 2021-07-01
Applicant: Nvidia Corporation
Inventor: Zhiding Yu , Rui Huang , Wonmin Byeon , Sifei Liu , Guilin Liu , Thomas Breuel , Anima Anandkumar , Jan Kautz
IPC: G06V10/50 , G06N3/04 , G06T7/13 , G06V10/75 , G06F18/2413
CPC classification number: G06V10/50 , G06F18/2413 , G06N3/04 , G06T7/13 , G06V10/758
Abstract: The disclosure provides a learning framework that unifies both semantic segmentation and semantic edge detection. A learnable recurrent message passing layer is disclosed where semantic edges are considered as explicitly learned gating signals to refine segmentation and improve dense prediction quality by finding compact structures for message paths. The disclosure includes a method for coupled segmentation and edge learning. In one example, the method includes: (1) receiving an input image, (2) generating, from the input image, a semantic feature map, an affinity map, and a semantic edge map from a single backbone network of a convolutional neural network (CNN), and (3) producing a refined semantic feature map by smoothing pixels of the semantic feature map using spatial propagation, and controlling the smoothing using both affinity values from the affinity map and edge values from the semantic edge map.
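Step (3) of the example method can be illustrated with a minimal one-dimensional sketch. It is not the disclosed implementation: the function name `edge_gated_propagation`, the left-to-right scan, and the gate formula `affinity * (1 - edge)` are all assumptions chosen only to show how affinity values can enable smoothing while edge values block it.

```python
from typing import List


def edge_gated_propagation(
    features: List[float], affinity: List[float], edge: List[float]
) -> List[float]:
    """Smooth a 1D feature row left to right (illustrative assumption only).

    Each position mixes its own feature with the refined value of its left
    neighbour. The mixing weight grows with affinity and shrinks to zero
    where the edge map signals a boundary.
    """
    if not (len(features) == len(affinity) == len(edge)):
        raise ValueError("features, affinity and edge must have equal length")

    refined: List[float] = []
    previous = 0.0
    for i, (x, a, e) in enumerate(zip(features, affinity, edge)):
        # Assumed gate: no incoming message at the first position,
        # otherwise affinity scaled down by edge strength.
        gate = 0.0 if i == 0 else a * (1.0 - e)
        value = (1.0 - gate) * x + gate * previous
        refined.append(value)
        previous = value
    return refined


# With an edge at index 2, the two regions stay separate.
with_edge = edge_gated_propagation([1.0, 1.0, 5.0, 5.0], [0.5] * 4, [0.0, 0.0, 1.0, 0.0])
# Without an edge, values from the left region bleed into the right one.
without_edge = edge_gated_propagation([1.0, 1.0, 5.0, 5.0], [0.5] * 4, [0.0] * 4)
```

In this toy setting the edge value acts as a hard stop on message passing, which is one plausible reading of "edges as gating signals"; the disclosure itself describes a learnable recurrent layer rather than a fixed formula.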
-
Publication No.: US20250103968A1
Publication date: 2025-03-27
Application No.: US18821611
Filing date: 2024-08-30
Applicant: NVIDIA Corporation
Inventor: Zizheng Pan , De-An Huang , Weili Nie , Zhiding Yu , Chaowei Xiao , Anima Anandkumar
IPC: G06N20/20
Abstract: Diffusion models are machine learning algorithms that are uniquely trained to generate high-quality data from lower-quality input data. Diffusion probabilistic models use discrete-time random processes or continuous-time stochastic differential equations (SDEs) that learn to gradually remove the noise added to the data points. With diffusion probabilistic models, high-quality output currently requires sampling from a large diffusion probabilistic model, which comes at a high computational cost. The present disclosure stitches together the trajectory of two or more inferior diffusion probabilistic models during a denoising process, which can in turn accelerate the denoising process by avoiding use of only a single large diffusion probabilistic model.
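The idea of stitching a denoising trajectory across models can be sketched as a loop that hands the sample from one denoiser to another partway through. Everything below is a hypothetical illustration: the scalar "sample", the two toy denoisers, and the fixed `switch_step` rule are assumptions, and the abstract does not say how or when the disclosed method switches models.

```python
from typing import Callable, List, Tuple

# A denoiser maps (current sample, timestep) to a less noisy sample.
Denoiser = Callable[[float, int], float]


def small_model(x: float, t: int) -> float:
    # Hypothetical cheap denoiser: shrinks the sample by 10% per step.
    return x * 0.9


def large_model(x: float, t: int) -> float:
    # Hypothetical expensive denoiser: shrinks the sample by 50% per step.
    return x * 0.5


def stitched_denoise(
    x: float, num_steps: int, switch_step: int, first: Denoiser, second: Denoiser
) -> Tuple[float, List[str]]:
    """Run `first` for the first `switch_step` steps, then `second`."""
    used: List[str] = []
    # Timesteps count down from num_steps to 1, as in a reverse process.
    for t in range(num_steps, 0, -1):
        steps_done = num_steps - t
        if steps_done < switch_step:
            x = first(x, t)
            used.append("first")
        else:
            x = second(x, t)
            used.append("second")
    return x, used


final_sample, schedule = stitched_denoise(1.0, 4, 3, small_model, large_model)
```

Here three of the four steps run on the cheap model, so only one call to the expensive model is needed; the trade-off between quality and cost would depend on where the switch is placed.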
-
Publication No.: US20240221166A1
Publication date: 2024-07-04
Application No.: US18395198
Filing date: 2023-12-22
Applicant: NVIDIA Corporation
Inventor: Zhiding Yu , Shuaiyi Huang , De-An Huang , Shiyi Lan , Subhashree Radhakrishnan , Jose M. Alvarez Lopez , Anima Anandkumar
IPC: G06T7/12 , G06V10/764 , G06V20/70
CPC classification number: G06T7/12 , G06V10/764 , G06V20/70 , G06T2207/20081
Abstract: Video instance segmentation is a computer vision task that aims to detect, segment, and track objects continuously in videos. It can be used in numerous real-world applications, such as video editing, three-dimensional (3D) reconstruction, 3D navigation (e.g. for autonomous driving and/or robotics), and view point estimation. However, current machine learning-based processes employed for video instance segmentation are lacking, particularly because the densely annotated videos needed for supervised training of high-quality models are not readily available and are not easily generated. To address the issues in the prior art, the present disclosure provides point-level supervision for video instance segmentation in a manner that allows the resulting machine learning model to handle any object category.
-
Publication No.: US20230290135A1
Publication date: 2023-09-14
Application No.: US18119770
Filing date: 2023-03-09
Applicant: NVIDIA Corporation
Inventor: Daquan Zhou , Zhiding Yu , Enze Xie , Anima Anandkumar , Chaowei Xiao , Jose Manuel Alvarez Lopez
IPC: G06V10/82 , G06V10/77 , G06V10/778 , G06V10/30
CPC classification number: G06V10/82 , G06V10/7715 , G06V10/778 , G06V10/30
Abstract: Apparatuses, systems, and techniques to generate a robust representation of an image. In at least one embodiment, input tokens of an input image are received, and an inference about the input image is generated based on a vision transformer (ViT) system comprising at least one self-attention module to perform token mixing and a channel self-attention module to perform channel processing.