-
Publication No.: US20240169636A1
Publication Date: 2024-05-23
Application No.: US18317378
Filing Date: 2023-05-15
Applicant: NVIDIA Corporation
Inventors: Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz
CPC Classification: G06T13/40, G06T5/002, G06T13/80, G06T2207/20081, G06T2207/20084
Abstract: Systems and methods are disclosed that improve performance of synthesized motion generated by a diffusion neural network model. A physics-guided motion diffusion model incorporates physical constraints into the diffusion process to model the complex dynamics induced by forces and contact. Specifically, a physics-based motion projection module uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically plausible motion. The projected motion is further used in the next diffusion iteration to guide the denoising diffusion process. The use of physical constraints in the physics-guided motion diffusion model iteratively pulls the motion toward a physically plausible space, reducing artifacts such as floating, foot sliding, and ground penetration.
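The alternation of denoising and projection described in this abstract can be sketched as a loop. This is a toy illustration only, not the patented method: `denoise_step` and `project_to_plausible` are hypothetical names, the "motion" is a list of foot heights, and a simple clamp to the ground plane stands in for the physics-simulator motion imitation.

```python
import random


def denoise_step(motion, step, total_steps, rng):
    """Hypothetical stand-in for one denoising step: pull the motion toward
    a toy target and add noise that fades as the steps progress."""
    noise_scale = 0.3 * (1.0 - (step + 1) / total_steps)
    target = [0.1 * (i % 4) for i in range(len(motion))]  # toy foot heights
    return [0.5 * m + 0.5 * t + rng.gauss(0.0, noise_scale)
            for m, t in zip(motion, target)]


def project_to_plausible(motion, ground=0.0):
    """Toy projection (assumption): clamp foot heights to the ground plane.
    The abstract uses motion imitation in a physics simulator here."""
    return [max(height, ground) for height in motion]


def physics_guided_sampling(num_frames=8, total_steps=10, seed=0):
    rng = random.Random(seed)
    motion = [rng.gauss(0.0, 1.0) for _ in range(num_frames)]  # start from noise
    for step in range(total_steps):
        motion = denoise_step(motion, step, total_steps, rng)
        # The projected motion is what the next diffusion iteration sees.
        motion = project_to_plausible(motion)
    return motion


final_motion = physics_guided_sampling()
```

Because the projection runs after every step in this sketch, no frame of `final_motion` lies below the ground plane, which mirrors the ground-penetration artifact the abstract mentions.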
-
Publication No.: US20240153093A1
Publication Date: 2024-05-09
Application No.: US18310414
Filing Date: 2023-05-01
Applicant: NVIDIA Corporation
Inventors: Jiarui Xu, Shalini De Mello, Sifei Liu, Arash Vahdat, Wonmin Byeon
CPC Classification: G06T7/10, G06V10/40, G06T2207/20081, G06T2207/20084
Abstract: An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training, and instead can also successfully segment object categories not seen during training and only seen during testing and inference. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
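The final classification step, associating each mask's visual representation with the text representation of a category label, might look like the sketch below. It assumes cosine similarity as the matching score and uses made-up three-dimensional embeddings; the abstract does not specify the similarity measure, and `classify_masks` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_masks(mask_embeddings, label_embeddings):
    """Assign each mask the label whose text embedding is most similar."""
    return {
        mask_id: max(label_embeddings,
                     key=lambda name: cosine(visual, label_embeddings[name]))
        for mask_id, visual in mask_embeddings.items()
    }


# Hypothetical embeddings; real ones would come from the diffusion model's
# internal features and from a text encoder.
mask_embeddings = {"mask_0": [0.9, 0.1, 0.0], "mask_1": [0.0, 0.2, 0.8]}
label_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "tree": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}
predictions = classify_masks(mask_embeddings, label_embeddings)
```

Since labels enter only as text embeddings, adding a category at inference time amounts to adding one more entry to `label_embeddings`, which is the open-vocabulary property the abstract describes.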
-
Publication No.: US20240096115A1
Publication Date: 2024-03-21
Application No.: US18243555
Filing Date: 2023-09-07
Applicant: NVIDIA Corporation
Inventors: Pavlo Molchanov, Jan Kautz, Arash Vahdat, Hongxu Yin, Paul Micaelli
CPC Classification: G06V20/597, G06T7/70, G06V10/82, G06V20/70, G06V40/171, G06T2207/30201, G06V2201/07
Abstract: Landmark detection refers to the detection of landmarks within an image or a video, and is used in many computer vision tasks such as emotion recognition, face identity verification, hand tracking, gesture recognition, and eye gaze tracking. Current landmark detection methods rely on a cascaded computation through cascaded networks or an ensemble of multiple models, which starts with an initial guess of the landmarks and iteratively produces corrected landmarks that match the input more finely. However, the iterations required by current methods typically increase the training memory cost linearly, and do not have an obvious stopping criterion. Moreover, these methods tend to exhibit jitter in landmark detection results for video. The present disclosure improves current landmark detection methods by providing landmark detection using an iterative neural network. Furthermore, when detecting landmarks in video, the present disclosure provides for a reduction in jitter due to reuse of previous hidden states from previous frames.
-
Publication No.: US20220284232A1
Publication Date: 2022-09-08
Application No.: US17188397
Filing Date: 2021-03-01
Applicant: NVIDIA Corporation
Inventors: Hongxu Yin, Arun Mallya, Arash Vahdat, Jose Manuel Alvarez Lopez, Jan Kautz, Pavlo Molchanov
Abstract: Apparatuses, systems, and techniques to identify one or more images used to train one or more neural networks. In at least one embodiment, one or more images used to train one or more neural networks are identified, based on, for example, one or more labels of one or more objects within the one or more images.
-
Publication No.: US20210056353A1
Publication Date: 2021-02-25
Application No.: US17000048
Filing Date: 2020-08-21
Applicant: Nvidia Corporation
Inventors: Arash Vahdat, Tanmay Gupta, Xiaodong Yang, Jan Kautz
Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
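One common way to train a critic with a mutual-information objective is the InfoNCE lower bound. The sketch below assumes that estimator and a plain dot-product critic; the abstract names neither, so both are illustrative choices, and the embeddings are made up.

```python
import math


def critic(image_emb, text_emb):
    """Hypothetical critic: a dot product between the two embeddings."""
    return sum(a * b for a, b in zip(image_emb, text_emb))


def infonce_lower_bound(image_embs, text_embs):
    """InfoNCE estimate of mutual information for N paired embeddings:
    mean log-softmax score of each matched pair, plus log(N)."""
    n = len(image_embs)
    total = 0.0
    for i, image_emb in enumerate(image_embs):
        scores = [critic(image_emb, text_emb) for text_emb in text_embs]
        log_norm = math.log(sum(math.exp(s) for s in scores))
        total += scores[i] - log_norm
    return total / n + math.log(n)


images = [[2.0, 0.0], [0.0, 2.0]]
matched_text = [[2.0, 0.0], [0.0, 2.0]]   # text i describes image i
shuffled_text = [[0.0, 2.0], [2.0, 0.0]]  # pairing deliberately broken

aligned = infonce_lower_bound(images, matched_text)
misaligned = infonce_lower_bound(images, shuffled_text)
```

Training would adjust the critic (and possibly the encoders) to raise this bound; here the estimate is simply higher for correctly paired embeddings than for shuffled ones, and it can never exceed log(N).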
-
Publication No.: US20240144000A1
Publication Date: 2024-05-02
Application No.: US18307227
Filing Date: 2023-04-26
Applicant: NVIDIA Corporation
Inventors: Yuji Roh, Weili Nie, De-An Huang, Arash Vahdat, Animashree Anandkumar
IPC Classification: G06N3/08
CPC Classification: G06N3/08
Abstract: A neural network model is trained for fairness and accuracy using both real and synthesized training data, such as images. During training, a first sampling ratio between the real and synthesized training data is optimized. The first sampling ratio may comprise a value for each group (or attribute), where each value is optimized. A second sampling ratio defines relative amounts of training data that are used for each one of the groups. Furthermore, a neural network model accuracy and a fairness metric are both used for updating the first and second sampling ratios during training iterations. The neural network model may be trained using different classes of training data. The second sampling ratio may vary for each class.
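To make the two sampling ratios concrete, the sketch below turns them into per-group sample counts for one training batch. It assumes the first ratio is expressed as the fraction of real data within each group and the second as each group's share of the batch; `batch_composition` and the group names are hypothetical, and the ratio updates driven by accuracy and fairness are not shown.

```python
def batch_composition(batch_size, group_ratio, real_ratio):
    """Return per-group counts of real and synthesized samples.

    group_ratio: share of the batch given to each group (second ratio).
    real_ratio: fraction of real data within each group (first ratio).
    """
    plan = {}
    for group, share in group_ratio.items():
        group_count = round(batch_size * share)
        real_count = round(group_count * real_ratio[group])
        plan[group] = {
            "real": real_count,
            "synthetic": group_count - real_count,
        }
    return plan


plan = batch_composition(
    batch_size=100,
    group_ratio={"group_a": 0.6, "group_b": 0.4},
    real_ratio={"group_a": 0.75, "group_b": 0.5},
)
```

With these example values, `group_a` receives 45 real and 15 synthesized samples and `group_b` receives 20 of each. In the scheme the abstract describes, both ratio dictionaries would be re-estimated across training iterations rather than fixed.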
-
Publication No.: US11948078B2
Publication Date: 2024-04-02
Application No.: US17000048
Filing Date: 2020-08-21
Applicant: Nvidia Corporation
Inventors: Arash Vahdat, Tanmay Gupta, Xiaodong Yang, Jan Kautz
IPC Classification: G06N3/08, G06F18/214, G06F18/22, G06V10/74, G06V10/82, G06V30/19, G06V30/262
CPC Classification: G06N3/08, G06F18/2148, G06F18/22, G06V10/761, G06V10/82, G06V30/1916, G06V30/19173, G06V30/274
Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
-
Publication No.: US20230351807A1
Publication Date: 2023-11-02
Application No.: US17661706
Filing Date: 2022-05-02
Applicant: NVIDIA Corporation
Inventors: Yuzhuo Ren, Weili Nie, Arash Vahdat, Animashree Anandkumar, Nishant Puri, Niranjan Avadhanam
IPC Classification: G06V40/16, G06V10/82, G06V10/774, G06V10/62
CPC Classification: G06V40/176, G06V10/82, G06V10/774, G06V10/62, G06V40/164
Abstract: A machine learning model (MLM) may be trained and evaluated. Attribute-based performance metrics may be analyzed to identify attributes for which the MLM is performing below a threshold when each is present in a sample. A generative neural network (GNN) may be used to generate samples including compositions of the attributes, and the samples may be used to augment the data used to train the MLM. This may be repeated until one or more criteria are satisfied. In various examples, a temporal sequence of data items, such as frames of a video, may be generated which may form samples of the data set. Sets of attribute values may be determined based on one or more temporal scenarios to be represented in the data set, and one or more GNNs may be used to generate the sequence to depict information corresponding to the attribute values.
-
Publication No.: US20230095092A1
Publication Date: 2023-03-30
Application No.: US17957143
Filing Date: 2022-09-30
Applicant: Nvidia Corporation
Inventors: Zhisheng Xiao, Karsten Kreis, Arash Vahdat
IPC Classification: G06T5/00
Abstract: Apparatuses, systems, and techniques are presented to train and utilize one or more neural networks. A denoising diffusion generative adversarial network (denoising diffusion GAN) reduces a number of denoising steps during a reverse process. The denoising diffusion GAN does not assume a Gaussian distribution for large steps of the denoising process and applies a multimodal model to permit denoising with fewer steps. Systems and methods further minimize a divergence between a diffused real data distribution and a diffused generator distribution over several timesteps. Accordingly, various embodiments may enable faster sample generation, in which the samples are generated from noise using the denoising diffusion GAN.
-
Publication No.: US20240273682A1
Publication Date: 2024-08-15
Application No.: US18431527
Filing Date: 2024-02-02
Applicant: NVIDIA Corporation
Inventors: Weili Nie, Guan-Horng Liu, Arash Vahdat, De-An Huang, Anima Anandkumar
Abstract: Image restoration generally involves recovering a target clean image from a given image having noise, blurring, or other degraded features. Current image restoration solutions typically include a diffusion model that is trained for image restoration by a forward process that progressively diffuses data to noise, and then by learning in a reverse process to generate the data from the noise. However, the forward process relies on Gaussian noise to diffuse the original data, and that noise carries little or no structural information about the original data, whereas the degraded image itself is far more structurally informative than random Gaussian noise. Similar problems also exist for other data-to-data translation tasks. The present disclosure trains a data translation conditional diffusion model from diffusion bridge(s) computed between a first version of the data and a second version of the data, which can yield a model that can provide interpretable generation, sampling efficiency, and reduced processing time.
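A diffusion bridge between two versions of the data can be illustrated with a Brownian bridge, which starts exactly at one endpoint and ends exactly at the other. The sketch below assumes that particular bridge and a hypothetical `sigma` noise scale; the abstract does not state which bridge or noise schedule the disclosure uses.

```python
import math
import random


def bridge_sample(clean, degraded, t, sigma=0.1, rng=None):
    """Sample an intermediate state at time t in [0, 1] on a Brownian bridge
    pinned to `clean` at t = 0 and to `degraded` at t = 1 (assumed form)."""
    rng = rng or random.Random(0)
    std = sigma * math.sqrt(t * (1.0 - t))  # noise vanishes at both endpoints
    return [(1.0 - t) * c + t * d + std * rng.gauss(0.0, 1.0)
            for c, d in zip(clean, degraded)]


clean = [0.2, 0.8, 0.5]      # toy pixel values of the clean image
degraded = [0.4, 0.6, 0.9]   # the same pixels after degradation

start = bridge_sample(clean, degraded, 0.0)
middle = bridge_sample(clean, degraded, 0.5)
end = bridge_sample(clean, degraded, 1.0)
```

Unlike a forward process that ends in pure Gaussian noise, every intermediate state here stays between two structurally related images, which is the motivation the abstract gives for learning from the degraded image itself.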