-
Publication number: US20240096115A1
Publication date: 2024-03-21
Application number: US18243555
Filing date: 2023-09-07
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Jan Kautz , Arash Vahdat , Hongxu Yin , Paul Micaelli
CPC classification number: G06V20/597 , G06T7/70 , G06V10/82 , G06V20/70 , G06V40/171 , G06T2207/30201 , G06V2201/07
Abstract: Landmark detection refers to the detection of landmarks within an image or a video, and is used in many computer vision tasks such as emotion recognition, face identity verification, hand tracking, gesture recognition, and eye gaze tracking. Current landmark detection methods rely on a cascaded computation through cascaded networks or an ensemble of multiple models, which starts with an initial guess of the landmarks and iteratively produces corrected landmarks that match the input more closely. However, the iterations required by current methods typically increase the training memory cost linearly and do not have an obvious stopping criterion. Moreover, these methods tend to exhibit jitter in landmark detection results for video. The present disclosure improves current landmark detection methods by providing landmark detection using an iterative neural network. Furthermore, when detecting landmarks in video, the present disclosure reduces jitter by reusing hidden states from previous frames.
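A minimal sketch of the iterative-refinement idea described in the abstract, assuming a recurrent corrector (a GRU cell) that repeatedly updates a landmark estimate and whose hidden state can be carried from one video frame to the next; the backbone, layer sizes, landmark count, and fixed iteration count are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class IterativeLandmarkRefiner(nn.Module):
    def __init__(self, feat_dim=128, num_landmarks=68, num_iters=4):
        super().__init__()
        self.backbone = nn.Sequential(            # toy image encoder (assumption)
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.cell = nn.GRUCell(feat_dim + 2 * num_landmarks, feat_dim)
        self.head = nn.Linear(feat_dim, 2 * num_landmarks)   # per-iteration correction
        self.num_landmarks = num_landmarks
        self.num_iters = num_iters

    def forward(self, image, landmarks=None, hidden=None):
        feats = self.backbone(image)                          # (B, feat_dim)
        b = image.shape[0]
        if landmarks is None:                                 # initial guess: image center
            landmarks = torch.full((b, 2 * self.num_landmarks), 0.5, device=image.device)
        if hidden is None:                                    # reused across frames to reduce jitter
            hidden = torch.zeros(b, feats.shape[1], device=image.device)
        for _ in range(self.num_iters):                       # iterative correction loop
            hidden = self.cell(torch.cat([feats, landmarks], dim=1), hidden)
            landmarks = landmarks + self.head(hidden)
        return landmarks, hidden

# usage: carry the hidden state (and previous landmarks) from frame to frame of a video
model = IterativeLandmarkRefiner()
frame = torch.rand(1, 3, 64, 64)
lm, h = model(frame)               # first frame: fresh hidden state
lm2, h = model(frame, lm, h)       # next frame: reuse previous landmarks and hidden state
```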
-
Publication number: US20230078171A1
Publication date: 2023-03-16
Application number: US18051296
Filing date: 2022-10-31
Applicant: NVIDIA Corporation
Inventor: Nuri Murat Arar , Niranjan Avadhanam , Nishant Puri , Shagan Sah , Rajath Shetty , Sujay Yadawadkar , Pavlo Molchanov
Abstract: Systems and methods for more accurate and robust determination of subject characteristics from an image of the subject. One or more machine learning models receive as input an image of a subject, and output both facial landmarks and associated confidence values. Confidence values represent the degrees to which the portions of the subject's face corresponding to those landmarks are occluded, i.e., the amount of uncertainty in each landmark's position. These landmark points and their associated confidence values, and/or associated information, may then be input to another set of one or more machine learning models, which may output any facial analysis quantity or quantities, such as the subject's gaze direction, head pose, drowsiness state, cognitive load, or distraction state.
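A minimal sketch of the two-stage pipeline described above, assuming one network that predicts landmark coordinates plus per-landmark confidences and a second network that consumes both to produce a facial-analysis quantity (here, gaze direction); all layer sizes, the 68-landmark count, and the gaze head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LandmarkConfidenceNet(nn.Module):
    def __init__(self, num_landmarks=68):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.coords = nn.Linear(64, 2 * num_landmarks)        # (x, y) per landmark
        self.confidence = nn.Linear(64, num_landmarks)        # occlusion/uncertainty per landmark

    def forward(self, image):
        f = self.encoder(image)
        return self.coords(f), torch.sigmoid(self.confidence(f))

class GazeFromLandmarks(nn.Module):
    """Downstream model: landmarks + confidences -> gaze direction (pitch, yaw)."""
    def __init__(self, num_landmarks=68):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * num_landmarks, 128), nn.ReLU(),
            nn.Linear(128, 2))

    def forward(self, coords, conf):
        return self.mlp(torch.cat([coords, conf], dim=1))

image = torch.rand(1, 3, 64, 64)
stage1, stage2 = LandmarkConfidenceNet(), GazeFromLandmarks()
coords, conf = stage1(image)
gaze = stage2(coords, conf)   # confidences let the second stage discount occluded landmarks
```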
-
Publication number: US20220284283A1
Publication date: 2022-09-08
Application number: US17195451
Filing date: 2021-03-08
Applicant: NVIDIA Corporation
Inventor: Hongxu Yin , Pavlo Molchanov , Jose Manuel Alvarez Lopez , Xin Dong
Abstract: Apparatuses, systems, and techniques to invert a neural network. In at least one embodiment, one or more neural network layers are inverted and, in at least one embodiment, loaded in reverse order.
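The abstract gives little detail, so the following is only a generic sketch of inverting a network built from invertible blocks, where the inverse pass applies each block's inverse in reverse layer order; the additive-coupling block used here is an illustrative stand-in, not the specific construction in the application.

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """Split input in half; y1 = x1, y2 = x2 + f(x1). Exactly invertible."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim // 2, dim // 2), nn.Tanh(),
                               nn.Linear(dim // 2, dim // 2))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)

class InvertibleStack(nn.Module):
    def __init__(self, dim=8, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(AdditiveCoupling(dim) for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def inverse(self, y):
        for layer in reversed(self.layers):   # layers applied in reverse order
            y = layer.inverse(y)
        return y

net = InvertibleStack()
x = torch.rand(2, 8)
assert torch.allclose(net.inverse(net(x)), x, atol=1e-6)   # round trip recovers the input
```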
-
Publication number: US20220284232A1
Publication date: 2022-09-08
Application number: US17188397
Filing date: 2021-03-01
Applicant: NVIDIA Corporation
Inventor: Hongxu Yin , Arun Mallya , Arash Vahdat , Jose Manuel Alvarez Lopez , Jan Kautz , Pavlo Molchanov
Abstract: Apparatuses, systems, and techniques to identify one or more images used to train one or more neural networks. In at least one embodiment, one or more images used to train one or more neural networks are identified, based on, for example, one or more labels of one or more objects within the one or more images.
-
Publication number: US20220254029A1
Publication date: 2022-08-11
Application number: US17500338
Filing date: 2021-10-13
Applicant: NVIDIA Corporation
Inventor: Eugene Vorontsov , Wonmin Byeon , Shalini De Mello , Varun Jampani , Ming-Yu Liu , Pavlo Molchanov
Abstract: The neural network includes an encoder, a common decoder, and a residual decoder. The encoder encodes input images into a latent space. The latent space disentangles unique features from common features. The common decoder decodes the common features resident in the latent space to generate translated images that lack the unique features. The residual decoder decodes the unique features resident in the latent space to generate image deltas corresponding to those unique features. The neural network combines the translated images with the image deltas to generate combined images that may include both common and unique features. The combined images can be used to drive autoencoding. Once training is complete, the residual decoder can be modified to generate segmentation masks that indicate any regions of a given input image where a unique feature resides.
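A minimal sketch of the encoder / common-decoder / residual-decoder layout described above; the split of the latent into "common" and "unique" halves, the layer sizes, and the simple MSE autoencoding loss are simplifying assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class ResidualTranslationAE(nn.Module):
    def __init__(self, ch=3, latent=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent, 3, stride=2, padding=1))
        # two halves of the latent: common features vs. unique features (assumption)
        self.common_decoder = nn.Sequential(
            nn.ConvTranspose2d(latent // 2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, ch, 4, stride=2, padding=1))
        self.residual_decoder = nn.Sequential(
            nn.ConvTranspose2d(latent // 2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, ch, 4, stride=2, padding=1))

    def forward(self, x):
        z = self.encoder(x)
        z_common, z_unique = z.chunk(2, dim=1)        # disentangled latent halves
        translated = self.common_decoder(z_common)    # image without the unique features
        delta = self.residual_decoder(z_unique)       # image delta carrying the unique features
        combined = translated + delta                 # combined image drives autoencoding
        return translated, delta, combined

model = ResidualTranslationAE()
img = torch.rand(1, 3, 64, 64)
translated, delta, recon = model(img)
loss = nn.functional.mse_loss(recon, img)             # autoencoding objective
```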
-
Publication number: US11315018B2
Publication date: 2022-04-26
Application number: US15786406
Filing date: 2017-10-17
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Stephen Walter Tyree , Tero Tapani Karras , Timo Oskari Aila , Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for neural network pruning. The method includes the steps of receiving first-order gradients of a cost function relative to layer parameters for a trained neural network and computing a pruning criterion for each layer parameter based on the first-order gradient corresponding to the layer parameter, where the pruning criterion indicates an importance of each neuron that is included in the trained neural network and is associated with the layer parameter. The method includes the additional steps of identifying at least one neuron having a lowest importance and removing the at least one neuron from the trained neural network to produce a pruned neural network.
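A minimal sketch of a first-order pruning criterion in the spirit of the abstract: score each output neuron by the magnitude of activation times gradient, averaged over a batch, and remove the lowest-scoring neuron. The single-linear-layer setup and the mask-style removal are illustrative assumptions, not the claimed method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(16, 8)
x = torch.rand(32, 16)
target = torch.rand(32, 8)

out = layer(x)
out.retain_grad()                           # keep the gradient of the activations
loss = nn.functional.mse_loss(out, target)
loss.backward()

# first-order importance per output neuron: |activation * dL/d(activation)|, batch average
importance = (out * out.grad).abs().mean(dim=0)
least_important = torch.argmin(importance)

# "remove" the neuron by zeroing the weight row and bias that produce its output
with torch.no_grad():
    layer.weight[least_important].zero_()
    layer.bias[least_important].zero_()
print(f"pruned neuron {least_important.item()} "
      f"with importance {importance[least_important].item():.4f}")
```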
-
Publication number: US20200320401A1
Publication date: 2020-10-08
Application number: US16378464
Filing date: 2019-04-08
Applicant: NVIDIA Corporation
Inventor: Varun Jampani , Wei-Chih Hung , Sifei Liu , Pavlo Molchanov , Jan Kautz
Abstract: Systems and methods to detect one or more segments of one or more objects within one or more images based, at least in part, on a neural network trained in an unsupervised manner to infer the one or more segments. Systems and methods to help train one or more neural networks to detect one or more segments of one or more objects within one or more images in an unsupervised manner.
-
Publication number: US20190278983A1
Publication date: 2019-09-12
Application number: US16290643
Filing date: 2019-03-01
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal , Pavlo Molchanov , Thomas Michael Breuel , Jan Kautz
Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
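A minimal sketch of the keypoint idea described above: from a single monocular image, predict a 2D (x, y) displacement and a depth value for every keypoint, yielding an estimated 3D pose. The backbone, the 21-keypoint hand layout, and the direct regression heads are assumptions for illustration, not the architecture in the application.

```python
import torch
import torch.nn as nn

class KeypointDepthNet(nn.Module):
    def __init__(self, num_keypoints=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.xy_head = nn.Linear(64, 2 * num_keypoints)     # 2D spatial displacement
        self.depth_head = nn.Linear(64, num_keypoints)      # depth per keypoint w.r.t. the camera
        self.num_keypoints = num_keypoints

    def forward(self, image):
        f = self.backbone(image)
        xy = self.xy_head(f).view(-1, self.num_keypoints, 2)
        z = self.depth_head(f).view(-1, self.num_keypoints, 1)
        return torch.cat([xy, z], dim=2)                     # (B, K, 3) estimated 3D pose

model = KeypointDepthNet()
pose_3d = model(torch.rand(1, 3, 128, 128))
print(pose_3d.shape)   # torch.Size([1, 21, 3])
```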
-
Publication number: US10402697B2
Publication date: 2019-09-03
Application number: US15660719
Filing date: 2017-07-26
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang , Pavlo Molchanov , Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for classifying video image data. The method includes the steps of processing training video image data by at least a first layer of a convolutional neural network (CNN) to extract a first set of feature maps and generate classification output data for the training video image data. Spatial classification accuracy data is computed based on the classification output data and target classification output data, and spatial discrimination factors for the first layer are computed based on the spatial classification accuracies and the first set of feature maps.
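One plausible reading of the abstract, sketched below under stated assumptions: classify each spatial cell of the first-layer feature map, measure how often each cell is correct over a batch, and use those per-location accuracies as discrimination factors to re-weight the feature maps. The tiny CNN, the 1x1-convolution cell classifier, and the exact weighting scheme are illustrative guesses, not the claimed computation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, batch = 5, 8
conv = nn.Conv2d(3, 16, 3, stride=2, padding=1)           # "first layer" feature extractor
cell_classifier = nn.Conv2d(16, num_classes, 1)           # 1x1 conv: class scores per location

frames = torch.rand(batch, 3, 32, 32)                      # training video frames
targets = torch.randint(0, num_classes, (batch,))

feature_maps = torch.relu(conv(frames))                    # (B, 16, 16, 16)
per_cell_logits = cell_classifier(feature_maps)            # (B, C, 16, 16)
per_cell_pred = per_cell_logits.argmax(dim=1)              # predicted class per spatial location

# spatial classification accuracy: fraction of the batch each cell classifies correctly
correct = (per_cell_pred == targets.view(-1, 1, 1)).float()
spatial_accuracy = correct.mean(dim=0)                     # (16, 16)

# use the accuracies as discrimination factors to re-weight the feature maps
discrimination_factors = spatial_accuracy / (spatial_accuracy.sum() + 1e-8)
weighted_features = feature_maps * discrimination_factors  # broadcast over batch and channels
print(weighted_features.shape)
```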
-
Publication number: US20170206405A1
Publication date: 2017-07-20
Application number: US15402128
Filing date: 2017-01-09
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Xiaodong Yang , Shalini De Mello , Kihwan Kim , Stephen Walter Tyree , Jan Kautz
CPC classification number: G06K9/00355 , G06K9/00201 , G06K9/00765 , G06K9/4628 , G06K9/4652 , G06K9/6251 , G06K9/6256 , G06K9/627 , G06K9/6277 , G06N3/0445 , G06N3/0454 , G06N3/084 , Y04S10/54
Abstract: A method, computer readable medium, and system are disclosed for detecting and classifying hand gestures. The method includes the steps of receiving an unsegmented stream of data associated with a hand gesture, extracting spatio-temporal features from the unsegmented stream by a three-dimensional convolutional neural network (3DCNN), and producing a class label for the hand gesture based on the spatio-temporal features.
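A minimal sketch of classifying hand gestures from an unsegmented frame stream with a 3D CNN: spatio-temporal features come from Conv3d layers applied to a sliding window over the stream, and each window receives a class label. The window length, channel sizes, and gesture-class count are illustrative assumptions, not the claimed network.

```python
import torch
import torch.nn as nn

class Gesture3DCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):                   # clip: (B, C, T, H, W)
        return self.classifier(self.features(clip))

model = Gesture3DCNN()
stream = torch.rand(3, 64, 112, 112)           # unsegmented stream: 64 frames
window = 16
# slide a window over the stream and classify each window (no prior segmentation)
for start in range(0, stream.shape[1] - window + 1, window):
    clip = stream[:, start:start + window].unsqueeze(0)    # (1, 3, 16, 112, 112)
    label = model(clip).argmax(dim=1)
    print(f"frames {start}-{start + window - 1}: class {label.item()}")
```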
-