-
Publication number: US11049018B2
Publication date: 2021-06-29
Application number: US15880472
Filing date: 2018-01-25
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
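The weight transfer described in this abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the fully connected layer, the ReLU activation, the layer sizes, and the choice of zeros as the "initial values" for the hidden-to-hidden weights are all assumptions made for illustration. Under those assumptions, the recurrent layer starts out reproducing the pretrained feedforward layer on every frame.

```python
import numpy as np

def feedforward(x, W, b):
    """Original non-recurrent (fully connected) layer with ReLU (assumed)."""
    return np.maximum(0.0, x @ W.T + b)

def recurrent_step(x, h_prev, W_ih, W_hh, b):
    """Vanilla recurrent layer that takes the place of the feedforward layer."""
    return np.maximum(0.0, x @ W_ih.T + h_prev @ W_hh.T + b)

rng = np.random.default_rng(0)
in_dim, hid_dim, n_frames = 8, 4, 5  # hypothetical sizes

# Stand-ins for the pretrained weights of the layer being replaced.
W = rng.normal(size=(hid_dim, in_dim))
b = rng.normal(size=hid_dim)

# Transform: the feedforward weights become the input-to-hidden weights.
W_ih = W.copy()
# Hidden-to-hidden weights are set to initial values (zeros is an assumption).
W_hh = np.zeros((hid_dim, hid_dim))

frames = rng.normal(size=(n_frames, in_dim))  # one feature vector per frame
h = np.zeros(hid_dim)
outputs = []
for x in frames:
    h = recurrent_step(x, h, W_ih, W_hh, b)
    outputs.append(h)
outputs = np.stack(outputs)

# With zero recurrent weights, the new layer matches the old one frame by frame.
matches_pretrained = np.allclose(outputs, feedforward(frames, W, b))
```

Starting from this point, fine-tuning would only need to learn the temporal part of the layer, since the per-frame behavior is inherited from the trained model.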
-
Publication number: US11017556B2
Publication date: 2021-05-25
Application number: US16152303
Filing date: 2018-10-04
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Xitong Yang, Fanyi Xiao, Ming-Yu Liu, Jan Kautz
Abstract: Iterative prediction systems and methods for the task of action detection process an inputted sequence of video frames to generate an output of both action tubes and respective action labels, wherein the action tubes comprise a sequence of bounding boxes on each video frame. An iterative predictor processes large offsets between the bounding boxes and the ground-truth.
-
Publication number: US10595039B2
Publication date: 2020-03-17
Application number: US15939098
Filing date: 2018-03-28
Applicant: NVIDIA Corporation
Inventor: Ming-Yu Liu, Xiaodong Yang, Jan Kautz, Sergey Tulyakov
IPC: H04N19/513, G06K9/00, G06N3/08, G06T13/40, G06N3/04
Abstract: A method, computer readable medium, and system are disclosed for action video generation. The method includes the steps of generating, by a recurrent neural network, a sequence of motion vectors from a first set of random variables and receiving, by a generator neural network, the sequence of motion vectors and a content vector sample. The sequence of motion vectors and the content vector sample are sampled by the generator neural network to produce a video clip.
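The data flow in this abstract (a recurrent network turns random variables into motion vectors, and a generator combines them with one content vector sample) can be sketched as follows. This is only a toy illustration: the tanh recurrence, the linear "generator", all dimensions, and the randomly initialized weights are hypothetical stand-ins, and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, noise_dim, motion_dim, content_dim = 6, 3, 4, 5  # hypothetical sizes
height, width = 8, 8

# Hypothetical randomly initialized parameters; a real system would learn them.
W_in = rng.normal(scale=0.5, size=(motion_dim, noise_dim))
W_rec = rng.normal(scale=0.5, size=(motion_dim, motion_dim))
W_gen = rng.normal(scale=0.5, size=(height * width, content_dim + motion_dim))

def motion_rnn(noise_seq):
    """Map a sequence of random variables to a sequence of motion vectors."""
    h = np.zeros(motion_dim)
    motions = []
    for eps in noise_seq:
        h = np.tanh(W_in @ eps + W_rec @ h)
        motions.append(h)
    return np.stack(motions)

def generator(content, motions):
    """Produce one frame per motion vector, sharing a single content vector."""
    frames = []
    for m in motions:
        z = np.concatenate([content, m])
        frames.append(np.tanh(W_gen @ z).reshape(height, width))
    return np.stack(frames)

noise_seq = rng.normal(size=(n_frames, noise_dim))  # first set of random variables
content = rng.normal(size=content_dim)              # one content sample per clip
clip = generator(content, motion_rnn(noise_seq))    # shape: (frames, H, W)
```

Holding the content vector fixed across frames while only the motion vector changes is what keeps appearance constant within a clip in this sketch.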
-
Publication number: US20190163978A1
Publication date: 2019-05-30
Application number: US16202703
Filing date: 2018-11-28
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Pavlo Molchanov, Jan Kautz, Behrooz Mahasseni
Abstract: Detection of activity in video content, and more particularly detecting in video start and end frames inclusive of an activity and a classification for the activity, is fundamental for video analytics including categorizing, searching, indexing, segmentation, and retrieval of videos. Existing activity detection processes rely on a large set of features and classifiers that exhaustively run over every time step of a video at multiple temporal scales, or as a small improvement computationally propose segments of the video on which to perform classification. These existing activity detection processes, however, are computationally expensive, particularly when trying to achieve activity detection accuracy, and moreover are not configurable for any particular time or computation budget. The present disclosure provides a time and/or computation budget-aware method for detecting activity in video that relies on a recurrent neural network implementing a learned policy.
-
Publication number: US11948078B2
Publication date: 2024-04-02
Application number: US17000048
Filing date: 2020-08-21
Applicant: NVIDIA Corporation
Inventor: Arash Vahdat, Tanmay Gupta, Xiaodong Yang, Jan Kautz
IPC: G06N3/08, G06F18/214, G06F18/22, G06V10/74, G06V10/82, G06V30/19, G06V30/262
CPC classification number: G06N3/08, G06F18/2148, G06F18/22, G06V10/761, G06V10/82, G06V30/1916, G06V30/19173, G06V30/274
Abstract: The disclosure provides a framework or system for learning visual representation using a large set of image/text pairs. The disclosure provides, for example, a method of visual representation learning, a joint representation learning system, and an artificial intelligence (AI) system that employs one or more of the trained models from the method or system. The AI system can be used, for example, in autonomous or semi-autonomous vehicles. In one example, the method of visual representation learning includes: (1) receiving a set of image embeddings from an image representation model and a set of text embeddings from a text representation model, and (2) training, employing mutual information, a critic function by learning relationships between the set of image embeddings and the set of text embeddings.
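The abstract does not say which mutual-information estimator is used. As one possible illustration, the sketch below scores image/text embedding pairs with a bilinear critic and evaluates the InfoNCE lower bound on mutual information. Both the bilinear form and InfoNCE are assumptions chosen for illustration, and the embeddings are synthetic; training the critic would mean maximizing this bound with respect to `W`.

```python
import numpy as np

def critic_scores(img_emb, txt_emb, W):
    """Bilinear critic f(i, t) = i^T W t for every image/text combination."""
    return img_emb @ W @ txt_emb.T

def infonce_bound(scores):
    """InfoNCE lower bound on mutual information.

    Row i holds the scores of image i against every text;
    matched pairs are assumed to lie on the diagonal.
    """
    n = scores.shape[0]
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return np.log(n) + np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
n, d = 16, 8  # hypothetical batch size and embedding width
img = rng.normal(size=(n, d))
txt = img + 0.1 * rng.normal(size=(n, d))  # synthetic "matching" text embeddings
W = np.eye(d)                              # untrained critic, identity for the demo

aligned = infonce_bound(critic_scores(img, txt, W))
# Rolling the texts by one position mismatches every pair.
shuffled = infonce_bound(critic_scores(img, np.roll(txt, 1, axis=0), W))
```

In this toy setup the bound is higher for correctly matched pairs than for mismatched ones, and it can never exceed `log(n)`, which is a known property of InfoNCE.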
-
Publication number: US11367268B2
Publication date: 2022-06-21
Application number: US16998890
Filing date: 2020-08-20
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Yang Zou, Zhiding Yu, Jan Kautz
Abstract: Object re-identification refers to a process by which images that contain an object of interest are retrieved from a set of images captured using disparate cameras or in disparate environments. Object re-identification has many useful applications, particularly as it is applied to people (e.g. person tracking). Current re-identification processes rely on convolutional neural networks (CNNs) that learn re-identification for a particular object class from labeled training data specific to a certain domain (e.g. environment), but that do not apply well in other domains. The present disclosure provides cross-domain disentanglement of id-related and id-unrelated factors. In particular, the disentanglement is performed using a labeled image set and an unlabeled image set, respectively captured from different domains but for a same object class. The identification-related features may then be used to train a neural network to perform re-identification of objects in that object class from images captured from the second domain.
-
Publication number: US10373332B2
Publication date: 2019-08-06
Application number: US15836549
Filing date: 2017-12-08
Applicant: NVIDIA Corporation
Inventor: Jinwei Gu, Xiaodong Yang, Shalini De Mello, Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for dynamic facial analysis. The method includes the steps of receiving video data representing a sequence of image frames including at least one head and extracting, by a neural network, spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data. The method also includes the step of processing, by a recurrent neural network, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.
-
Publication number: US20180293737A1
Publication date: 2018-10-11
Application number: US15942213
Filing date: 2018-03-30
Applicant: NVIDIA Corporation
Inventor: Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz
CPC classification number: G06T7/207, G06N3/0454, G06N3/08, G06N5/046, G06T3/0093, G06T7/246, G06T7/251, G06T7/97, G06T2200/28, G06T2207/10016, G06T2207/20016, G06T2207/20032, G06T2207/20084
Abstract: A method, computer readable medium, and system are disclosed for estimating optical flow between two images. A first pyramidal set of features is generated for a first image and a partial cost volume for a level of the first pyramidal set of features is computed, by a neural network, using features at the level of the first pyramidal set of features and warped features extracted from a second image, where the partial cost volume is computed across a limited range of pixels that is less than a full resolution of the first image, in pixels, at the level. The neural network processes the features and the partial cost volume to produce a refined optical flow estimate for the first image and the second image.
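A partial cost volume of the kind described here can be sketched as a correlation between the first image's features and the warped second-image features, restricted to a small window of displacements. The sketch below is an illustration under assumptions: the function name, the mean-over-channels correlation, zero padding at the borders, and the search radius `max_disp` are all hypothetical choices, not details taken from the patent.

```python
import numpy as np

def partial_cost_volume(feat1, feat2_warped, max_disp):
    """Correlate feat1 with feat2_warped over displacements in [-max_disp, max_disp].

    feat1, feat2_warped: arrays of shape (C, H, W) at one pyramid level.
    Returns an array of shape ((2 * max_disp + 1) ** 2, H, W).
    """
    _, h, w = feat1.shape
    d = max_disp
    # Zero-pad spatially so every shifted window stays in bounds.
    padded = np.pad(feat2_warped, ((0, 0), (d, d), (d, d)))
    costs = []
    for dy in range(2 * d + 1):
        for dx in range(2 * d + 1):
            # Window corresponding to displacement (dy - d, dx - d).
            shifted = padded[:, dy:dy + h, dx:dx + w]
            costs.append((feat1 * shifted).mean(axis=0))
    return np.stack(costs)

rng = np.random.default_rng(0)
feat1 = rng.normal(size=(3, 6, 7))  # hypothetical feature map
cv = partial_cost_volume(feat1, feat1, max_disp=2)
center = 2 * (2 * 2 + 1) + 2        # channel index of zero displacement
```

The cost of this volume grows with the square of the search radius rather than with the image resolution, which is the point of limiting the range: here a radius of 2 gives 25 channels regardless of the feature-map size.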
-
Publication number: US11594006B2
Publication date: 2023-02-28
Application number: US16998914
Filing date: 2020-08-20
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Xitong Yang, Sifei Liu, Jan Kautz
Abstract: There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
-
Publication number: US20210271977A1
Publication date: 2021-09-02
Application number: US17325024
Filing date: 2021-05-19
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.