-
Publication No.: US10860859B2
Publication Date: 2020-12-08
Application No.: US16202703
Application Date: 2018-11-28
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Pavlo Molchanov, Jan Kautz, Behrooz Mahasseni
Abstract: Detection of activity in video content, and more particularly detecting the start and end frames of an activity in a video along with a classification for the activity, is fundamental for video analytics including categorizing, searching, indexing, segmentation, and retrieval of videos. Existing activity detection processes rely on a large set of features and classifiers that exhaustively run over every time step of a video at multiple temporal scales or, as a small computational improvement, propose segments of the video on which to perform classification. These existing activity detection processes, however, are computationally expensive, particularly when trying to achieve high activity detection accuracy, and moreover are not configurable for any particular time or computation budget. The present disclosure provides a time and/or computation budget-aware method for detecting activity in video that relies on a recurrent neural network implementing a learned policy.
-
Publication No.: US10783393B2
Publication Date: 2020-09-22
Application No.: US16006709
Application Date: 2018-06-12
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov, Stephen Walter Tyree, Jan Kautz, Sina Honari
Abstract: A method, computer readable medium, and system are disclosed for sequential multi-tasking to generate coordinates of landmarks within images. The landmark locations may be identified on an image of a human face and used for emotion recognition, face identity verification, eye gaze tracking, pose estimation, etc. A neural network model processes input image data to generate pixel-level likelihood estimates for landmarks in the input image data and a soft-argmax function computes predicted coordinates of each landmark based on the pixel-level likelihood estimates.
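Note: the soft-argmax step named in this abstract can be illustrated with a minimal sketch. It assumes the pixel-level likelihood estimates arrive as a 2D score map and that coordinates are the softmax-weighted average of pixel positions; the function name `soft_argmax_2d` and the sharpness parameter `beta` are hypothetical and do not come from the patent.

```python
import math


def soft_argmax_2d(scores, beta=1.0):
    """Return the expected (x, y) position under a softmax over a 2D score map.

    scores: list of rows, each a list of floats (one score per pixel).
    beta:   assumed sharpness factor; larger values approach a hard argmax.
    """
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(max(row) for row in scores)
    weights = [[math.exp(beta * (s - peak)) for s in row] for row in scores]
    total = sum(sum(row) for row in weights)

    # Expected column index (x) and row index (y) under the normalized weights.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# Toy 3x3 score map with a single strong response at column 2, row 1.
heatmap = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0],
]
x, y = soft_argmax_2d(heatmap, beta=10.0)
```

Unlike a hard argmax, this weighted average is differentiable with respect to the scores, which is the usual reason to prefer it when coordinates are trained end to end.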
-
Publication No.: US20200160593A1
Publication Date: 2020-05-21
Application No.: US16685538
Application Date: 2019-11-15
Applicant: NVIDIA Corporation
Inventor: Jinwei Gu, Kihwan Kim, Jan Kautz, Guilin Liu, Soumyadip Sengupta
Abstract: Inverse rendering estimates physical scene attributes (e.g., reflectance, geometry, and lighting) from image(s) and is used for gaming, virtual reality, augmented reality, and robotics. An inverse rendering network (IRN) receives a single input image of a 3D scene and generates the physical scene attributes for the image. The IRN is trained by using the estimated physical scene attributes generated by the IRN to reproduce the input image and updating parameters of the IRN to reduce differences between the reproduced input image and the input image. A direct renderer and a residual appearance renderer (RAR) reproduce the input image. The RAR predicts a residual image representing complex appearance effects of the real (not synthetic) image based on features extracted from the image and the reflectance and geometry properties. The residual image represents near-field illumination, cast shadows, inter-reflections, and realistic shading that are not provided by the direct renderer.
-
Publication No.: US10482196B2
Publication Date: 2019-11-19
Application No.: US15055440
Application Date: 2016-02-26
Applicant: NVIDIA Corporation
Inventor: Benjamin David Eckart, Kihwan Kim, Alejandro Jose Troccoli, Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for generating a Gaussian mixture model hierarchy. The method includes the steps of receiving point cloud data defining a plurality of points; defining a Gaussian Mixture Model (GMM) hierarchy that includes a number of mixels, each mixel encoding parameters for a probabilistic occupancy map; and adjusting the parameters for one or more probabilistic occupancy maps based on the point cloud data utilizing a number of iterations of an Expectation-Maximization (EM) algorithm.
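Note: the EM parameter adjustment mentioned in this abstract can be illustrated on a much simpler case. The sketch below fits a flat, two-component, one-dimensional Gaussian mixture rather than the hierarchical 3D model the patent describes; the function names, the toy data, and the variance floor are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(points, weights, means, variances):
    """Run one EM iteration and return updated (weights, means, variances)."""
    k = len(weights)

    # E-step: responsibility of each component for each point.
    resp = []
    for x in points:
        likes = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(likes)
        resp.append([like / total for like in likes])

    # M-step: re-estimate each component from its responsibility-weighted points.
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, points)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, points)) / nj
        new_w.append(nj / len(points))
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # assumed floor to keep variances positive
    return new_w, new_m, new_v


# Toy "point cloud": two clusters on a line, near -2 and +2.
points = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
weights, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    weights, means, variances = em_step(points, weights, means, variances)
```

After the iterations, the two component means settle near the two cluster centers and the mixing weights near one half each.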
-
Publication No.: US20190095791A1
Publication Date: 2019-03-28
Application No.: US16134716
Application Date: 2018-09-18
Applicant: NVIDIA Corporation
Inventor: Sifei Liu, Shalini De Mello, Jinwei Gu, Ming-Hsuan Yang, Jan Kautz
Abstract: A spatial linear propagation network (SLPN) system learns the affinity matrix for vision tasks. An affinity matrix is a generic matrix that defines the similarity of two points in space. The SLPN system is trained for a particular computer vision task and refines an input map (i.e., affinity matrix) that indicates pixels that share a particular property (e.g., color, object, texture, shape, etc.). Inputs to the SLPN system are input data (e.g., pixel values for an image) and the input map corresponding to the input data to be propagated. The input data is processed to produce task-specific affinity values (guidance data). The task-specific affinity values are applied to values in the input map, with at least two weighted values from each column contributing to a value in the refined map data for the adjacent column.
-
Publication No.: US10212406B2
Publication Date: 2019-02-19
Application No.: US15381010
Application Date: 2016-12-15
Applicant: NVIDIA Corporation
Inventor: Orazio Gallo, Jan Kautz, Abhishek Haridas Badki
IPC: H04N5/232, H04N13/111, H04N13/128, H04N13/271
Abstract: A system and method for computational zoom generates a resulting image having two or more effective focal lengths. A first surface within a three-dimensional (3D) scene including a first and second set of 3D objects defined by 3D information is identified. The first and second sets of 3D objects are located within first and second depth ranges of the 3D scene, respectively. The first set of 3D objects is projected onto the first surface according to a first projection mapping to produce a first portion of image components. The second set of 3D objects is projected onto the first surface according to a second projection mapping to produce a second portion of image components. The resulting image comprising the first portion of image components and the second portion of image components is generated based on a camera projection from the first surface to a camera view plane.
-
Publication No.: US20180373985A1
Publication Date: 2018-12-27
Application No.: US15880472
Application Date: 2018-01-25
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang, Pavlo Molchanov, Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
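Note: the layer replacement described in this abstract can be sketched on a toy fully connected layer. The sketch assumes a ReLU activation and assumes the hidden-to-hidden weights start at zero, which is only one possible choice of "initial values"; all function and variable names are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def feedforward_layer(w, b, x):
    """Original non-recurrent layer: relu(W x + b) for a single frame."""
    return relu([s + bi for s, bi in zip(matvec(w, x), b)])


def recurrent_layer(w_ih, w_hh, b, frames):
    """Replacement layer: h_t = relu(W_ih x_t + W_hh h_{t-1} + b)."""
    h = [0.0] * len(b)
    outputs = []
    for x in frames:
        pre = [i + r + bi for i, r, bi in zip(matvec(w_ih, x), matvec(w_hh, h), b)]
        h = relu(pre)
        outputs.append(h)
    return outputs


# Pretrained feedforward weights (toy values) reused as input-to-hidden weights.
w_ff = [[0.5, -1.0], [0.25, 0.75]]
bias = [0.1, -0.2]
w_ih = [row[:] for row in w_ff]
# Assumed initial values: zero hidden-to-hidden weights.
w_hh = [[0.0, 0.0], [0.0, 0.0]]

frames = [[1.0, 2.0], [0.5, -0.5], [2.0, 1.0]]
recurrent_out = recurrent_layer(w_ih, w_hh, bias, frames)
feedforward_out = [feedforward_layer(w_ff, bias, x) for x in frames]
```

With zero hidden-to-hidden weights, the recurrent layer initially reproduces the feedforward layer's per-frame output, so training can start from the pretrained behavior and learn temporal dependencies from there.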
-
Publication No.: US10157309B2
Publication Date: 2018-12-18
Application No.: US15402128
Application Date: 2017-01-09
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov, Xiaodong Yang, Shalini De Mello, Kihwan Kim, Stephen Walter Tyree, Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for detecting and classifying hand gestures. The method includes the steps of receiving an unsegmented stream of data associated with a hand gesture, extracting spatio-temporal features from the unsegmented stream by a three-dimensional convolutional neural network (3DCNN), and producing a class label for the hand gesture based on the spatio-temporal features.
-
Publication No.: US20180176532A1
Publication Date: 2018-06-21
Application No.: US15381010
Application Date: 2016-12-15
Applicant: NVIDIA Corporation
Inventor: Orazio Gallo, Jan Kautz, Abhishek Haridas Badki
CPC classification number: H04N13/111, H04N5/23216, H04N5/23296, H04N13/128, H04N13/271
Abstract: A system and method for computational zoom generates a resulting image having two or more effective focal lengths. A first surface within a three-dimensional (3D) scene including a first and second set of 3D objects defined by 3D information is identified. The first and second sets of 3D objects are located within first and second depth ranges of the 3D scene, respectively. The first set of 3D objects is projected onto the first surface according to a first projection mapping to produce a first portion of image components. The second set of 3D objects is projected onto the first surface according to a second projection mapping to produce a second portion of image components. The resulting image comprising the first portion of image components and the second portion of image components is generated based on a camera projection from the first surface to a camera view plane.
-
Publication No.: US20180114114A1
Publication Date: 2018-04-26
Application No.: US15786406
Application Date: 2017-10-17
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov, Stephen Walter Tyree, Tero Tapani Karras, Timo Oskari Aila, Jan Kautz
IPC: G06N3/08
CPC classification number: G06N3/082, G06N3/0454, G06N3/084
Abstract: A method, computer readable medium, and system are disclosed for neural network pruning. The method includes the steps of receiving first-order gradients of a cost function relative to layer parameters for a trained neural network and computing a pruning criterion for each layer parameter based on the first-order gradient corresponding to the layer parameter, where the pruning criterion indicates an importance of each neuron that is included in the trained neural network and is associated with the layer parameter. The method includes the additional steps of identifying at least one neuron having a lowest importance and removing the at least one neuron from the trained neural network to produce a pruned neural network.
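Note: the abstract does not give the pruning criterion's formula. The sketch below assumes a first-order Taylor-style score, the absolute value of the sum of gradient times parameter over each neuron's parameters, and removes the lowest-scoring neuron. The function names and toy numbers are hypothetical.

```python
def neuron_importance(weights, grads):
    """Score each neuron as |sum_k grad_k * weight_k| over its parameters.

    weights[i] and grads[i] hold the parameters of neuron i and the
    first-order gradients of the cost function with respect to them.
    """
    return [
        abs(sum(w * g for w, g in zip(ws, gs)))
        for ws, gs in zip(weights, grads)
    ]


def prune_lowest(weights, grads, num_to_remove=1):
    """Remove the neurons with the lowest importance scores."""
    scores = neuron_importance(weights, grads)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    removed = set(order[:num_to_remove])
    kept = [ws for i, ws in enumerate(weights) if i not in removed]
    return kept, sorted(removed), scores


# Toy layer with three neurons, two parameters each.
layer_weights = [[0.5, -0.2], [0.01, 0.02], [1.0, 0.3]]
layer_grads = [[0.1, 0.4], [0.3, -0.1], [-0.2, 0.5]]
kept, removed, scores = prune_lowest(layer_weights, layer_grads)
```

In this toy layer the second neuron has the smallest score, so it is the one removed. In practice, pruning is typically alternated with fine-tuning so the remaining neurons can compensate.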