-
Publication Number: US11295514B2
Publication Date: 2022-04-05
Application Number: US16685538
Filing Date: 2019-11-15
Applicant: NVIDIA Corporation
Inventor: Jinwei Gu , Kihwan Kim , Jan Kautz , Guilin Liu , Soumyadip Sengupta
Abstract: Inverse rendering estimates physical scene attributes (e.g., reflectance, geometry, and lighting) from image(s) and is used for gaming, virtual reality, augmented reality, and robotics. An inverse rendering network (IRN) receives a single input image of a 3D scene and generates the physical scene attributes for the image. The IRN is trained by using the estimated physical scene attributes generated by the IRN to reproduce the input image and updating parameters of the IRN to reduce differences between the reproduced input image and the input image. A direct renderer and a residual appearance renderer (RAR) reproduce the input image. The RAR predicts a residual image representing complex appearance effects of the real (not synthetic) image based on features extracted from the image and the reflectance and geometry properties. The residual image represents near-field illumination, cast shadows, inter-reflections, and realistic shading that are not provided by the direct renderer.
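For illustration, a minimal PyTorch sketch of the self-supervised objective described above: an inverse rendering network predicts reflectance, geometry, and lighting; a direct renderer plus a residual appearance renderer reproduce the input image; and the reconstruction error updates both networks. All module internals, tensor sizes, and the Lambertian shading are assumptions for the sketch, not the patented implementation.

```python
# Hypothetical sketch of the self-supervised reconstruction objective; the
# module internals are placeholders, not the networks claimed in the patent.
import torch
import torch.nn as nn

class InverseRenderingNet(nn.Module):
    """Predicts per-pixel albedo, normals, and lighting from a single image."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 64, 3, padding=1)        # stand-in encoder
        self.albedo_head = nn.Conv2d(64, 3, 3, padding=1)
        self.normal_head = nn.Conv2d(64, 3, 3, padding=1)
        self.light_head = nn.Linear(64, 27)                   # e.g. 9 SH coeffs x RGB

    def forward(self, img):
        feat = torch.relu(self.backbone(img))
        albedo = torch.sigmoid(self.albedo_head(feat))
        normal = nn.functional.normalize(self.normal_head(feat), dim=1)
        light = self.light_head(feat.mean(dim=(2, 3)))
        return albedo, normal, light, feat

def direct_render(albedo, normal, light):
    """Very rough Lambertian shading stand-in for the direct renderer."""
    direction = nn.functional.normalize(light[:, :3], dim=1)[:, :, None, None]
    shading = (normal * direction).sum(dim=1, keepdim=True).clamp(min=0)
    return albedo * shading

class ResidualAppearanceRenderer(nn.Module):
    """Predicts a residual image (shadows, inter-reflections, near-field light)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(64 + 6, 3, 3, padding=1)         # features + albedo + normals

    def forward(self, feat, albedo, normal):
        return self.net(torch.cat([feat, albedo, normal], dim=1))

irn, rar = InverseRenderingNet(), ResidualAppearanceRenderer()
opt = torch.optim.Adam(list(irn.parameters()) + list(rar.parameters()), lr=1e-4)

img = torch.rand(2, 3, 64, 64)                                # a batch of real photos
albedo, normal, light, feat = irn(img)
recon = direct_render(albedo, normal, light) + rar(feat, albedo, normal)
loss = nn.functional.l1_loss(recon, img)                      # reproduce the input image
opt.zero_grad(); loss.backward(); opt.step()
```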
-
Publication Number: US20210326694A1
Publication Date: 2021-10-21
Application Number: US16852944
Filing Date: 2020-04-20
Applicant: Nvidia Corporation
Inventor: Jialiang Wang , Varun Jampani , Stan Birchfield , Charles Loop , Jan Kautz
Abstract: Apparatuses, systems, and techniques are presented to determine distance for one or more objects. In at least one embodiment, a disparity network is trained to determine distance data from input stereoscopic images using a loss function that includes at least one of a gradient loss term and an occlusion loss term.
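The abstract names a gradient loss term and an occlusion loss term. Below is one plausible, hedged formulation of such terms; the exact losses used in the application are not reproduced. Disparity maps and the occlusion mask are assumed to be (N, 1, H, W) tensors, with the mask equal to 1 where a pixel is occluded in the other view.

```python
# Illustrative loss terms only, not the patent's formulation.
import torch

def gradient_loss(pred_disp, gt_disp):
    """Match horizontal and vertical disparity gradients."""
    dx = (pred_disp[..., :, 1:] - pred_disp[..., :, :-1]) - (gt_disp[..., :, 1:] - gt_disp[..., :, :-1])
    dy = (pred_disp[..., 1:, :] - pred_disp[..., :-1, :]) - (gt_disp[..., 1:, :] - gt_disp[..., :-1, :])
    return dx.abs().mean() + dy.abs().mean()

def occlusion_weighted_loss(pred_disp, gt_disp, occ_mask):
    """Data term that discounts pixels marked as occluded in the other view."""
    visible = 1.0 - occ_mask
    return (visible * (pred_disp - gt_disp).abs()).sum() / visible.sum().clamp(min=1.0)

def total_loss(pred_disp, gt_disp, occ_mask, w_grad=0.5):
    return occlusion_weighted_loss(pred_disp, gt_disp, occ_mask) + w_grad * gradient_loss(pred_disp, gt_disp)

# Toy usage with random tensors standing in for a disparity network's output.
pred = torch.rand(2, 1, 32, 64, requires_grad=True)
gt = torch.rand(2, 1, 32, 64)
occ = (torch.rand(2, 1, 32, 64) > 0.9).float()
total_loss(pred, gt, occ).backward()
```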
-
Publication Number: US11049018B2
Publication Date: 2021-06-29
Application Number: US15880472
Filing Date: 2018-01-25
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang , Pavlo Molchanov , Jan Kautz
Abstract: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
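A short sketch of the described weight transformation, using a fully-connected layer and a vanilla PyTorch RNN as stand-ins: the trained feedforward weights are copied into the recurrent layer's input-to-hidden path and the hidden-to-hidden weights are set to an initial value. Layer sizes are illustrative.

```python
# Hypothetical sketch: convert a trained non-recurrent layer into a recurrent one.
import torch
import torch.nn as nn

fc = nn.Linear(512, 256)                      # trained non-recurrent layer (pretend)

rnn = nn.RNN(input_size=512, hidden_size=256, batch_first=True)
with torch.no_grad():
    rnn.weight_ih_l0.copy_(fc.weight)         # feedforward weights -> input-to-hidden
    rnn.bias_ih_l0.copy_(fc.bias)
    rnn.weight_hh_l0.zero_()                  # hidden-to-hidden starts from initial values
    rnn.bias_hh_l0.zero_()

# At the first time step the recurrent layer reproduces the original layer's
# pre-activation (up to the RNN's tanh nonlinearity); the hidden-to-hidden
# weights are then learned on video sequences.
frames = torch.rand(4, 16, 512)               # (batch, time, features) from a CNN backbone
out, h_n = rnn(frames)
print(out.shape)                              # torch.Size([4, 16, 256])
```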
-
Publication Number: US11017556B2
Publication Date: 2021-05-25
Application Number: US16152303
Filing Date: 2018-10-04
Applicant: NVIDIA Corporation
Inventor: Xiaodong Yang , Xitong Yang , Fanyi Xiao , Ming-Yu Liu , Jan Kautz
Abstract: Iterative prediction systems and methods for action detection process an input sequence of video frames to generate both action tubes and their respective action labels, wherein each action tube comprises a sequence of bounding boxes, one per video frame. An iterative predictor handles large offsets between the predicted bounding boxes and the ground truth.
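As a rough illustration of iterative refinement in the spirit of the abstract, the sketch below repeatedly predicts residual box offsets per frame so that large displacements are absorbed over several steps. The refinement network and feature sizes are placeholders, not the patented predictor.

```python
# Hedged sketch of iterative bounding-box refinement; not the patented method.
import torch
import torch.nn as nn

class BoxRefiner(nn.Module):
    """Predicts an offset (dx, dy, dw, dh) for the current box estimate."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.head = nn.Linear(feat_dim + 4, 4)

    def forward(self, frame_feat, box):
        return self.head(torch.cat([frame_feat, box], dim=-1))

def iterative_predict(refiner, frame_feats, init_box, steps=3):
    """Refine one box per frame over several iterations to absorb large offsets."""
    boxes = init_box.expand(frame_feats.shape[0], 4).clone()   # (T, 4) per-frame boxes
    for _ in range(steps):
        boxes = boxes + refiner(frame_feats, boxes)            # accumulate residual offsets
    return boxes                                               # one action tube

refiner = BoxRefiner()
tube = iterative_predict(refiner, torch.rand(8, 128), torch.tensor([0.2, 0.2, 0.5, 0.5]))
print(tube.shape)                                              # torch.Size([8, 4])
```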
-
Publication Number: US20210150736A1
Publication Date: 2021-05-20
Application Number: US17156406
Filing Date: 2021-01-22
Applicant: NVIDIA Corporation
Inventor: Zhaoyang Lv , Kihwan Kim , Deqing Sun , Alejandro Jose Troccoli , Jan Kautz
IPC: G06T7/254 , G06T7/90 , G06T7/50 , G06N3/08 , G06T7/194 , G06T3/00 , G06T7/70 , G06T7/60 , G06T7/11 , G06N5/04 , G06T7/285 , G06T7/215
Abstract: A neural network model receives color data for a sequence of images corresponding to a dynamic scene in three-dimensional (3D) space. Motion in the image sequence results from a combination of the camera's changing orientation and the motion or shape changes of objects in the 3D space. The neural network model generates two components that are used to produce a 3D motion field representing the dynamic (non-rigid) part of the scene: information identifying the dynamic and static portions of each image, and the camera orientation. The dynamic portions of each image contain motion in the 3D space that is independent of the camera orientation. In other words, the motion in the 3D space (estimated 3D scene flow data) is separated from the motion of the camera.
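The decomposition can be illustrated with a toy computation: given per-pixel 3D points in two frames, an estimated camera motion (R, t), and a dynamic/static mask, the camera-induced motion is subtracted from the total motion and masked to keep only the non-rigid part. The function names, shapes, and inputs below are assumptions, not the application's interfaces.

```python
# Illustrative scene-flow decomposition only; not the claimed network outputs.
import torch

def scene_flow(points_t0, points_t1, R, t, dynamic_mask):
    """points_*: (N, 3) 3D points, R: (3, 3), t: (3,), dynamic_mask: (N, 1) in {0, 1}."""
    # Where the static scene would move purely due to camera motion.
    camera_induced = points_t0 @ R.T + t - points_t0
    total = points_t1 - points_t0
    # Keep only the residual motion in regions flagged as dynamic (non-rigid).
    return dynamic_mask * (total - camera_induced)

pts0 = torch.rand(1000, 3)
pts1 = pts0 + 0.01 * torch.randn(1000, 3)
R, t = torch.eye(3), torch.zeros(3)
mask = (torch.rand(1000, 1) > 0.7).float()
flow = scene_flow(pts0, pts1, R, t, mask)
print(flow.shape)          # torch.Size([1000, 3])
```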
-
Publication Number: US20210142177A1
Publication Date: 2021-05-13
Application Number: US16682967
Filing Date: 2019-11-13
Applicant: Nvidia Corporation
Inventor: Arun Mallya , Jan Kautz , Zhizhong Li , Pavlo Molchanov , Hongxu Danny Yin
Abstract: Apparatuses, systems, and techniques are presented to generate data useful for further training of a neural network. In at least one embodiment, one or more neural networks can be re-trained based, at least in part, on data generated by the one or more neural networks including data used to previously train the one or more neural networks.
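One way to picture "re-training on data generated by the network itself" is the toy sketch below: synthetic inputs are optimized so the trained network labels them confidently, then mixed with new data for further training. The models, losses, and scales are placeholders and do not reproduce the procedure claimed in the application.

```python
# Hedged toy sketch of generating replay data from a trained network.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # previously trained model (pretend)
for p in teacher.parameters():
    p.requires_grad_(False)

# 1) Synthesize inputs the trained network confidently assigns to target classes.
targets = torch.arange(10)
synth = torch.randn(10, 1, 28, 28, requires_grad=True)
opt = torch.optim.Adam([synth], lr=0.1)
for _ in range(100):
    loss = F.cross_entropy(teacher(synth), targets)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Re-train (or train a new model) on the generated data alongside new data.
student = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt_s = torch.optim.SGD(student.parameters(), lr=0.01)
new_x, new_y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
x = torch.cat([synth.detach(), new_x]); y = torch.cat([targets, new_y])
loss = F.cross_entropy(student(x), y)
opt_s.zero_grad(); loss.backward(); opt_s.step()
```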
-
Publication Number: US20210117661A1
Publication Date: 2021-04-22
Application Number: US17135697
Filing Date: 2020-12-28
Applicant: NVIDIA Corporation
Inventor: Umar Iqbal , Pavlo Molchanov , Thomas Michael Breuel , Jan Kautz
Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
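A minimal sketch of the general idea, predicting a depth value per 2D keypoint and back-projecting to 3D with pinhole intrinsics. The regression head, keypoint count, and intrinsics are assumptions for illustration, not the architecture described in the application.

```python
# Hypothetical keypoint + per-keypoint depth head, then 2D -> 3D lifting.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 21                               # e.g., a hand skeleton

class KeypointDepthHead(nn.Module):
    """From image features, regress (x, y) pixel coordinates and a depth per keypoint."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.xy = nn.Linear(feat_dim, NUM_KEYPOINTS * 2)
        self.z = nn.Linear(feat_dim, NUM_KEYPOINTS)

    def forward(self, feat):
        xy = self.xy(feat).view(-1, NUM_KEYPOINTS, 2)
        z = self.z(feat).view(-1, NUM_KEYPOINTS, 1)
        return xy, z

def backproject(xy, z, fx=500.0, fy=500.0, cx=128.0, cy=128.0):
    """Lift pixel coordinates plus depth to 3D camera coordinates."""
    X = (xy[..., 0:1] - cx) * z / fx
    Y = (xy[..., 1:2] - cy) * z / fy
    return torch.cat([X, Y, z], dim=-1)           # (N, K, 3)

head = KeypointDepthHead()
xy, z = head(torch.rand(4, 256))
pose3d = backproject(xy, z.abs() + 0.1)           # keep depths positive for the toy example
print(pose3d.shape)                               # torch.Size([4, 21, 3])
```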
-
Publication Number: US10964061B2
Publication Date: 2021-03-30
Application Number: US16872752
Filing Date: 2020-05-12
Applicant: NVIDIA Corporation
Inventor: Jinwei Gu , Samarth Manoj Brahmbhatt , Kihwan Kim , Jan Kautz
Abstract: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.
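The sketch below shows plain image-to-pose regression (a translation vector plus a unit-quaternion rotation) with a simple absolute pose loss; it does not reproduce the learned map representation or the losses described in the abstract, and all layer sizes are assumptions.

```python
# Assumed, minimal camera-pose regression sketch; not the patented DNN system.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseRegressor(nn.Module):
    """Predicts a 3-vector translation and a unit quaternion from one image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.trans = nn.Linear(16, 3)
        self.rot = nn.Linear(16, 4)

    def forward(self, img):
        feat = self.encoder(img)
        return self.trans(feat), F.normalize(self.rot(feat), dim=-1)

model = PoseRegressor()
img = torch.rand(2, 3, 128, 128)
t_pred, q_pred = model(img)
t_gt, q_gt = torch.rand(2, 3), F.normalize(torch.rand(2, 4), dim=-1)
loss = F.l1_loss(t_pred, t_gt) + F.l1_loss(q_pred, q_gt)   # simple absolute pose loss
loss.backward()
print(t_pred.shape, q_pred.shape)                          # torch.Size([2, 3]) torch.Size([2, 4])
```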
-
Publication Number: US20210089867A1
Publication Date: 2021-03-25
Application Number: US16581099
Filing Date: 2019-09-24
Applicant: NVIDIA Corporation
Inventor: Wonmin Byeon , Jan Kautz
Abstract: Learning the dynamics of an environment and predicting consequences in the future is a recent technical advancement that can be applied to video prediction and speech recognition, among other applications. Generally, machine learning, such as deep learning models, neural networks, or other artificial intelligence algorithms, is used to make the predictions. However, current artificial intelligence algorithms used for making predictions are typically limited to short-term future predictions, mainly as a result of 1) the presence of complex dynamics in high-dimensional video data, 2) prediction error propagation over time, and 3) inherent uncertainty of the future. The present disclosure enables the modeling of long-term dependencies in sequential data for use in making long-term predictions by providing a dual (i.e., two-part) recurrent neural network architecture.
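A hedged sketch of one possible two-part ("dual") recurrent design: a fast recurrent module tracks per-frame dynamics while a second module operates on the first's states to carry longer-term context. The split, cell types, and sizes below are illustrative assumptions, not the disclosed architecture.

```python
# Illustrative dual (two-part) recurrent network; not the disclosed model.
import torch
import torch.nn as nn

class DualRNN(nn.Module):
    def __init__(self, in_dim=64, hid=128):
        super().__init__()
        self.fast = nn.LSTMCell(in_dim, hid)       # short-term, per-frame dynamics
        self.slow = nn.LSTMCell(hid, hid)          # longer-term context over fast states
        self.readout = nn.Linear(2 * hid, in_dim)  # predict next-frame features

    def forward(self, seq):                        # seq: (T, N, in_dim)
        N, hid = seq.shape[1], self.fast.hidden_size
        hf = cf = hs = cs = seq.new_zeros(N, hid)
        preds = []
        for x in seq:
            hf, cf = self.fast(x, (hf, cf))
            hs, cs = self.slow(hf, (hs, cs))
            preds.append(self.readout(torch.cat([hf, hs], dim=-1)))
        return torch.stack(preds)                  # (T, N, in_dim)

model = DualRNN()
pred = model(torch.rand(10, 4, 64))
print(pred.shape)                                  # torch.Size([10, 4, 64])
```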
-
Publication Number: US10783394B2
Publication Date: 2020-09-22
Application Number: US16006728
Filing Date: 2018-06-12
Applicant: NVIDIA Corporation
Inventor: Pavlo Molchanov , Stephen Walter Tyree , Jan Kautz , Sina Honari
Abstract: A method, computer readable medium, and system are disclosed to generate coordinates of landmarks within images. The landmark locations may be identified on an image of a human face and used for emotion recognition, face identity verification, eye gaze tracking, pose estimation, etc. A transform is applied to input image data to produce transformed input image data. The transform is also applied to predicted coordinates for landmarks of the input image data to produce transformed predicted coordinates. A neural network model processes the transformed input image data to generate additional landmarks of the transformed input image data and additional predicted coordinates for each one of the additional landmarks. Parameters of the neural network model are updated to reduce differences between the transformed predicted coordinates and the additional predicted coordinates.
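The core consistency idea can be illustrated compactly: predictions on a transformed image should agree with the transformed predictions on the original image. The landmark network below is a toy stand-in, and a horizontal flip is used as the known transform purely for brevity; none of this reproduces the patented training procedure.

```python
# Illustrative transform-consistency loss for landmark predictions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LandmarkNet(nn.Module):
    """Regresses K (x, y) landmark coordinates in [-1, 1] from an image."""
    def __init__(self, k=68):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, k * 2), nn.Tanh())
        self.k = k

    def forward(self, img):
        return self.net(img).view(-1, self.k, 2)

model = LandmarkNet()
img = torch.rand(2, 3, 64, 64)

# Known transform T (here a horizontal flip) applied to both image and coordinates.
# In practice, left/right-symmetric landmark indices would also need reordering.
flipped_img = torch.flip(img, dims=[-1])
coords = model(img)
coords_flipped = coords * torch.tensor([-1.0, 1.0])        # flip x in normalized coords

# Consistency loss between transformed predictions and predictions on the transformed image.
loss = F.l1_loss(model(flipped_img), coords_flipped)
loss.backward()
print(loss.item() >= 0)
```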
-