Patent search ap:("NVIDIA Corporation") AND inv:"Huaizu Jiang" Page 1

1.

Invention publication
PERFORMING VISUAL RELATIONAL REASONING (Pending, published)

Publication (announcement) No.: US20240078423A1

Publication (announcement) date: 2024-03-07

Application No.: US17893026

Filing date: 2022-08-22

Applicant: NVIDIA Corporation

Inventor: Xiaojian Ma , Weili Nie , Zhiding Yu , Huaizu Jiang , Chaowei Xiao , Yuke Zhu , Anima Anandkumar

IPC: G06N3/08 , G06F16/55 , G06N3/04

CPC classification number: G06N3/08 , G06F16/55 , G06N3/04

Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.

2.

Invention grant
Scene flow estimation using shared features (In force)

Publication (announcement) No.: US10986325B2

Publication (announcement) date: 2021-04-20

Application No.: US16569104

Filing date: 2019-09-12

Applicant: NVIDIA Corporation

Inventor: Deqing Sun , Varun Jampani , Erik Gundersen Learned-Miller , Huaizu Jiang

IPC: H04N13/122 , H04N13/128 , G06N3/08 , H04N13/00

Abstract: Scene flow represents the three-dimensional (3D) structure and movement of objects in a video sequence in three dimensions from frame-to-frame and is used to track objects and estimate speeds for autonomous driving applications. Scene flow is recovered by a neural network system from a video sequence captured from at least two viewpoints (e.g., cameras), such as a left-eye and right-eye of a viewer. An encoder portion of the system extracts features from frames of the video sequence. The features are input to a first decoder to predict optical flow and a second decoder to predict disparity. The optical flow represents pixel movement in (x,y) and the disparity represents pixel movement in z (depth). When combined, the optical flow and disparity represent the scene flow.
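As an illustration of how optical flow and disparity can combine into a 3D motion vector, here is a minimal sketch. It is not the patented network: it assumes a rectified stereo rig and a pinhole camera model, and the names `backproject`, `scene_flow`, `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical, chosen only for this example. Depth is recovered from disparity as Z = f * B / d, and the scene flow is the difference between the back-projected 3D points before and after the motion.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def backproject(x: float, y: float, disparity: float,
                focal_px: float, baseline_m: float,
                cx: float, cy: float) -> Point3:
    """Lift pixel (x, y) with a stereo disparity to a 3D camera-frame point.

    Assumes a rectified stereo pair and a pinhole model (hypothetical setup).
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity          # depth from disparity
    return ((x - cx) * z / focal_px, (y - cy) * z / focal_px, z)


def scene_flow(x: float, y: float, flow: Tuple[float, float],
               disp_t0: float, disp_t1: float,
               focal_px: float, baseline_m: float,
               cx: float, cy: float) -> Point3:
    """3D motion of one pixel from its optical flow and two disparities.

    flow     -- (u, v) pixel displacement from frame t0 to frame t1
    disp_t0  -- disparity at (x, y) in frame t0
    disp_t1  -- disparity at (x + u, y + v) in frame t1
    """
    u, v = flow
    p0 = backproject(x, y, disp_t0, focal_px, baseline_m, cx, cy)
    p1 = backproject(x + u, y + v, disp_t1, focal_px, baseline_m, cx, cy)
    return (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
```

In this sketch, a pixel whose disparity doubles between frames has moved to half its original depth, so the z component of the returned vector is negative (motion toward the camera).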

3.

Invention grant
Multi-frame video interpolation using optical flow (In force)

Publication (announcement) No.: US10776688B2

Publication (announcement) date: 2020-09-15

Application No.: US16169851

Filing date: 2018-10-24

Applicant: NVIDIA Corporation

Inventor: Huaizu Jiang , Deqing Sun , Varun Jampani

IPC: G06N3/04 , G06N3/08 , H04N7/01 , G06T7/246

Abstract: Video interpolation is used to predict one or more intermediate frames at timesteps defined between two consecutive frames. A first neural network model approximates optical flow data defining motion between the two consecutive frames. A second neural network model refines the optical flow data and predicts visibility maps for each timestep. The two consecutive frames are warped according to the refined optical flow data for each timestep to produce pairs of warped frames for each timestep. The second neural network model then fuses the pair of warped frames based on the visibility maps to produce the intermediate frame for each timestep. Artifacts caused by motion boundaries and occlusions are reduced in the predicted intermediate frames.
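The warp-then-fuse step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the claimed method: the neural networks that predict flow and visibility are omitted, frames are small grayscale grids stored as nested lists, warping uses nearest-neighbor backward sampling, and the temporal weights `(1 - t)` and `t` in the fusion are an assumption not stated in the abstract. The function names `warp` and `fuse` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]
Flow = List[List[Tuple[float, float]]]


def warp(frame: Frame, flow: Flow) -> Frame:
    """Backward-warp a frame: output pixel (x, y) samples the input at
    (x + dx, y + dy), using nearest-neighbor lookup clamped to the border."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), w - 1)
            sy = min(max(int(round(y + dy)), 0), h - 1)
            out[y][x] = frame[sy][sx]
    return out


def fuse(warped0: Frame, warped1: Frame,
         vis0: Frame, vis1: Frame, t: float, eps: float = 1e-8) -> Frame:
    """Blend two warped frames into the intermediate frame at time t in (0, 1).

    vis0 / vis1 are per-pixel visibility maps in [0, 1]; a pixel occluded in
    one input frame gets a low weight there, so the other frame dominates.
    """
    h, w = len(warped0), len(warped0[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            w0 = (1.0 - t) * vis0[y][x]   # assumed temporal weighting
            w1 = t * vis1[y][x]
            out[y][x] = (w0 * warped0[y][x] + w1 * warped1[y][x]) / (w0 + w1 + eps)
    return out
```

Calling `fuse` once per timestep, with the flow and visibility maps for that timestep, would yield one intermediate frame each; where a visibility map is zero, the result falls back to the other warped frame.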

4.

Invention application
MULTI-FRAME VIDEO INTERPOLATION USING OPTICAL FLOW (Pending, published)

Publication (announcement) No.: US20190138889A1

Publication (announcement) date: 2019-05-09

Application No.: US16169851

Filing date: 2018-10-24

Applicant: NVIDIA Corporation

Inventor: Huaizu Jiang , Deqing Sun , Varun Jampani

IPC: G06N3/04 , G06N3/08 , G06T7/246 , H04N7/01

Abstract: Video interpolation is used to predict one or more intermediate frames at timesteps defined between two consecutive frames. A first neural network model approximates optical flow data defining motion between the two consecutive frames. A second neural network model refines the optical flow data and predicts visibility maps for each timestep. The two consecutive frames are warped according to the refined optical flow data for each timestep to produce pairs of warped frames for each timestep. The second neural network model then fuses the pair of warped frames based on the visibility maps to produce the intermediate frame for each timestep. Artifacts caused by motion boundaries and occlusions are reduced in the predicted intermediate frames.

5.

Invention publication
PERFORMING VISUAL RELATIONAL REASONING (Pending, published)

Publication (announcement) No.: US20240062534A1

Publication (announcement) date: 2024-02-22

Application No.: US17893038

Filing date: 2022-08-22

Applicant: NVIDIA Corporation

Inventor: Xiaojian Ma , Weili Nie , Zhiding Yu , Huaizu Jiang , Chaowei Xiao , Yuke Zhu , Anima Anandkumar

IPC: G06V10/82 , G06V10/20 , G06V10/94

CPC classification number: G06V10/82 , G06V10/255 , G06V10/94

Abstract: A vision transformer (ViT) is a deep learning model that performs one or more vision processing tasks. ViTs may be modified to include a global task that clusters images with the same concept together to produce semantically consistent relational representations, as well as a local task that guides the ViT to discover object-centric semantic correspondence across images. A database of concepts and associated features may be created and used to train the global and local tasks, which may then enable the ViT to perform visual relational reasoning faster, without supervision, and outside of a synthetic domain.

6.

Invention application
SCENE FLOW ESTIMATION USING SHARED FEATURES (Pending, published)

Publication (announcement) No.: US20200084427A1

Publication (announcement) date: 2020-03-12

Application No.: US16569104

Filing date: 2019-09-12

Applicant: NVIDIA Corporation

Inventor: Deqing Sun , Varun Jampani , Erik Gundersen Learned-Miller , Huaizu Jiang

IPC: H04N13/122 , H04N13/128 , G06N3/08

Abstract: Scene flow represents the three-dimensional (3D) structure and movement of objects in a video sequence in three dimensions from frame-to-frame and is used to track objects and estimate speeds for autonomous driving applications. Scene flow is recovered by a neural network system from a video sequence captured from at least two viewpoints (e.g., cameras), such as a left-eye and right-eye of a viewer. An encoder portion of the system extracts features from frames of the video sequence. The features are input to a first decoder to predict optical flow and a second decoder to predict disparity. The optical flow represents pixel movement in (x,y) and the disparity represents pixel movement in z (depth). When combined, the optical flow and disparity represent the scene flow.
