-
Publication No.: US20230081641A1
Publication Date: 2023-03-16
Application No.: US17551046
Filing Date: 2021-12-14
Applicant: NVIDIA Corporation
Inventor: Koki Nagano, Eric Ryan Chan, Sameh Khamis, Shalini De Mello, Tero Tapani Karras, Orazio Gallo, Jonathan Tremblay
Abstract: A single two-dimensional (2D) image can be used as input to obtain a three-dimensional (3D) representation of the 2D image. This is done by extracting features from the 2D image by an encoder and determining a 3D representation of the 2D image utilizing a trained 2D convolutional neural network (CNN). Volumetric rendering is then run on the 3D representation to combine features within one or more viewing directions, and the combined features are provided as input to a multilayer perceptron (MLP) that predicts and outputs color (or multi-dimensional neural features) and density values for each point within the 3D representation. As a result, single-image inverse rendering may be performed using only a single 2D image as input to create a corresponding 3D representation of the scene in the single 2D image.
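The volumetric rendering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard emission-absorption compositing rule used in neural radiance field work, not necessarily the exact procedure claimed in the application. The function name `composite_ray` and its parameters are hypothetical and chosen only for illustration.

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite per-sample colors along one ray, front to back.

    Assumption (not taken from the patent text): opacity of a sample is
    alpha = 1 - exp(-density * step_length), and each sample is weighted
    by the transmittance accumulated in front of it.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    out = [0.0, 0.0, 0.0]        # accumulated RGB for this ray
    weights = []                 # per-sample contribution weights
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        weights.append(w)
        for k in range(3):
            out[k] += w * color[k]
        transmittance *= 1.0 - alpha
    return out, weights

# Usage: two samples along a ray, each covering a step of length 0.5.
rgb, w = composite_ray(
    densities=[0.8, 2.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    deltas=[0.5, 0.5],
)
```

The same weights could equally be applied to multi-dimensional feature vectors instead of RGB triples, which is closer to the "multi-dimensional neural features" wording in the abstract.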
-
Publication No.: US20240104842A1
Publication Date: 2024-03-28
Application No.: US18472653
Filing Date: 2023-09-22
Applicant: NVIDIA Corporation
Inventor: Koki Nagano, Alexander Trevithick, Chao Liu, Eric Ryan Chan, Sameh Khamis, Michael Stengel, Zhiding Yu
IPC: G06T17/00, G06T5/20, G06T7/70, G06T7/90, G06V10/771
CPC classification number: G06T17/00, G06T5/20, G06T7/70, G06T7/90, G06V10/771, G06T2207/10024
Abstract: A method for generating, by an encoder-based model, a three-dimensional (3D) representation of a two-dimensional (2D) image is provided. The encoder-based model is trained to infer the 3D representation using a synthetic training data set generated by a pre-trained model. The pre-trained model is a 3D generative model that produces a 3D representation and a corresponding 2D rendering, which can be used to train a separate encoder-based model for downstream tasks like estimating a triplane representation, neural radiance field, mesh, depth map, 3D key points, or the like, given a single input image, using the pseudo ground truth 3D synthetic training data set. In a particular embodiment, the encoder-based model is trained to predict a triplane representation of the input image, which can then be rendered by a volume renderer according to pose information to generate an output image of the 3D scene from the corresponding viewpoint.
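To make the triplane representation named in the abstract more concrete, the sketch below shows one common way such a representation is queried before volume rendering. It assumes three axis-aligned feature planes (XY, XZ, YZ), bilinear interpolation on each plane, and aggregation by summation. These are conventions from the published triplane literature and are assumptions here, not details confirmed by the abstract. The names `bilinear` and `sample_triplane` and the dictionary layout are hypothetical.

```python
def bilinear(plane, u, v):
    """Bilinearly sample a feature plane at (u, v), both in [-1, 1].

    `plane` is a nested list indexed as plane[row][col][channel].
    """
    rows, cols = len(plane), len(plane[0])
    # Map [-1, 1] to continuous pixel coordinates, clamped to the grid.
    x = min(max((u + 1.0) * 0.5 * (cols - 1), 0.0), cols - 1.0)
    y = min(max((v + 1.0) * 0.5 * (rows - 1), 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    return [
        (1 - fx) * (1 - fy) * plane[y0][x0][c]
        + fx * (1 - fy) * plane[y0][x1][c]
        + (1 - fx) * fy * plane[y1][x0][c]
        + fx * fy * plane[y1][x1][c]
        for c in range(len(plane[0][0]))
    ]

def sample_triplane(planes, point):
    """Project a 3D point onto the three planes and sum the sampled features.

    Assumption: summation as the aggregation rule; other choices such as
    concatenation are equally plausible.
    """
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

# Usage: tiny 2x2 planes with a single feature channel each.
planes = {
    "xy": [[[0.0], [1.0]], [[0.0], [1.0]]],  # value grows along x
    "xz": [[[2.0], [2.0]], [[2.0], [2.0]]],  # constant
    "yz": [[[3.0], [3.0]], [[3.0], [3.0]]],  # constant
}
features = sample_triplane(planes, (0.0, 0.0, 0.0))
```

In a full pipeline, the aggregated feature vector for each sample point would typically be decoded into color and density and then composited along camera rays for the requested pose.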
-