-
公开(公告)号:US20240087222A1
公开(公告)日:2024-03-14
申请号:US18515016
申请日:2023-11-20
Applicant: NVIDIA Corporation
Inventor: Yiming Li , Zhiding Yu , Christopher B. Choy , Chaowei Xiao , Jose Manuel Alvarez Lopez , Sanja Fidler , Animashree Anandkumar
Abstract: An artificial intelligence framework is described that incorporates a number of neural networks and a number of transformers for converting a two-dimensional image into three-dimensional semantic information. Neural networks convert one or more images into a set of image feature maps, depth information associated with the one or more images, and query proposals based on the depth information. A first transformer implements a cross-attention mechanism to process the set of image feature maps in accordance with the query proposals. The output of the first transformer is combined with a mask token to generate initial voxel features of the scene. A second transformer implements a self-attention mechanism to convert the initial voxel features into refined voxel features, which are up-sampled and processed by a lightweight neural network to generate the three-dimensional semantic information, which may be used by, e.g., an autonomous vehicle for various advanced driver assistance system (ADAS) functions.