Cross-modal sensor data alignment
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining an alignment between cross-modal sensor data. In one aspect, a method comprises: obtaining (i) an image that characterizes a visual appearance of an environment, and (ii) a point cloud comprising a collection of data points that characterizes a three-dimensional geometry of the environment; processing each of a plurality of regions of the image using a visual embedding neural network to generate a respective embedding of each of the image regions; processing each of a plurality of regions of the point cloud using a shape embedding neural network to generate a respective embedding of each of the point cloud regions; and identifying a plurality of region pairs using the embeddings of the image regions and the embeddings of the point cloud regions.
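The final step of the method, identifying region pairs from the two sets of embeddings, can be illustrated with a minimal sketch. The abstract does not state how the pairs are identified, so the rule used here (mutual nearest neighbours under cosine similarity, with a threshold) is an assumption chosen for illustration, not the claimed method. The function name `identify_region_pairs` and the parameter `min_similarity` are hypothetical, and the embeddings are taken as plain arrays rather than outputs of the visual and shape embedding neural networks.

```python
import numpy as np


def identify_region_pairs(image_embeddings, point_cloud_embeddings, min_similarity=0.5):
    """Pair image regions with point cloud regions by embedding similarity.

    Assumed matching rule (not stated in the abstract): a pair (i, j) is kept
    when image region i and point cloud region j are each other's nearest
    neighbour under cosine similarity and that similarity is at least
    `min_similarity`.
    """
    image_embeddings = np.asarray(image_embeddings, dtype=float)
    point_cloud_embeddings = np.asarray(point_cloud_embeddings, dtype=float)

    # Normalise each embedding to unit length so a dot product is a cosine similarity.
    img = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    pc = point_cloud_embeddings / np.linalg.norm(point_cloud_embeddings, axis=1, keepdims=True)

    # similarity[i, j] compares image region i with point cloud region j.
    similarity = img @ pc.T

    best_pc_for_img = similarity.argmax(axis=1)  # best point cloud region per image region
    best_img_for_pc = similarity.argmax(axis=0)  # best image region per point cloud region

    pairs = []
    for i, j in enumerate(best_pc_for_img):
        if best_img_for_pc[j] == i and similarity[i, j] >= min_similarity:
            pairs.append((i, int(j)))
    return pairs


# Toy embeddings standing in for the outputs of the two embedding networks.
image_regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point_cloud_regions = [[0.0, 0.9, 0.1], [1.0, 0.1, 0.0], [-1.0, -1.0, 0.0]]
print(identify_region_pairs(image_regions, point_cloud_regions))
```

The mutual nearest neighbour check discards one-sided matches, so an image region with no good counterpart in the point cloud (the third one in the toy data) is left unpaired rather than forced into a match.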