Learning based camera pose estimation from images of an environment

    Publication No.: US10692244B2

    Publication date: 2020-06-23

    Application No.: US16137064

    Filing date: 2018-09-20

    Abstract: A deep neural network (DNN) system learns a map representation for estimating a camera position and orientation (pose). The DNN is trained to learn a map representation corresponding to the environment, defining positions and attributes of structures, trees, walls, vehicles, etc. The DNN system learns a map representation that is versatile and performs well for many different environments (indoor, outdoor, natural, synthetic, etc.). The DNN system receives images of an environment captured by a camera (observations) and outputs an estimated camera pose within the environment. The estimated camera pose is used to perform camera localization, i.e., recover the three-dimensional (3D) position and orientation of a moving camera, which is a fundamental task in computer vision with a wide variety of applications in robot navigation, car localization for autonomous driving, device localization for mobile navigation, and augmented/virtual reality.

    3D PLANE DETECTION AND RECONSTRUCTION USING A MONOCULAR IMAGE

    Publication No.: US20200167943A1

    Publication date: 2020-05-28

    Application No.: US16565885

    Filing date: 2019-09-10

    Abstract: Planar regions in three-dimensional scenes offer important geometric cues in a variety of three-dimensional perception tasks such as scene understanding, scene reconstruction, and robot navigation. Image analysis to detect planar regions can be performed by a deep learning architecture that includes a number of neural networks configured to estimate parameters for the planar regions. The neural networks process an image to detect an arbitrary number of plane objects in the image. Each plane object is associated with a number of estimated parameters including bounding box parameters, plane normal parameters, and a segmentation mask. Global parameters for the image, including a depth map, can also be estimated by one of the neural networks. Then, a segmentation refinement network jointly optimizes (i.e., refines) the segmentation masks for each instance of the plane objects and combines the refined segmentation masks to generate an aggregate segmentation mask for the image.

    System and method for optical flow estimation

    Publication No.: US10467763B1

    Publication date: 2019-11-05

    Application No.: US16537986

    Filing date: 2019-08-12

    Abstract: A method, computer readable medium, and system are disclosed for estimating optical flow between two images. A first pyramidal set of features is generated for a first image and a partial cost volume for a level of the first pyramidal set of features is computed, by a neural network, using features at the level of the first pyramidal set of features and warped features extracted from a second image, where the partial cost volume is computed across a limited range of pixels that is less than a full resolution of the first image, in pixels, at the level. The neural network processes the features and the partial cost volume to produce a refined optical flow estimate for the first image and the second image.
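
As an illustration of the partial cost volume described in this abstract, the following is a minimal sketch, not the patented implementation. It assumes a normalized dot product (correlation) as the matching cost and zero cost for out-of-bounds offsets; neither choice is stated in the abstract. The function name `partial_cost_volume` and the parameter `max_disp` are hypothetical.

```python
def partial_cost_volume(feat1, feat2_warped, max_disp):
    """Correlate each feature of image 1 with nearby warped features of image 2.

    feat1, feat2_warped: H x W x C nested lists of floats (one pyramid level).
    max_disp: search radius in pixels (assumed name); the search window is
              (2 * max_disp + 1)^2 offsets, far smaller than the full level.
    Returns an H x W list of cost lists, one cost per offset.
    """
    h, w = len(feat1), len(feat1[0])
    channels = len(feat1[0][0])
    offsets = [(dy, dx)
               for dy in range(-max_disp, max_disp + 1)
               for dx in range(-max_disp, max_disp + 1)]
    volume = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = []
            for dy, dx in offsets:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    # Assumed matching cost: channel-averaged dot product.
                    dot = sum(a * b for a, b in zip(feat1[y][x], feat2_warped[yy][xx]))
                    costs.append(dot / channels)
                else:
                    # Assumed padding: offsets outside the image cost zero.
                    costs.append(0.0)
            row.append(costs)
        volume.append(row)
    return volume
```

Limiting the search to a small window keeps the cost volume size proportional to the window area rather than to the full resolution of the level.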

    THREE-DIMENSIONAL (3D) POSE ESTIMATION FROM A MONOCULAR CAMERA

    Publication No.: US20190278983A1

    Publication date: 2019-09-12

    Application No.: US16290643

    Filing date: 2019-03-01

    Abstract: Estimating a three-dimensional (3D) pose of an object, such as a hand or body (human, animal, robot, etc.), from a 2D image is necessary for human-computer interaction. A hand pose can be represented by a set of points in 3D space, called keypoints. Two coordinates (x,y) represent spatial displacement and a third coordinate represents a depth of every point with respect to the camera. A monocular camera is used to capture an image of the 3D pose, but does not capture depth information. A neural network architecture is configured to generate a depth value for each keypoint in the captured image, even when portions of the pose are occluded, or the orientation of the object is ambiguous. Generation of the depth values enables estimation of the 3D pose of the object.
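
To show how a per-keypoint depth value turns 2D keypoints into a 3D pose, here is a minimal sketch. It assumes a pinhole camera model with known intrinsics (`fx`, `fy`, `cx`, `cy`), which the abstract does not mention, and it takes the depths as given rather than predicting them with a neural network. The function name `lift_keypoints` is hypothetical.

```python
def lift_keypoints(keypoints_2d, depths, fx, fy, cx, cy):
    """Back-project 2D keypoints to 3D camera coordinates.

    keypoints_2d: list of (u, v) pixel coordinates.
    depths: list of per-keypoint depths along the camera axis
            (in the abstract these come from the neural network).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    """
    points_3d = []
    for (u, v), z in zip(keypoints_2d, depths):
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points_3d.append((x, y, z))
    return points_3d
```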

    Model-based three-dimensional head pose estimation

    Publication No.: US10311589B2

    Publication date: 2019-06-04

    Application No.: US15823370

    Filing date: 2017-11-27

    Abstract: One embodiment of the present invention sets forth a technique for estimating a head pose of a user. The technique includes acquiring depth data associated with a head of the user and initializing each particle included in a set of particles with a different candidate head pose. The technique further includes performing one or more optimization passes that include performing at least one iterative closest point (ICP) iteration for each particle and performing at least one particle swarm optimization (PSO) iteration. Each ICP iteration includes rendering the three-dimensional reference model based on the candidate head pose associated with the particle and comparing the three-dimensional reference model to the depth data. Each PSO iteration comprises updating a global best head pose associated with the set of particles and modifying at least one candidate head pose. The technique further includes modifying a shape of the three-dimensional reference model based on depth data.
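
The PSO iteration described in this abstract (update the global best head pose, then modify candidate poses) can be sketched with the textbook particle swarm update. This is an assumption-laden illustration: the ICP step and model rendering are omitted, `cost` stands in for the comparison between the rendered reference model and the depth data, poses are plain lists of floats, and the coefficients `inertia`, `c1`, and `c2` are conventional defaults rather than values from the patent.

```python
import random

def pso_iteration(particles, velocities, personal_best, global_best, cost, rng,
                  inertia=0.7, c1=1.5, c2=1.5):
    """Run one standard PSO iteration in place and return the new global best.

    particles, velocities, personal_best: lists of candidate poses (lists of floats).
    global_best: best pose found so far across the swarm.
    cost: callable scoring a pose, lower is better (hypothetical stand-in for
          the model-to-depth comparison).
    rng: a random.Random instance, passed in for reproducibility.
    """
    for i, pose in enumerate(particles):
        for d in range(len(pose)):
            r1, r2 = rng.random(), rng.random()
            # Pull each candidate toward its own best and the swarm's best.
            velocities[i][d] = (inertia * velocities[i][d]
                                + c1 * r1 * (personal_best[i][d] - pose[d])
                                + c2 * r2 * (global_best[d] - pose[d]))
            pose[d] += velocities[i][d]
        if cost(pose) < cost(personal_best[i]):
            personal_best[i] = list(pose)
    # Update the global best pose from the refreshed personal bests.
    best = min(personal_best, key=cost)
    if cost(best) < cost(global_best):
        global_best = list(best)
    return global_best
```

Because the global best is replaced only when a strictly better pose is found, its cost never increases from one iteration to the next.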

    DEEP-LEARNING METHOD FOR SEPARATING REFLECTION AND TRANSMISSION IMAGES VISIBLE AT A SEMI-REFLECTIVE SURFACE IN A COMPUTER IMAGE OF A REAL-WORLD SCENE

    Publication No.: US20190164268A1

    Publication date: 2019-05-30

    Application No.: US16200192

    Filing date: 2018-11-26

    Abstract: When a computer image is generated from a real-world scene having a semi-reflective surface (e.g. window), the computer image will create, at the semi-reflective surface from the viewpoint of the camera, both a reflection of a scene in front of the semi-reflective surface and a transmission of a scene located behind the semi-reflective surface. Similar to a person viewing the real-world scene from different locations, angles, etc., the reflection and transmission may change, and also move relative to each other, as the viewpoint of the camera changes. Unfortunately, the dynamic nature of the reflection and transmission negatively impacts the performance of many computer applications, but performance can generally be improved if the reflection and transmission are separated. The present disclosure uses deep learning to separate reflection and transmission at a semi-reflective surface of a computer image generated from a real-world scene.

    TRAINING A NEURAL NETWORK TO PREDICT SUPERPIXELS USING SEGMENTATION-AWARE AFFINITY LOSS

    Publication No.: US20190156154A1

    Publication date: 2019-05-23

    Application No.: US16188641

    Filing date: 2018-11-13

    Abstract: Segmentation is the identification of separate objects within an image. An example is identification of a pedestrian passing in front of a car, where the pedestrian is a first object and the car is a second object. Superpixel segmentation is the identification of regions of pixels within an object that have similar properties. An example is identification of pixel regions having a similar color, such as different articles of clothing worn by the pedestrian and different components of the car. A pixel affinity neural network (PAN) model is trained to generate pixel affinity maps for superpixel segmentation. The pixel affinity map defines the similarity of two points in space. In an embodiment, the pixel affinity map indicates a horizontal affinity and vertical affinity for each pixel in the image. The pixel affinity map is processed to identify the superpixels.
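
The abstract does not say how the pixel affinity map is processed into superpixels. As one possible illustration, the sketch below thresholds the horizontal and vertical affinities and groups connected pixels with union-find. The threshold, the grouping method, the function name `superpixels_from_affinity`, and the convention that `horizontal[y][x]` links pixel (y, x) to its right neighbor and `vertical[y][x]` links it to the pixel below are all assumptions.

```python
def superpixels_from_affinity(horizontal, vertical, threshold=0.5):
    """Group pixels into regions by thresholding neighbor affinities.

    horizontal[y][x]: assumed affinity between pixel (y, x) and (y, x + 1).
    vertical[y][x]:   assumed affinity between pixel (y, x) and (y + 1, x).
    Returns an H x W grid of integer region labels, numbered in scan order.
    """
    h, w = len(horizontal), len(horizontal[0])
    parent = list(range(h * w))

    def find(i):
        # Path-halving lookup of the region root.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for y in range(h):
        for x in range(w):
            if x + 1 < w and horizontal[y][x] >= threshold:
                union(y * w + x, y * w + x + 1)
            if y + 1 < h and vertical[y][x] >= threshold:
                union(y * w + x, (y + 1) * w + x)

    labels = {}
    return [[labels.setdefault(find(y * w + x), len(labels)) for x in range(w)]
            for y in range(h)]
```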
