Abstract:
Techniques are presented for monocular visual simultaneous localization and mapping (SLAM) that detect translational motion of the camera, using at least one motion sensor, while the camera is performing panoramic SLAM, and that initialize a three-dimensional map for tracking finite features. Motion sensors may include inertial (gyroscope, accelerometer), magnetic (compass), vision (camera), or any other sensors built into mobile devices.
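
As an illustration of this trigger, the following minimal Python sketch (an assumption for illustration, not the patent's implementation) integrates gravity-compensated accelerometer samples and flags translation when the implied speed exceeds a threshold; the 100 Hz rate, the threshold value, and the function name detect_translation are all illustrative:

    import numpy as np

    def detect_translation(accel, dt=0.01, speed_thresh=0.15):
        """accel: (N, 3) gravity-compensated accelerometer samples in m/s^2,
        taken every dt seconds. Integrates once and flags translation when
        the peak integrated speed exceeds speed_thresh (m/s)."""
        velocity = np.cumsum(accel * dt, axis=0)
        return float(np.linalg.norm(velocity, axis=1).max()) > speed_thresh

    rng = np.random.default_rng(0)
    rotating_in_place = rng.normal(0.0, 0.02, size=(200, 3))   # sensor noise only
    stepping_sideways = rotating_in_place + np.array([0.6, 0.0, 0.0])
    print(detect_translation(rotating_in_place))   # False: keep panoramic tracking
    print(detect_translation(stepping_sideways))   # True: initialize the 3D map,
                                                   # triangulating finite features
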
Abstract:
Exemplary methods, apparatuses, and systems for performing wide-area localization from simultaneous localization and mapping (SLAM) maps are disclosed. A mobile device can select a first keyframe from a keyframe-based SLAM map of the local environment built with one or more received images. A localization of the mobile device within the local environment can be determined based on the keyframe-based SLAM map. The mobile device can send the first keyframe to a server and receive a first global localization response representing a correction to a local map on the mobile device. The first global localization response can include rotation, translation, and scale information. A server can receive keyframes from a mobile device and localize them within a server map by matching keyframe features received from the mobile device to server map features.
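
The rotation, translation, and scale information can be read as a 7-parameter similarity transform. A minimal sketch, assuming the response carries a rotation matrix R, a translation t, and a scale s (names and the helper apply_similarity are illustrative, not the disclosed message format), of applying it to correct client map points:

    import numpy as np

    def apply_similarity(points, R, t, s):
        """points: (N, 3) local map points -> points in the server's global frame."""
        return s * points @ R.T + t

    # Schematic exchange: the client sends its first keyframe, the server matches
    # the keyframe's features against server map features, and replies with R, t, s.
    local_points = np.array([[0.0, 0.0, 1.0],
                             [1.0, 0.0, 2.0]])
    R = np.eye(3)                     # placeholder rotation from the response
    t = np.array([10.0, 0.0, 0.0])    # placeholder translation
    s = 2.0                           # placeholder metric scale
    print(apply_similarity(local_points, R, t, s))
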
Abstract:
A method determines a pose of an image capture device. The method includes accessing an image of a scene captured by the image capture device. A semantic segmentation of the image is performed to generate a segmented image. An initial pose of the image capture device is generated using a three-dimensional (3D) tracker. A plurality of 3D renderings of the scene are generated, each corresponding to one of a plurality of poses chosen based on the initial pose. A pose is selected from the plurality of poses such that the 3D rendering corresponding to the selected pose aligns with the segmented image.
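
A minimal sketch of the selection step, assuming the segmenter and renderer produce integer label maps; the agreement score (fraction of matching pixels) and the names label_agreement and select_pose are plausible stand-ins, not necessarily the disclosed measure:

    import numpy as np

    def label_agreement(rendered, segmented):
        """Fraction of pixels whose semantic labels match."""
        return float(np.mean(rendered == segmented))

    def select_pose(candidate_poses, renderings, segmented):
        """Return the pose whose rendering best aligns with the segmented image."""
        scores = [label_agreement(r, segmented) for r in renderings]
        return candidate_poses[int(np.argmax(scores))]

    # Toy 4x4 label maps over classes {0: sky, 1: building}.
    segmented = np.array([[0, 0, 1, 1]] * 4)
    renderings = [np.zeros((4, 4), dtype=int),     # rendering at the initial pose
                  np.array([[0, 0, 1, 1]] * 4)]    # rendering at a perturbed pose
    print(select_pose(["initial", "perturbed"], renderings, segmented))  # perturbed
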
Abstract:
Disclosed are a system, apparatus, and method for multiple-client simultaneous localization and mapping. Tracking and mapping may be performed locally and independently by each of a plurality of clients. At configurable points in time, map data may be sent to a server for stitching and fusion. In response to successful stitching and fusion into one or more maps known to the server, updated position and orientation information relative to the server's maps may be sent back to the clients. Clients may update their local map data with the received server location data. Clients may receive additional map data from the server, which can be used for extending their maps. Clients may send queries to the server for 3D maps, and the queries may include metadata.
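
A schematic sketch of this exchange; the message and field names (MapUpload, ServerResponse, "region") are assumptions for illustration, and the "stitching" here is a trivial placeholder for the geometric map fusion:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class MapUpload:
        client_id: str
        keyframes: list     # locally built keyframe data
        metadata: dict      # e.g. a coarse location hint used to query 3D maps

    @dataclass
    class ServerResponse:
        stitched: bool
        pose_correction: Optional[tuple]   # (rotation, translation) in server frame
        extra_map_data: list = field(default_factory=list)

    def server_handle(upload: MapUpload, server_maps: dict) -> ServerResponse:
        # Placeholder stitching: succeed when the metadata names a known region;
        # a real server would match and fuse the uploaded map data geometrically.
        region = upload.metadata.get("region")
        if region in server_maps:
            correction = ((0.0, 0.0, 0.0), (1.0, 2.0, 0.0))  # illustrative values
            return ServerResponse(True, correction, server_maps[region])
        return ServerResponse(False, None)

    resp = server_handle(MapUpload("client-1", [], {"region": "lobby"}),
                         {"lobby": ["map chunk"]})
    print(resp.stitched, resp.pose_correction)
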
Abstract:
A computer-implemented method, apparatus, computer-readable medium, and mobile device for determining a 6DOF pose from an input image. The process of determining the 6DOF pose may include processing an input image to create one or more static representations of the input image, creating a dynamic representation of the input image from an estimated 6DOF pose and a 2.5D reference map, and measuring correlation between the dynamic representation and the one or more static representations of the input image. The estimated 6DOF pose may be iteratively adjusted according to the measured correlation error until a final adjusted dynamic representation meets an output threshold.
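
A minimal sketch of that loop, with a toy stand-in for rendering the 2.5D map and a random-perturbation update standing in for the unspecified adjustment rule; render, correlation_error, and refine_pose are illustrative names:

    import numpy as np

    def render(pose, size=64):
        """Toy stand-in for the dynamic representation: a real system would
        render the 2.5D reference map at the full 6DOF pose; here only the
        first two pose components shift a smooth pattern."""
        xs = np.arange(size, dtype=float)
        return np.outer(np.sin(0.3 * (xs + pose[0])),
                        np.cos(0.3 * (xs + pose[1])))

    def correlation_error(dynamic, static):
        return 1.0 - float(np.corrcoef(dynamic.ravel(), static.ravel())[0, 1])

    def refine_pose(static, pose, iters=200, step=0.3, tol=1e-3):
        """Keep pose perturbations that reduce the correlation error until it
        meets the output threshold (a simple hill climb, for illustration)."""
        rng = np.random.default_rng(0)
        best = correlation_error(render(pose), static)
        for _ in range(iters):
            if best < tol:
                break
            cand = pose + rng.normal(0.0, step, size=6)
            err = correlation_error(render(cand), static)
            if err < best:
                pose, best = cand, err
        return pose, best

    static = render(np.zeros(6))       # stand-in static image representation
    pose, err = refine_pose(static, np.array([2.0, -1.5, 0, 0, 0, 0]))
    print(err)                          # error shrinks toward the threshold
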
Abstract:
A computer-implemented method, apparatus, computer-readable medium, and mobile device for initializing a 3-Dimensional (3D) map may include obtaining, from a camera, a single image of an urban outdoor scene and estimating an initial pose of the camera. An untextured model of a geographic region may be obtained. Line features may be extracted from the single image, and the orientation of the camera in 3 Degrees of Freedom (3DOF) may be determined with respect to the untextured model using the extracted line features. In response to determining the orientation of the camera, a translation in 3DOF with respect to the untextured model may be determined using the extracted line features. The 3D map may be initialized based on the determined orientation and translation.
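
Once the 3DOF orientation R is fixed, the 3DOF translation t follows linearly from line correspondences: each image line back-projects to a plane through the camera center with normal n, and a model point X on the corresponding 3D line must satisfy n . (RX + t) = 0, which is linear in t. A sketch of that least-squares step under these assumptions (the extraction of n and X from real data is omitted, and translation_from_lines is an illustrative name):

    import numpy as np

    def translation_from_lines(R, plane_normals, model_points):
        """Least-squares t from >= 3 line correspondences.
        plane_normals: (N, 3) normals of back-projected image-line planes.
        model_points:  (N, 3) points on the corresponding 3D model lines."""
        A = np.asarray(plane_normals, float)
        b = -np.einsum("ij,ij->i", A, model_points @ R.T)   # -n . (R X) per row
        t, *_ = np.linalg.lstsq(A, b, rcond=None)
        return t

    # Toy check: recover a known translation from synthetic correspondences.
    rng = np.random.default_rng(1)
    R = np.eye(3)                      # rotation from the orientation stage
    t_true = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(6, 3))        # points on model lines
    n = np.cross(rng.normal(size=(6, 3)), X @ R.T + t_true)  # normals orthogonal
    print(np.allclose(translation_from_lines(R, n, X), t_true))  # True
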
Abstract:
A mobile device uses vision and orientation sensor data jointly for six-degree-of-freedom localization, e.g., in wide-area environments. An image or video stream is captured while geographic orientation data is received and may be used to generate a panoramic cylindrical map of an environment. A bin of model features stored in a database is accessed based on the geographic orientation data. The model features come from a pre-generated reconstruction of the environment produced from features extracted from a plurality of images of the environment. The reconstruction is registered to a global orientation, and the model features are stored in bins of similar geographic orientation. Features from the panoramic cylindrical map are matched to model features in the bin to produce a set of corresponding features, which are used to determine a position and an orientation of the camera.
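
A minimal sketch of the orientation-binned lookup, assuming a 30-degree bin width and plain descriptor vectors; the bin width, the nearest-neighbour matcher, and the names build_bins and match are illustrative choices, not values from the disclosure:

    import numpy as np

    BIN_WIDTH_DEG = 30

    def bin_index(heading_deg):
        return int(heading_deg % 360) // BIN_WIDTH_DEG

    def build_bins(model_features):
        """model_features: list of (heading_deg, descriptor) pairs from the
        globally registered reconstruction."""
        bins = {i: [] for i in range(360 // BIN_WIDTH_DEG)}
        for heading, desc in model_features:
            bins[bin_index(heading)].append(desc)
        return bins

    def match(query_desc, bins, heading_deg):
        """Nearest-neighbour match restricted to the heading's bin, so only a
        slice of the model is searched instead of the whole reconstruction."""
        candidates = bins[bin_index(heading_deg)]
        if not candidates:
            return None
        dists = [np.linalg.norm(query_desc - c) for c in candidates]
        return candidates[int(np.argmin(dists))]

    bins = build_bins([(10.0, np.array([1.0, 0.0])),
                       (200.0, np.array([0.0, 1.0]))])
    print(match(np.array([0.9, 0.1]), bins, heading_deg=5.0))  # -> [1. 0.]
    # The resulting correspondences would then feed a standard pose solve
    # for the camera's position and orientation.
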