Abstract:
The present invention is embodied in systems and methods for determining the structure and motion of a three-dimensional (3D) object from two-dimensional (2D) images of the object obtained from multiple sets of views with different projection models, such as a full perspective view and a weak perspective view. A novel fundamental matrix is derived that embodies the epipolar geometry between a full perspective view and a weak perspective view. The systems and methods of the present invention preferably use the derived fundamental matrix, together with the 2D image information of the full and weak perspective views, to digitally reconstruct the 3D object and produce results with multi-resolution processing techniques. These techniques include recovering and refining motion parameters and recovering and refining structure parameters of the fundamental matrix. The results can include, for example, 3D positions of points, relative camera positions between different views, texture maps, and the like.
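The abstract does not reproduce the derived matrix. To illustrate the geometry it refers to, the sketch below builds a fundamental matrix between a full-perspective camera and a weak-perspective (scaled orthographic) camera. It uses the standard two-camera construction F = [e']_x P' P^+, where e' is the image of the perspective camera's centre in the weak-perspective view. This is a generic textbook construction and not necessarily the patent's own derivation. The helper names (`skew`, `weak_perspective_camera`, `mixed_fundamental`, `epipolar_residuals`) and all scene values are hypothetical.

```python
import numpy as np


def skew(v):
    """Cross-product matrix, so that skew(a) @ b equals np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def weak_perspective_camera(R, t, s):
    """Hypothetical 3x4 weak-perspective camera: u = s * (R[:2] @ X + t[:2]).

    The scale s stands in for focal_length / average_depth, so the depth
    component of t is not used.
    """
    P = np.zeros((3, 4))
    P[:2, :3] = s * R[:2]
    P[:2, 3] = s * t[:2]
    P[2, 3] = 1.0
    return P


def mixed_fundamental(P_full, P_weak):
    """Return F with x_weak^T F x_full = 0, via F = [e']_x P' pinv(P)."""
    _, _, vt = np.linalg.svd(P_full)
    C = vt[-1]                 # camera centre of the full-perspective view (null vector)
    e_weak = P_weak @ C        # epipole in the weak-perspective image
    return skew(e_weak) @ P_weak @ np.linalg.pinv(P_full)


def epipolar_residuals(F, x_full, x_weak):
    """Scale-free |x_weak^T F x_full| for each column of homogeneous points."""
    num = np.abs(np.einsum("ij,ik,kj->j", x_weak, F, x_full))
    den = (np.linalg.norm(F) * np.linalg.norm(x_full, axis=0)
           * np.linalg.norm(x_weak, axis=0))
    return num / den


# Example scene (all values are made up for illustration).
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_full = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
P_weak = weak_perspective_camera(R, np.array([0.2, 0.0, 0.0]), s=800.0 / 5.0)
F = mixed_fundamental(P_full, P_weak)

X = np.vstack([rng.uniform(-1.0, 1.0, (2, 20)),   # X, Y
               rng.uniform(4.0, 6.0, (1, 20)),    # depth Z
               np.ones((1, 20))])                 # homogeneous coordinate
x_full, x_weak = P_full @ X, P_weak @ X
print(epipolar_residuals(F, x_full, x_weak).max())
```

Corresponding points from the two views should give residuals at floating-point noise level, and F should have rank 2, as any fundamental matrix does.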
Abstract:
A method and apparatus for recovering a three-dimensional (3D) scene from two-dimensional (2D) images. A sequence of images is divided into a number of smaller segments, and a 3D reconstruction is performed on each segment individually. The reconstructed segments are then combined through an efficient bundle adjustment to complete the 3D reconstruction. Segmentation may be based on the number of feature points in each frame. The number of frames per segment is reduced by creating virtual key frames, which encode the 3D structure of each segment but are only a small subset of the segment's original frames. The final bundle adjustment is performed on the virtual key frames rather than on all of the original frames; as a result, it is two orders of magnitude faster than a conventional bundle adjustment.
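The abstract says segmentation may depend on the number of feature points per frame but gives no concrete rule. The sketch below uses a hypothetical rule: a segment is closed once a frame shares fewer than a given fraction of the feature tracks seen in the segment's first frame. Consecutive segments share their boundary frame, so their reconstructions can later be linked. The function name, `min_shared_ratio`, and `min_len` are illustrative assumptions.

```python
def segment_by_shared_features(frame_tracks, min_shared_ratio=0.5, min_len=2):
    """Split a frame sequence into overlapping segments (hypothetical rule).

    frame_tracks: list of sets, the feature-track IDs visible in each frame.
    A segment is closed once a frame shares fewer than min_shared_ratio of
    the segment's first-frame tracks. Returns inclusive (start, end) index
    pairs; consecutive segments share their boundary frame.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2 so segments can overlap")
    if not frame_tracks:
        return []
    segments = []
    start = 0
    for i in range(1, len(frame_tracks)):
        anchor = frame_tracks[start]
        shared = len(anchor & frame_tracks[i])
        if shared < min_shared_ratio * len(anchor) and i - start >= min_len:
            segments.append((start, i - 1))
            start = i - 1  # reuse the boundary frame to link adjacent segments
    segments.append((start, len(frame_tracks) - 1))
    return segments


tracks = [set(range(0, 10)), set(range(1, 11)), set(range(2, 12)),
          set(range(6, 16)), set(range(7, 17)), set(range(12, 22))]
print(segment_by_shared_features(tracks))  # [(0, 2), (2, 4), (4, 5)]
```

Each segment would then be reconstructed on its own before the segments are merged. The virtual-key-frame step is not modelled here.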
Abstract:
The subject disclosure is directed towards a technology by which dynamic hand gestures are recognized by processing depth data, including in real time. In an offline stage, a classifier is trained on feature values extracted from frames of depth data that are associated with intended hand gestures. In an online stage, a feature extractor extracts feature values from sensed depth data that correspond to an unknown hand gesture. These feature values are input to the classifier as a feature vector to obtain a recognition result for the unknown hand gesture. The technology may be used in real time and may be robust to variations in lighting, hand orientation, and the user's gesturing speed and style.
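The offline/online split described above can be sketched in a few lines. The abstract names neither the features nor the classifier, so both are assumptions. The hypothetical feature vector is the net displacement of the hand's centroid, where the hand is taken to be the pixels nearer than a depth threshold. The classifier is a simple nearest-class-mean rule. All names, the 800 mm threshold, and the toy frames are illustrative only.

```python
import math

NEAR_MM = 800  # hypothetical: pixels closer than this are treated as the hand


def hand_centroid(frame, near_mm=NEAR_MM):
    """Mean (row, col) of pixels whose depth is in (0, near_mm), or None."""
    pts = [(r, c) for r, row in enumerate(frame)
           for c, d in enumerate(row) if 0 < d < near_mm]
    if not pts:
        return None
    return (sum(r for r, _ in pts) / len(pts), sum(c for _, c in pts) / len(pts))


def extract_features(frames):
    """Feature vector [dx, dy]: net hand displacement over the gesture.

    Net displacement (rather than per-frame velocity) gives the same vector
    for fast and slow performances of the same path.
    """
    cs = [c for c in map(hand_centroid, frames) if c is not None]
    if len(cs) < 2:
        return [0.0, 0.0]
    return [cs[-1][1] - cs[0][1], cs[-1][0] - cs[0][0]]


def train(examples):
    """Offline stage: examples is a list of (frames, label); returns label -> mean vector."""
    sums, counts = {}, {}
    for frames, label in examples:
        vec = extract_features(frames)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}


def classify(model, frames):
    """Online stage: return the label whose mean vector is nearest."""
    vec = extract_features(frames)
    return min(model, key=lambda lbl: math.dist(vec, model[lbl]))


def make_swipe(cols, row=2, size=5):
    """Toy depth frames: one 'hand' pixel at 500 mm on a 2000 mm background."""
    frames = []
    for c in cols:
        frame = [[2000] * size for _ in range(size)]
        frame[row][c] = 500
        frames.append(frame)
    return frames


model = train([(make_swipe([0, 1, 2, 3, 4]), "swipe_right"),
               (make_swipe([4, 3, 2, 1, 0]), "swipe_left")])
print(classify(model, make_swipe([0, 0, 1, 1, 2, 2, 3, 3, 4])))  # swipe_right
```

The slow swipe in the last line has twice as many frames as the training example but still yields the same feature vector. This illustrates, in a very limited way, the robustness to gesturing speed that the abstract mentions.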
Abstract:
Multi-modal, multi-lingual devices can be employed to consolidate numerous items including, but not limited to, keys, remote controls, image capture devices, audio recorders, cellular telephone functionalities, location/direction detectors, health monitors, calendars, gaming devices, smart home inputs, pens, optical pointing devices, or the like. For example, a corner of a cellular telephone can be used as an electronic pen. Moreover, the device can be used to capture multiple pictures and stitch them together to create a panoramic image. A device can automate the ignition of an automobile, start appliances, etc., based upon relative distance. The device can provide near-to-eye capabilities for enhanced image viewing. Multiple cameras/sensors can be provided on a single device to enable stereoscopic capabilities. By consolidating services, the device can also provide assistance to blind users, privacy features, and the like.
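The stereoscopic capability mentioned above rests on a standard relation for a rectified stereo pair. A point's depth is Z = f * B / d, where f is the focal length in pixels, B is the distance between the two cameras, and d is the horizontal disparity of the point between the two images. The sketch below is a minimal version of that relation. It assumes rectified cameras; the function names and the 6 cm baseline are illustrative, not taken from the abstract.

```python
def disparity(x_left_px, x_right_px):
    """Horizontal disparity of a matched point in a rectified stereo pair."""
    return x_left_px - x_right_px


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified pair (f in pixels, B in metres)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero means a point at infinity")
    return focal_px * baseline_m / disparity_px


# e.g. two cameras 6 cm apart on one device, focal length 700 px
print(depth_from_disparity(700.0, 0.06, disparity(331.0, 310.0)))  # about 2.0 (metres)
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points (small d) than for near ones. For this reason, the short baseline of a single handheld device limits the useful depth range.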
Abstract:
A system facilitates managing one or more devices utilized for communicating data within a telepresence session. A telepresence session can be initiated within a communication framework that includes a first user and one or more second users. In response to determining a temporary absence of the first user from the telepresence session, a recordation of the telepresence session is initialized to enable a playback of a portion or a summary of the telepresence session that the first user has missed.
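The record-on-absence behaviour can be reduced to tracking the time spans the user missed. The sketch below assumes a hypothetical presence detector that emits time-ordered (timestamp, present) events. Recording starts at the first absence event and stops when the user returns, and the resulting spans are what would be played back or summarised. The function name and event format are illustrative assumptions.

```python
def missed_intervals(events, session_end):
    """Return (start, end) spans, in seconds, during which the user was absent.

    events: time-ordered (timestamp_s, present) pairs from a hypothetical
    presence detector. Recording starts when absence is first detected and
    stops when the user returns; an absence still open at session_end is
    closed there.
    """
    spans = []
    absent_since = None
    for t, present in events:
        if not present and absent_since is None:
            absent_since = t                  # absence detected: start recording
        elif present and absent_since is not None:
            spans.append((absent_since, t))   # user back: stop, keep span for playback
            absent_since = None
    if absent_since is not None:
        spans.append((absent_since, session_end))
    return spans


events = [(0, True), (120, False), (150, False), (300, True), (600, False)]
spans = missed_intervals(events, session_end=900)
print(spans, sum(e - s for s, e in spans))  # [(120, 300), (600, 900)] 480
```

The repeated absence event at t=150 does not restart the recording. Only the first transition to "absent" opens a span.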
Abstract:
A person is provided with the ability to auditorily determine the spatial geometry of his current physical environment. A spatial map of the person's current physical environment is generated. The spatial map is then used to generate a spatialized audio representation of the environment, which is output to a stereo listening device worn by the person.
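The abstract does not say how the spatial map becomes audio. The sketch below shows one simple, assumed mapping. Each obstacle in a hypothetical map of (x, y) positions relative to the listener is given left/right gains by constant-power panning on its azimuth, scaled by 1/distance so that nearer obstacles sound louder. The map format, function names, and `min_dist_m` floor are illustrative.

```python
import math


def obstacle_gains(x_m, y_m, min_dist_m=0.5):
    """(left, right) gains for an obstacle x_m to the right and y_m ahead.

    Constant-power panning by azimuth plus 1/distance attenuation, so nearer
    obstacles are louder and the left/right balance encodes direction.
    """
    dist = math.hypot(x_m, y_m)
    azimuth = math.atan2(x_m, y_m)        # 0 ahead, +pi/2 right, -pi/2 left
    pan = math.sin(azimuth)               # -1 (full left) .. +1 (full right)
    theta = (pan + 1.0) * math.pi / 4.0   # 0 .. pi/2
    gain = 1.0 / max(dist, min_dist_m)
    return gain * math.cos(theta), gain * math.sin(theta)


def render_map(spatial_map):
    """spatial_map: {label: (x_m, y_m)} -> {label: (left_gain, right_gain)}."""
    return {label: obstacle_gains(x, y) for label, (x, y) in spatial_map.items()}


room = {"door": (0.0, 2.0), "wall": (2.0, 0.0), "table": (-1.0, 1.0)}
for label, (left, right) in render_map(room).items():
    print(f"{label}: L={left:.3f} R={right:.3f}")
```

Level panning alone cannot tell an obstacle in front from one behind at the same lateral angle. A fuller system would typically add head-related transfer function (HRTF) filtering for that.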
Abstract:
A spatial element is added to communications, including telephone conference calls heard through headphones or a stereo speaker setup. Functions are created that modify the signals from different callers to create the illusion that the callers are speaking from different parts of the room.
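The abstract does not specify the functions that modify the callers' signals. One well-known cue they could use is the interaural time difference (ITD): sound from one side reaches the nearer ear first. Woodworth's spherical-head approximation models this as ITD = (a / c)(theta + sin theta). The sketch assumes this model with a 8.75 cm head radius and a 16 kHz sample rate. It spreads the callers across the front of the listener and delays each caller's far-ear channel. All names and constants are illustrative, and real systems usually combine several cues.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_MS = 343.0   # speed of sound in air, m/s


def itd_seconds(azimuth_rad):
    """Woodworth spherical-head interaural time difference for |azimuth| <= 90 deg."""
    a = min(abs(azimuth_rad), math.pi / 2)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_MS * (a + math.sin(a))


def spread_azimuths(n, span_deg=120.0):
    """Spread n callers evenly across the front of the listener."""
    if n == 1:
        return [0.0]
    step = span_deg / (n - 1)
    return [math.radians(-span_deg / 2 + i * step) for i in range(n)]


def mix_callers(callers, sample_rate=16000):
    """Mix mono caller signals into (left, right), one azimuth per caller."""
    azimuths = spread_azimuths(len(callers))
    max_delay = round(itd_seconds(math.pi / 2) * sample_rate)
    length = max(len(s) for s in callers) + max_delay
    left, right = [0.0] * length, [0.0] * length
    for samples, az in zip(callers, azimuths):
        delay = round(itd_seconds(az) * sample_rate)
        # A source on the right reaches the right ear first, so the left
        # channel is delayed (and vice versa for sources on the left).
        d_left, d_right = (delay, 0) if az > 0 else (0, delay)
        for i, s in enumerate(samples):
            left[i + d_left] += s
            right[i + d_right] += s
    return left, right


alice = [1.0] + [0.0] * 19   # impulse from a caller placed on the left
bob = [0.5] + [0.0] * 19     # impulse from a caller placed on the right
left, right = mix_callers([alice, bob])
print(left.index(1.0), right.index(1.0))  # Alice arrives at the left ear first
```

At 90 degrees the Woodworth formula gives roughly 0.66 ms, which is about the largest delay a listener experiences. Delays in this range are enough to shift a voice clearly toward one side.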
Abstract:
Multiple images including a face presented by a user are accessed. One or more determinations are made based on the multiple images, such as whether the face included in the multiple images is a 3-dimensional structure or a flat surface, and/or whether motion is present in one or more face components (e.g., eyes or mouth). If it is determined that the face is a 3-dimensional structure or that motion is present in the one or more face components, an indication is provided that the user can be authenticated. However, if it is determined that the face is a flat surface or that motion is not present in the one or more face components, an indication is provided that the user cannot be authenticated.
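The abstract does not say how a 3-dimensional face is told apart from a flat surface. A common geometric test, assumed here, relies on the fact that the images of points on a single plane seen from two viewpoints are related exactly by a homography. A real face's depth variation (for example, the nose) breaks that relation through parallax. The sketch fits a homography to matched facial landmarks using the direct linear transform (DLT) and thresholds the reprojection error. Landmark detection is assumed to be done elsewhere. The 1-pixel tolerance, function names, and toy scene are hypothetical, and real images would also need noise handling and enough head motion.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct linear transform: H (3x3) with dst ~ H @ src, from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 3)


def homography_rms_px(src, dst):
    """RMS distance between dst and src mapped through the best-fit homography."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    H = fit_homography(src, dst)
    mapped = np.column_stack([src, np.ones(len(src))]) @ H.T
    mapped = mapped[:, :2] / mapped[:, 2:3]
    return float(np.sqrt(np.mean(np.sum((mapped - dst) ** 2, axis=1))))


def looks_flat(landmarks_a, landmarks_b, tol_px=1.0):
    """True if the two landmark sets are explained by a single homography."""
    return homography_rms_px(landmarks_a, landmarks_b) < tol_px


def project(points, shift_x, f=500.0):
    """Toy pinhole projection after shifting the scene sideways by shift_x."""
    return [(f * (X + shift_x) / Z, f * Y / Z) for X, Y, Z in points]


grid = [(X, Y) for X in (-1.0, 0.0, 1.0) for Y in (-1.0, 0.0, 1.0)]
photo = [(X, Y, 5.0) for X, Y in grid]                           # printed photo: one plane
face = [(X, Y, 3.0 if X == Y == 0.0 else 5.0) for X, Y in grid]  # centre point protrudes

print(looks_flat(project(photo, 0.0), project(photo, 1.0)))  # True
print(looks_flat(project(face, 0.0), project(face, 1.0)))    # False
```

A result of `looks_flat(...) == False` would correspond to the "3-dimensional structure" determination in the abstract. The separate check for motion in the eyes or mouth is not modelled here.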
Abstract:
Reaction information of participants to an interaction may be sensed and analyzed to determine one or more reactions or dispositions of the participants. Feedback may be provided based on the determined reactions. The participants may be given an opportunity to opt in to having their reaction information collected, and may be provided complete control over how their reaction information is shared or used.
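The abstract leaves the reaction analysis and the form of feedback open. The sketch below focuses only on the consent rule it states: reaction data are used only for participants who opted in, and only when they chose to share them with the recipient of the feedback. The `Participant` fields, the engagement scores in [0, 1], the 0.4 threshold, and the messages are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    name: str
    opted_in: bool = False              # consented to reaction sensing at all
    share_with_presenter: bool = False  # chose to share results with the presenter
    reactions: List[float] = field(default_factory=list)  # hypothetical scores in [0, 1]


def presenter_feedback(participants, low=0.4) -> Optional[str]:
    """Aggregate only consented, shared reaction scores into one feedback message."""
    scores = [s for p in participants
              if p.opted_in and p.share_with_presenter
              for s in p.reactions]
    if not scores:
        return None  # nothing may be reported
    mean = sum(scores) / len(scores)
    if mean < low:
        return "engagement is low; consider pausing for questions"
    return "engagement is steady"


audience = [
    Participant("ana", opted_in=True, share_with_presenter=True, reactions=[0.2, 0.3]),
    Participant("ben", opted_in=True, share_with_presenter=False, reactions=[0.9, 0.9]),
]
print(presenter_feedback(audience))
```

Ben's scores are collected with his consent but kept private, so the feedback reflects only Ana's shared reactions.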