Abstract:
A method and computer input device are provided for controlling a displayed object. Using the method and computer input device, an indication of the amount of rotation and translation of the computer input device is received. A decision is then made as to whether to use the amount of rotation of the computer input device to control a displayed object based on the amount of translation of the computer input device.
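The gating decision described above can be sketched as follows; the threshold value and function names are illustrative assumptions, not part of the patented method:

```python
import math

# Assumed threshold: suppress rotation while the device is translating
# quickly, so incidental twists do not disturb the displayed object.
TRANSLATION_THRESHOLD = 5.0  # device units per report; an illustrative value

def apply_rotation(dx, dy, dtheta):
    """Decide whether the reported rotation dtheta should drive the
    displayed object, based on the amount of translation (dx, dy)."""
    translation = math.hypot(dx, dy)
    if translation > TRANSLATION_THRESHOLD:
        return 0.0       # large translation: ignore the rotation
    return dtheta        # otherwise pass the rotation through
```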
Abstract:
A computer input device and computer system are provided that determine if the input device is at an edge of a pattern on a working surface based on an image of the working surface captured by the input device. An audio control message is generated based on the input device being positioned on the edge of the pattern and the audio control message is used to cause a speaker to generate an audio signal.
Abstract:
A local bi-gram model object recognition system and method for constructing a local bi-gram model and using the model to recognize objects in a query image. In a learning phase, the local bi-gram model is constructed that represents the objects found in a set of training images. The local bi-gram model is a local spatial model that models only the relationships of neighboring features, without any knowledge of their global context. Object recognition is performed by finding a set of matching primitives in the query image, generating a tree structure of those matching primitives, and searching for a tree structure that obeys the local bi-gram model. The local bi-gram model can be learned without supervision. The system and method can also recognize, without supervision, objects undergoing non-rigid transformations, for both object instance recognition and category recognition.
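A minimal sketch of the local bi-gram idea, assuming features are discrete labels and the model is simply the set of neighbor pairs observed in training; the actual method's matching primitives and tree search are considerably richer:

```python
from collections import defaultdict

def learn_bigrams(training_neighbor_pairs):
    """Collect the feature-label pairs observed as neighbors in training
    images -- a toy stand-in for the 'local bi-gram model'."""
    model = set()
    for a, b in training_neighbor_pairs:
        model.add((a, b))
        model.add((b, a))  # the neighbor relation is symmetric
    return model

def tree_obeys_model(tree_edges, labels, model):
    """Check whether every parent-child edge in a tree of matched
    primitives is a neighbor pair allowed by the model."""
    return all((labels[u], labels[v]) in model for u, v in tree_edges)
```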
Abstract:
A mouse is provided that uses a camera as its input sensor. A real-time vision algorithm determines the six degree-of-freedom mouse posture, consisting of 2D motion, tilt in the forward/back and left/right axes, rotation of the mouse about its vertical axis, and some limited height sensing. Thus, a familiar 2D device can be extended for three-dimensional manipulation, while remaining suitable for standard 2D Graphical User Interface tasks. The invention includes techniques for mouse functionality, 3D manipulation, navigating large 2D spaces, and using the camera for lightweight scanning tasks.
Abstract:
A system and process for computing a 3D reconstruction of a scene using multiperspective panoramas. The reconstruction can be generated using a cylindrical sweeping approach, or under some conditions, traditional stereo matching algorithms. The cylindrical sweeping process involves projecting each pixel of the multiperspective panoramas onto each of a series of cylindrical surfaces of progressively increasing radii. For each pixel location on each cylindrical surface, a fitness metric is computed for all the pixels projected thereon to provide an indication of how closely a prescribed characteristic of the projected pixels matches. Then, for each respective group of corresponding pixel locations of the cylindrical surfaces, it is determined which location has a fitness metric that indicates the prescribed characteristic of the projected pixels matches more closely than the rest. For each of these winning pixel locations, its panoramic coordinates are designated as the position of the portion of the scene depicted by the pixels projected to that location. Additionally, in some cases a sufficiently horizontal epipolar geometry exists between multiperspective panoramas such that traditional stereo matching algorithms can be employed for the reconstruction. A symmetric pair of multiperspective panoramas produces the horizontal epipolar geometry. In addition, this geometry is obtained if the distance from the center of rotation to the viewpoints used to capture the images employed to construct the panorama is small in comparison to the distance from the center of rotation to the nearest scene point depicted in the images, or if an off-axis angle is kept small.
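The cylindrical sweeping step might be sketched as below, assuming color variance across panoramas as the fitness metric (the patent's prescribed characteristic could differ) and assuming the pixel projections have already been gathered per radius:

```python
import numpy as np

def cylindrical_sweep(projected, radii):
    """projected[r] holds, for each cylindrical-surface pixel location,
    the colors of the panorama pixels projecting there at radius radii[r]
    (shape: n_radii x n_locations x n_panoramas).  Color variance across
    panoramas serves as the fitness metric here -- an assumed choice;
    lower variance means the projected pixels agree more closely."""
    variance = projected.var(axis=2)    # n_radii x n_locations
    best = variance.argmin(axis=0)      # winning radius index per location
    return np.asarray(radii)[best]      # estimated radius per location
```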
Abstract:
A system and process for generating a 3D video animation of an object, referred to as a 3D Video Texture, is presented. The 3D Video Texture is constructed by first simultaneously videotaping an object from two or more cameras positioned at different locations. Video from one of the cameras is used to extract, analyze and synthesize a video sprite of the object of interest. In addition, the first contemporaneous frames captured by at least two of the cameras are used to estimate a 3D depth map of the scene. The background of the scene contained within the depth map is then masked out using a clear shot of the scene background taken before filming of the object began, leaving just the object. To generate each new frame in the 3D video animation, the extracted region making up a “frame” of the video sprite is mapped onto the previously generated 3D surface. The resulting image is rendered from a novel viewpoint, and then combined with a flat image of the background which has been warped to the correct location. In cases where it is anticipated that the subject could move frequently, the foregoing part of the procedure associated with estimating a 3D depth map of the scene and extracting the 3D surface representation of the object is performed for each subsequent set of contemporaneous frames captured by at least two of the cameras.
Abstract:
The primary components of the panoramic video viewer include a decoder module. The purpose of the decoder module is to input incoming encoded panoramic video data and to output a decoded version thereof. The incoming data may be provided over a network and originate from a server, or it may simply be read from a storage medium, such as a hard drive, CD or DVD. Once decoded, the data associated with each video frame is preferably stored in a storage module and made available to a 3D rendering module. The 3D rendering module is essentially a texture mapper that takes the frame data and maps the desired views onto a prescribed environment model. The output of the 3D rendering module is provided to a display module where the panoramic video is viewed by a user of the system. Typically, the user will be viewing just a portion of the scene depicted in the panoramic video at any one time, and will be able to control what portion is viewed. Preferably, the panoramic video viewer will allow the user to pan through the scene to the left, right, up or down. In addition, the user would preferably be able to zoom in or out within the portion of the scene being viewed. The user could also be allowed to select which video should be played, to choose when to play or pause the video, and to specify which temporal part of the video should be played.
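A minimal sketch of the viewer pipeline just described, with illustrative class and method names (the abstract does not specify the actual module interfaces):

```python
class Decoder:
    """Stands in for the decoder module: encoded frames in, decoded out."""
    def decode(self, encoded_frame):
        return encoded_frame  # placeholder for real video decoding

class FrameStore:
    """Stands in for the storage module buffering decoded frames."""
    def __init__(self):
        self.frames = []
    def put(self, frame):
        self.frames.append(frame)

class Renderer:
    """Stands in for the 3D rendering module: texture-maps a frame onto
    an environment model and extracts the user's current pan/zoom view."""
    def render(self, frame, pan, zoom):
        return (frame, pan, zoom)  # placeholder for 3D texture mapping

def play(encoded_frames, pan=0.0, zoom=1.0):
    """Run frames through decoder -> storage -> renderer -> display."""
    decoder, store, renderer = Decoder(), FrameStore(), Renderer()
    views = []
    for ef in encoded_frames:
        frame = decoder.decode(ef)
        store.put(frame)          # decoded frames are buffered
        views.append(renderer.render(frame, pan, zoom))
    return views
```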
Abstract:
A system and method for extracting structure from stereo that represents the scene as a collection of planar layers. Each layer optimally has an explicit 3D plane equation, a colored image with per-pixel opacity, and a per-pixel depth value relative to the plane. Initial estimates of the layers are recovered using techniques from parametric motion estimation. The combination of a global model (the plane) with a local correction to it (the per-pixel relative depth value) imposes enough local consistency to allow the recovery of shape in both textured and untextured regions.
Abstract:
A system and method for extracting structure from stereo that represents the scene as a collection of planar layers. Each layer optimally has an explicit 3D plane equation, a colored image with per-pixel opacity, and a per-pixel depth value relative to the plane. Initial estimates of the layers are made and then refined using a re-synthesis step which takes into account both occlusions and mixed pixels. Reasoning about these effects allows the recovery of depth and color information with high accuracy, even in partially occluded regions. Moreover, the combination of a global model (the plane) with a local correction to it (the per-pixel relative depth value) imposes enough local consistency to allow the recovery of shape in both textured and untextured regions.
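The combination of a global plane with a local per-pixel correction can be sketched as follows; the plane parameterization z = a*x + b*y + c and the function names are assumptions for illustration:

```python
import numpy as np

def layer_depth(plane, offsets, xs, ys):
    """Compose a layer's depth map from its global plane equation
    z = a*x + b*y + c (an assumed parameterization) plus the local
    per-pixel depth correction relative to the plane."""
    a, b, c = plane
    base = a * xs + b * ys + c      # depth predicted by the global plane
    return base + offsets           # plane depth plus per-pixel residual
```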
Abstract:
A system and method for creating weight maps capable of indicating how much each pixel in an image should contribute to a blended image. One such map is a view-dependent weight map created by inputting an image that has been characterized as a collection of regions. A 2D perspective transform is computed for each region that is to be part of the weight map. The transforms are used to warp the associated regions to prescribed coordinates to create a warped image. Once the warped image is created, a Jacobian matrix is computed for each pixel. The determinant of each Jacobian matrix is then computed to establish a weight factor for that pixel. The weight map for the input image is created using these computed determinants. Another advantageous weight map is a combination weight map. The process for creating this type of weight map is identical to that for the view-dependent map up to the point at which the warped image has been created. After that, a first weight factor is computed for each pixel of the warped image using a first weight mapping process. At least one additional weight factor is also computed for each pixel using one or more additional weight mapping processes. The weight factors computed for each pixel are then combined to create a combined weight factor, and the weight map is formed from these factors. Preferably, one of the weight mapping processes used to create the combination weight map is the aforementioned view-dependent weight mapping process.
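For a 2D perspective transform (a 3x3 homography H), the per-pixel Jacobian determinant has a closed form: at a point (x, y) it equals det(H)/w³, where w = H[2,0]·x + H[2,1]·y + H[2,2] is the projective denominator. This sketch assumes that formulation; it illustrates the Jacobian-determinant weighting, not the patent's exact procedure:

```python
import numpy as np

def perspective_weight_map(H, width, height):
    """Per-pixel weight from the determinant of the Jacobian of a 2D
    perspective transform H (a 3x3 homography).  For a homography the
    Jacobian determinant at (x, y) is det(H) / w**3 with
    w = H[2,0]*x + H[2,1]*y + H[2,2]."""
    ys, xs = np.mgrid[0:height, 0:width]
    w = H[2, 0] * xs + H[2, 1] * ys + H[2, 2]
    return np.abs(np.linalg.det(H)) / np.abs(w) ** 3
```

For an affine transform (bottom row [0, 0, 1]) the denominator w is 1 everywhere, so the weight map is constant, as expected for a transform with uniform area scaling.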