Abstract:
Automated layer extraction from 2D images making up a 3D scene, and automated image pixel assignment to layers, to provide for scene modeling, is disclosed. In one embodiment, a computer-implemented method determines a number of planes, or layers, and assigns pixels to the planes. The method can determine the number of planes by first determining the high-entropy pixels of the images, and then determining a 1-plane through a predetermined n-plane estimation, such as via a robust estimation, and a most likely x-plane estimation, where x is between 1 and n, such as via a Bayesian approach. Furthermore, the method can assign pixels via an iterative EM approach based on classifying criteria.
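The pixel-assignment step can be illustrated with a minimal sketch. It assumes each pixel has been reduced to a point (x, y, d), where d is a depth-like value such as disparity, and that each layer is a plane d = a*x + b*y + c. The sketch uses the hard-assignment (classification) variant of EM rather than full soft EM. It also takes the number of planes and their initial parameters as given, instead of choosing them with the robust and Bayesian estimation described above. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, d): pixel coordinates and a depth-like value
Plane = Tuple[float, float, float]  # (a, b, c) for the plane d = a*x + b*y + c


def solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular system (e.g. collinear pixels)")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def fit_plane(pts: Sequence[Point]) -> Plane:
    """Least-squares plane fit via the 3x3 normal equations."""
    m = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for x, y, d in pts:
        row = (x, y, 1.0)
        for i in range(3):
            v[i] += row[i] * d
            for j in range(3):
                m[i][j] += row[i] * row[j]
    a, b, c = solve(m, v)
    return (a, b, c)


def residual(p: Point, pl: Plane) -> float:
    x, y, d = p
    a, b, c = pl
    return abs(a * x + b * y + c - d)


def assign_layers(points: Sequence[Point], planes: Sequence[Plane], iters: int = 10):
    """Alternate a hard classification step (E) with a plane refit step (M)."""
    planes = list(planes)
    for _ in range(iters):
        # E-step (hard): label each pixel with the plane that explains it best.
        labels = [min(range(len(planes)), key=lambda k: residual(p, planes[k])) for p in points]
        # M-step: refit each plane to its pixels; keep the old plane if it has too few.
        for k in range(len(planes)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 3:
                planes[k] = fit_plane(members)
    labels = [min(range(len(planes)), key=lambda k: residual(p, planes[k])) for p in points]
    return labels, planes
```

In this toy setting, two synthetic layers (d = 0 and d = 5 + x over a 3x3 pixel grid) separate cleanly from rough initial planes, and each refit recovers its plane exactly.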
Abstract:
A system and process for computing motion or depth estimates from multiple images. In general terms, this is accomplished by associating a depth or motion map with each input image (or with some subset of at least two of the images), rather than computing a single map for all the images. In addition, consistency between the estimates associated with different images is ensured. More particularly, this involves minimizing a three-part cost function consisting of an intensity (or color) compatibility constraint, a motion/depth compatibility constraint, and a flow smoothness constraint. A visibility term is also added to the intensity (or color) compatibility and motion/depth compatibility constraints to prevent pixels from being matched into occluded areas. In operation, the cost function is computed in two phases. During an initializing phase, the motion or depth for each image being examined is estimated independently. Since there are not yet any estimates for other frames to employ in the calculation, the motion or depth compatibility term is ignored. In addition, no visibilities are computed, and all pixels are assumed to be visible. Once an initial set of motion estimates has been computed, the visibilities are computed and the motion or depth estimates are recalculated using the visibility terms and the motion or depth compatibility constraint. This process can then be repeated several times, using the revised estimates from each iteration as the initializing estimates for the next, to obtain better estimates of motion/depth values and visibility.
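The three-part cost can be illustrated on a pair of 1D scanlines. This minimal sketch assumes integer per-pixel displacements, squared-difference penalties, and a per-pixel visibility weight in [0, 1]. The scanline setting, the weights lam_m and lam_s, and the function name are assumptions rather than the patented formulation. Matches that fall outside the other scanline simply contribute no data term here.

```python
from typing import Optional, Sequence


def energy(i0: Sequence[float], i1: Sequence[float], d0: Sequence[int],
           d1: Optional[Sequence[int]] = None, vis: Optional[Sequence[float]] = None,
           lam_m: float = 1.0, lam_s: float = 1.0) -> float:
    """Three-part cost of the motion map d0 for scanline i0 against scanline i1.

    Pixel x of i0 is assumed to match pixel x + d0[x] of i1, and d1 maps i1 back
    to i0, so consistent maps satisfy d1[x + d0[x]] == -d0[x].
    """
    total = 0.0
    for x in range(len(i0)):
        v = 1.0 if vis is None else vis[x]  # initializing phase: every pixel visible
        xt = x + d0[x]
        if 0 <= xt < len(i1):
            total += v * (i0[x] - i1[xt]) ** 2  # intensity compatibility
            if d1 is not None:  # skipped in the initializing phase
                total += v * lam_m * (d0[x] + d1[xt]) ** 2  # motion compatibility
        if x + 1 < len(i0):
            total += lam_s * (d0[x + 1] - d0[x]) ** 2  # flow smoothness
    return total
```

Calling `energy` with `d1=None` and `vis=None` corresponds to the initializing phase; later iterations would pass the other image's current map and the computed visibilities.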
Abstract:
The described implementations relate to deblurring images. One system includes an imaging device configured to capture an image, a linear motion detector, and a rotational motion detector. This system also includes a controller configured to receive a signal from the imaging device relating to capture of the image and, in response, to cause the linear motion detector and the rotational motion detector to detect motion-related information. Finally, this particular system includes a motion calculator, configured to recover camera motion associated with the image based upon the detected motion-related information and to infer the blur induced in the image by imaging device motion, and an image deblurring component, configured to reduce that blur in the image using the inferred camera-motion-induced blur.
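Inferring motion-induced blur can be sketched as rasterizing the sensed motion into a blur kernel (point spread function). This minimal sketch makes several assumptions. The sensor readings are taken to be already integrated into camera translation and rotation samples at evenly spaced times during the exposure. The camera is modeled as a pinhole with a focal length in pixels, the scene lies at a single depth, and angles are small. Rotation about the optical axis is ignored, and the axis and sign conventions are illustrative. The resulting kernel could then be passed to a standard non-blind deconvolution method, which is not shown.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float, float]  # (tx, ty, rx, ry): translation (m), rotation (rad)


def blur_kernel(samples: Sequence[Sample], focal_px: float, depth_m: float,
                size: int) -> List[List[float]]:
    """Rasterize sensed camera motion into a normalized blur kernel (PSF).

    Small-angle pinhole approximation for a scene at one depth: the image shift
    along each axis is focal_px * (angle + translation / depth).
    """
    k = [[0.0] * size for _ in range(size)]
    c = size // 2
    for tx, ty, rx, ry in samples:
        dx = focal_px * (ry + tx / depth_m)  # rotation about y and x-translation move columns
        dy = focal_px * (rx + ty / depth_m)  # rotation about x and y-translation move rows
        u, v = c + round(dx), c + round(dy)
        if 0 <= u < size and 0 <= v < size:
            k[v][u] += 1.0  # each sample represents an equal share of the exposure
    total = sum(sum(row) for row in k)
    return [[w / total for w in row] for row in k] if total else k
```

For example, a pan of 0 to 2 mrad at a 1000-pixel focal length spreads the kernel evenly over three horizontally adjacent pixels.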
Abstract:
A local bi-gram model object recognition system and method for constructing a local bi-gram model and using the model to recognize objects in a query image. In a learning phase, a local bi-gram model is constructed that represents objects found in a set of training images. The local bi-gram model is a local spatial model that models only the relationships between neighboring features, without any knowledge of their global context. Object recognition is performed by finding a set of matching primitives in the query image. A tree structure of matching primitives is generated, and a search is performed to find a tree structure of matching primitives that obeys the local bi-gram model. The local bi-gram model can be learned without supervision. The system and method can also be used to recognize, without supervision, objects undergoing non-rigid transformations, for both object instance recognition and category recognition.
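The core idea of a local bi-gram model, scoring features only through their neighbors, can be shown with a toy sketch. It assumes each feature has already been quantized to a visual word with an image position. The k nearest features are treated as neighbors, and P(child word | parent word) is estimated from neighbor counts with add-alpha smoothing. A candidate tree is then scored by the sum of the log-probabilities of its edges. The neighbor definition, the smoothing, and all names are assumptions, and the tree search itself is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, float, float]  # (visual word, x, y)


def neighbors(points: Sequence[Tuple[float, float]], k: int) -> List[List[int]]:
    """Indices of the k nearest other points for each point (brute force)."""
    out = []
    for i, (x, y) in enumerate(points):
        d = sorted(((x - a) ** 2 + (y - b) ** 2, j) for j, (a, b) in enumerate(points) if j != i)
        out.append([j for _, j in d[:k]])
    return out


def learn_bigrams(images: Sequence[Sequence[Feature]], k: int = 2) -> Dict[str, Counter]:
    """Count (word, neighbor word) pairs; only local neighborhoods are used."""
    pair: Dict[str, Counter] = defaultdict(Counter)
    for feats in images:
        nb = neighbors([(x, y) for _, x, y in feats], k)
        for i, js in enumerate(nb):
            for j in js:
                pair[feats[i][0]][feats[j][0]] += 1
    return pair


def tree_score(pair: Dict[str, Counter], edges: Sequence[Tuple[str, str]],
               vocab_size: int, alpha: float = 1.0) -> float:
    """Sum of smoothed log P(child | parent) over the tree's edges."""
    s = 0.0
    for parent, child in edges:
        cnt = pair.get(parent, Counter())
        s += math.log((cnt[child] + alpha) / (sum(cnt.values()) + alpha * vocab_size))
    return s
```

Because only neighbor pairs are counted, the score is unaffected by where the parts sit globally, which is what lets such a model tolerate non-rigid deformation.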
Abstract:
Techniques and systems are disclosed for navigating human-scale image data using aligned perspective images. A consecutive sequence of digital images is stacked together by aligning consecutive images laterally, with the offset between the edges of consecutive images corresponding to the distance between their respective view windows. A view window of an image in the sequence, corresponding to a desired location, is rendered. Offset portions of the view windows of a desired number of images in the sequence are rendered, for example, alongside the full view of the image at the desired location.
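The stacking layout can be sketched under a simple assumption. Images are captured along a straight path, and each image's lateral offset is its capture position times a fixed pixels-per-meter scale. Neighbors are stacked so that each one exposes a strip as wide as its offset from the adjacent image. The scale, the nearest-position rule for picking the desired image, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Strip = Tuple[int, int, int]  # (image index, x_start, x_end) on the stacked canvas, in pixels


def strips(positions: Sequence[float], px_per_m: float, loc: float,
           n_side: int, width: int) -> List[Strip]:
    """Full view of the image nearest `loc`, plus offset strips of n_side neighbors per side."""
    offs = [round((p - positions[0]) * px_per_m) for p in positions]
    c = min(range(len(positions)), key=lambda i: abs(positions[i] - loc))
    # Left neighbors show only the part not covered by the next image to their right.
    left = [(j, offs[j], offs[j + 1]) for j in range(c - 1, max(-1, c - 1 - n_side), -1)]
    # Right neighbors show only the part extending past the previous image's right edge.
    right = [(j, offs[j - 1] + width, offs[j] + width)
             for j in range(c + 1, min(len(offs), c + 1 + n_side))]
    return left[::-1] + [(c, offs[c], offs[c] + width)] + right
```

With images 1 m apart at 10 px/m and a 100 px view window, the image nearest 2.1 m is shown in full, flanked by 10 px slivers of its neighbors.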
Abstract:
A method, system and media for generating and querying spatial multimedia indices are provided. A multimedia corpus representing varying view points and distributed across a large network, such as the Internet, is crawled to extract properties from the multimedia. The extracted properties and relationships among multimedia are stored and indexed in clusters associated with a space-scale hierarchy. Accordingly, a spatial multimedia service may utilize the space-scale hierarchy to update the spatial multimedia indices and to respond to user queries.
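A space-scale hierarchy can be illustrated with a quadtree over latitude and longitude, in which each item is indexed in its containing cell at every level from coarse to fine. This is a minimal sketch under that assumption. The equirectangular cell scheme, the level count, and the class and method names are hypothetical, and crawling and property extraction are out of scope.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]  # (level, column, row)


def cell(lat: float, lon: float, level: int) -> Cell:
    """Cell containing (lat, lon) on a 2**level x 2**level equirectangular grid."""
    n = 2 ** level
    ix = min(int((lon + 180.0) / 360.0 * n), n - 1)  # clamp lon == 180 into the last column
    iy = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return (level, ix, iy)


class SpaceScaleIndex:
    """Indexes each item in its containing cell at every level, coarse (0) to fine."""

    def __init__(self, max_level: int) -> None:
        self.max_level = max_level
        self.cells: Dict[Cell, List[str]] = defaultdict(list)

    def add(self, item_id: str, lat: float, lon: float) -> None:
        for level in range(self.max_level + 1):
            self.cells[cell(lat, lon, level)].append(item_id)

    def query(self, lat: float, lon: float, level: int) -> List[str]:
        """Items sharing the query point's cell at the requested scale."""
        return list(self.cells.get(cell(lat, lon, level), []))
```

A coarse query returns everything in a large region, while a fine query returns only items that are close together, so the level acts as the scale of the query.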
Abstract:
A system and process for generating a new video sequence from frames taken from an input video clip. Generally, this involves computing a similarity value between each frame of the input video clip and each of the other frames. For each frame, the associated similarity values are analyzed to identify potentially acceptable transitions between it and the remaining frames. A transition is considered acceptable if it would appear smooth to a person viewing a video containing the frames, or at least if it is one of the best available. A new video sequence is then synthesized using the identified transitions to specify the order in which the frames associated with these transitions are to be played. Finally, the new video sequence is rendered by playing the frames of the input video clip in the order specified in the synthesizing procedure. This rendering procedure can include a smoothing action in which transitions that were deemed acceptable, but would not appear smooth to a viewer, are smoothed to lessen the discontinuity. This general process can be used to generate continuous video sequences or fixed-length, loopable sequences. In addition, the process can be extended to process areas of independent motion in the input video clip separately and then recombine them during the rendering procedure, and to separate video texture elements from their backgrounds so that they can be used as video sprites.
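The similarity and transition-selection steps can be sketched under several assumptions. Frames are flattened lists of pixel values, and dissimilarity is the L2 distance. The probability of jumping from frame i to frame j falls off exponentially with the distance between frame i+1 and frame j, so a jump looks like ordinary playback. The exponential mapping, the parameter sigma, the random-walk synthesis, and the restriction to the first n-1 frames (so every frame has a successor) are all assumptions. Smoothing of rough transitions is not shown.

```python
import math
import random
from typing import List, Sequence

Frame = Sequence[float]  # a frame flattened to a list of pixel values


def distances(frames: Sequence[Frame]) -> List[List[float]]:
    """Pairwise L2 distances between frames (0 means identical)."""
    return [[math.dist(a, b) for b in frames] for a in frames]


def transition_probs(frames: Sequence[Frame], sigma: float) -> List[List[float]]:
    """Row i gives the probability of following frame i with frame j."""
    d = distances(frames)
    m = len(frames) - 1  # only frames 0..m-1 have a successor, so restrict to them
    probs = []
    for i in range(m):
        # Jumping i -> j should look like playing i -> i+1, so compare frame i+1 with j.
        row = [d[i + 1][j] for j in range(m)]
        best = min(row)
        w = [math.exp(-(r - best) / sigma) for r in row]  # shift by the minimum to avoid underflow
        total = sum(w)
        probs.append([x / total for x in w])
    return probs


def synthesize(frames: Sequence[Frame], sigma: float, length: int,
               start: int = 0, seed: int = 0) -> List[int]:
    """Random walk over the transitions; returns the playback order of frame indices."""
    p = transition_probs(frames, sigma)
    if not 0 <= start < len(p):
        raise ValueError("start frame must have a successor")
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length:
        seq.append(rng.choices(range(len(p)), weights=p[seq[-1]])[0])
    return seq
```

A small sigma keeps only the best-matching transitions likely; a larger sigma produces more varied, but potentially less smooth, sequences.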
Abstract:
A system and method for inverse texture mapping in which, given a 3D model and several images from different viewpoints, a texture map is extracted for each planar surface in the 3D model. The system and method employ a unique weighted pyramid feathering scheme for blending multiple images to form the texture map, even where the images are taken from different viewpoints, at different scales, and with different exposures. This scheme also makes it possible to blend images with cut-out regions, which may be present due to occlusions or moving objects. The system and method further employ weight maps to improve the quality of the blended image.
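Weighted pyramid feathering can be sketched in one dimension. Each input is decomposed into a Laplacian pyramid and its weight map into a Gaussian pyramid. Each level is then blended as a weighted average before the pyramid is collapsed. A zero weight marks a cut-out (occluded or moving) region, so that input contributes nothing there. This minimal sketch assumes 1D signals whose length is divisible by 2**levels, with pairwise averaging for downsampling and sample duplication for upsampling. Those filters and all names are assumptions, not the patented scheme; real images would use 2D pyramids with smoother filters.

```python
from typing import List, Sequence

Signal = List[float]


def down(a: Sequence[float]) -> Signal:
    """Halve the resolution by averaging adjacent pairs."""
    return [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(len(a) // 2)]


def up(a: Sequence[float]) -> Signal:
    """Double the resolution by repeating each sample."""
    return [v for v in a for _ in range(2)]


def gaussian_pyr(a: Sequence[float], levels: int) -> List[Signal]:
    pyr = [list(a)]
    for _ in range(levels):
        pyr.append(down(pyr[-1]))
    return pyr


def laplacian_pyr(a: Sequence[float], levels: int) -> List[Signal]:
    g = gaussian_pyr(a, levels)
    # Band-pass detail at each level, plus the coarsest Gaussian level on top.
    return [[x - y for x, y in zip(g[i], up(g[i + 1]))] for i in range(levels)] + [g[-1]]


def blend(signals: Sequence[Sequence[float]], weights: Sequence[Sequence[float]],
          levels: int) -> Signal:
    lps = [laplacian_pyr(s, levels) for s in signals]
    wps = [gaussian_pyr(w, levels) for w in weights]
    blended = []
    for lev in range(levels + 1):
        row = []
        for i in range(len(lps[0][lev])):
            num = sum(lp[lev][i] * wp[lev][i] for lp, wp in zip(lps, wps))
            den = sum(wp[lev][i] for wp in wps)
            row.append(num / den if den > 0 else 0.0)  # 0 where no input covers the sample
        blended.append(row)
    out = blended[-1]
    for lev in range(levels - 1, -1, -1):  # collapse coarse to fine
        out = [x + y for x, y in zip(blended[lev], up(out))]
    return out
```

With a single input and uniform weights the collapse reproduces the input, and two inputs with complementary cut-outs are combined without either leaking into the other's masked half.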
Abstract:
An interactive system and process for constructing a model of a 3D scene from a panoramic view of the scene. In the constructed model, the 3D scene is represented by sets of connected planes. The modeling begins by providing the user with a display of an image of the panoramic view. The user is then required to specify information concerning certain geometric features of the scene. A computer program recovers a camera orientation matrix of the panoramic view based on the features specified by the user. Plane normals and line directions for planes in the 3D scene are estimated using this matrix as well as the user-specified information. A camera translation is also recovered, as are plane distances and vertex point locations for planes in the 3D scene, using the user-supplied information, the camera orientation matrix, and the estimated plane normals and line directions. The model of the 3D scene is then constructed based on the plane normal and plane distance, and/or the vertex point locations, of each plane in the set. Preferably, the plane distances and vertex point locations, and optionally the camera translation, are recovered by creating a system of equations based on the geometric constraints of the 3D scene. A constraint equation is characterized as hard if it includes a user-designated parameter; otherwise it is considered a soft constraint. The system of equations is solved in a manner that gives priority to the hard constraint equations. A decomposition process can also be employed before solving the system of equations to ensure its solvability.
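Giving priority to hard constraints can be approximated with weighted least squares, in which each hard equation receives a much larger weight than each soft one. This penalty-method stand-in is chosen for brevity and is not necessarily the solver the abstract refers to. The weight value, the row format, and the names are assumptions, and the decomposition step is not shown.

```python
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], float, bool]  # (coefficients, right-hand side, is_hard)


def solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular system: constraints do not determine the unknowns")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def solve_constraints(rows: Sequence[Row], n: int, hard_weight: float = 1e6) -> List[float]:
    """Minimize sum of w * (coeffs . x - rhs)**2, with w = hard_weight for hard rows."""
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for coeffs, rhs, hard in rows:
        w = hard_weight if hard else 1.0
        for i in range(n):
            atb[i] += w * coeffs[i] * rhs
            for j in range(n):
                ata[i][j] += w * coeffs[i] * coeffs[j]
    return solve(ata, atb)
```

For example, with a hard constraint x0 = 1 and two conflicting soft constraints x1 - x0 = 2 and x1 = 2.5, the solution keeps x0 at 1 (to within about 1e-7) and splits the conflict evenly, giving x1 close to 2.75.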
Abstract:
The invention is embodied in a method for reconstructing 3-dimensional geometry by computing 3-dimensional points on an object, or on a scene including many objects, visible in images taken from different views of the object or scene. The method includes identifying at least one set of initial pixels, visible in both views, that lie on a generally planar surface on the object; computing from the set of initial pixels an estimated homography between the views; defining at least one additional pixel on that surface in one of the images and computing from the estimated homography a corresponding additional pixel in the other view; computing an optimal homography and an epipole from the initial and additional pixels (including at least some points not on the planar surface); and computing from the homography and the epipole the 3-dimensional locations of points on the object by triangulation between the views of corresponding pixels. Each initial pixel in one view corresponds to an initial pixel in the other view, and both correspond to the same point on the object.
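The first two steps, estimating a homography from pixels on the planar surface and then transferring an additional pixel through it, can be sketched with the direct linear transform for exactly four correspondences, normalizing the bottom-right entry of H to 1. That normalization, the four-point restriction, and the function names are assumptions. The optimal homography, the epipole, and triangulation are not shown.

```python
from typing import List, Sequence, Tuple

Pt = Tuple[float, float]


def solve(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear pixels)")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def estimate_homography(src: Sequence[Pt], dst: Sequence[Pt]) -> List[List[float]]:
    """3x3 H with H[2][2] = 1 mapping four src pixels onto four dst pixels."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), multiplied out; likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(hm: List[List[float]], x: float, y: float) -> Pt:
    """Transfer pixel (x, y) from the first view to the second through H."""
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)
```

In practice more than four pixels would usually be available, and a least-squares estimate over all of them would be less sensitive to noise than this exact four-point solve.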