Abstract:
Systems and methods for video completion by motion field transfer are described. In one aspect, a spatio-temporal target patch of an input video data sequence is filled in or replaced by motion field transfer from a spatio-temporal source patch of the input video data sequence. Color is propagated to corresponding portions of the spatio-temporal target patch by treating the transferred motion information as directed edges. These motion field transfer and color propagation operations result in a video completed spatio-temporal target patch. The systems and methods present the video data sequence, which now includes the video completed spatio-temporal target patch, to a user for viewing.
Abstract:
A system and process for generating, and then rendering and displaying, an interactive viewpoint video is presented, in which a user can watch a dynamic scene while manipulating (freezing, slowing down, or reversing) time and changing the viewpoint at will. In general, the interactive viewpoint video is generated using a small number of cameras to capture multiple video streams. A multi-view 3D reconstruction and matting technique is employed to create a layered representation of the video frames that enables both efficient compression and interactive playback of the captured dynamic scene, while at the same time allowing for real-time rendering.
Abstract:
Methods and systems for processing facial image data for use in animation are described. In one embodiment, a system is provided that illuminates a face with illumination that is sufficient to enable the simultaneous capture of both structure data, e.g. a range or depth map, and reflectance properties, e.g. the diffuse reflectance of a subject's face. This captured information can then be used for various facial animation operations, including expression recognition and expression transformation.
Abstract:
A system and process for rendering and displaying an interactive viewpoint video is presented in which a user can watch a dynamic scene while manipulating (freezing, slowing down, or reversing) time and changing the viewpoint at will. The ability to interactively control viewpoint while watching a video is an exciting new application for image-based rendering. Because any intermediate view can be synthesized at any time, with the potential for space-time manipulation, this type of video has been dubbed interactive viewpoint video.
Abstract:
A camera-based document scanning system produces electronic versions of documents, based on a plurality of images of discrete portions of the documents. The system compares each pair of consecutive images and derives motion parameters that indicate the relative motion between each pair of consecutive images. The system utilizes the derived motion parameters to align and merge each image with respect to the previous images, thereby building a single, mosaic image of the document. In the illustrative embodiment, the motion parameters are derived by minimizing a sum of squared differences equation on a pixel-by-pixel basis.
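The alignment step described above can be illustrated with a minimal sketch. It assumes the motion between consecutive images is a pure integer translation and finds it by exhaustive search, which is a simplification and not necessarily the minimization used in the illustrative embodiment. The function names, the `max_shift` search window, and the normalization of the squared-difference sum by the overlap size are all assumptions made for this example.

```python
def mean_squared_difference(prev, curr, dx, dy):
    """Squared-difference error between curr[y][x] and prev[y + dy][x + dx],
    averaged over the overlap (averaging is an assumption that keeps small
    overlaps from winning just because they contain fewer pixels)."""
    h, w = len(prev), len(prev[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            py, px = y + dy, x + dx
            if 0 <= py < h and 0 <= px < w:
                d = prev[py][px] - curr[y][x]
                total += d * d
                count += 1
    return total / count if count else float("inf")


def estimate_translation(prev, curr, max_shift=3):
    """Return the (dx, dy) within the search window that minimizes the error."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = mean_squared_difference(prev, curr, dx, dy)
            if best is None or err < best[0]:
                best = (err, dx, dy)
    return best[1], best[2]


# Deterministic synthetic "document" texture (hypothetical test data).
page = [[(31 * x + 17 * y + 7 * x * y) % 256 for x in range(20)] for y in range(20)]

# Two 12x12 views of the page; the second is offset by 2 columns and 1 row.
prev = [row[0:12] for row in page[0:12]]
curr = [row[2:14] for row in page[1:13]]

dx, dy = estimate_translation(prev, curr)
print(dx, dy)
```

With the recovered offset, each new image can be pasted into the mosaic at its position relative to the previous one. A practical system would likely estimate richer motion parameters and use a gradient-based minimizer instead of brute-force search.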
Abstract:
A system and method for deghosting mosaics provides a novel multiperspective plane sweep approach for generating an image mosaic from a sequence of still images, video images, scanned photographic images, computer generated images, etc. This multiperspective plane sweep approach uses virtual camera positions to compute depth maps for columns of overlapping pixels in adjacent images. Object distortions and ghosting caused by image parallax when generating the image mosaics are then minimized by blending pixel colors, or grey values, for each computed depth to create a common composite area for each of the overlapping images. Further, the multiperspective plane sweep approach described herein is computationally efficient and applies both to the case of limited overlap between the images used for creating the image mosaics and to the case of extensive or increased image overlap.
Abstract:
A system and process for generating a two-layer, 3D representation of a digital or digitized image from the image and a pixel disparity map of the image is presented. The two-layer representation includes a main layer having pixels exhibiting background colors and background disparities associated with correspondingly located pixels of depth discontinuity areas in the image, as well as pixels exhibiting colors and disparities associated with correspondingly located pixels of the image not found in these depth discontinuity areas. The other layer is a boundary layer made up of pixels exhibiting foreground colors, foreground disparities and alpha values associated with the correspondingly located pixels of the depth discontinuity areas. The depth discontinuity areas correspond to areas of a prescribed size surrounding depth discontinuities found in the image using a disparity map thereof.
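As a rough illustration of how depth discontinuity areas might be located, the sketch below marks pixels whose disparity differs from a horizontal or vertical neighbor by more than a threshold, then grows each marked pixel into a square region of fixed radius. The function name and the `jump` and `radius` parameters are hypothetical; the abstract only states that the areas have a prescribed size. Pixels inside the resulting mask would feed the boundary layer (foreground color, disparity, and alpha), while the main layer would hold background values there and the original values elsewhere. The matting step that separates foreground from background is not shown.

```python
def discontinuity_mask(disparity, jump=2, radius=1):
    """Return a boolean mask of depth discontinuity areas.

    disparity: 2D list of per-pixel disparities.
    jump:      minimum neighbor-to-neighbor disparity difference that counts
               as a discontinuity (assumed threshold).
    radius:    half-width of the square area grown around each discontinuity
               pixel (assumed stand-in for the "prescribed size").
    """
    h, w = len(disparity), len(disparity[0])

    # Step 1: collect pixels on either side of a large disparity jump.
    edge_pixels = set()
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w:
                    if abs(disparity[y][x] - disparity[ny][nx]) > jump:
                        edge_pixels.add((y, x))
                        edge_pixels.add((ny, nx))

    # Step 2: grow each edge pixel into a (2 * radius + 1)-wide square area.
    mask = [[False] * w for _ in range(h)]
    for ey, ex in edge_pixels:
        for y in range(max(0, ey - radius), min(h, ey + radius + 1)):
            for x in range(max(0, ex - radius), min(w, ex + radius + 1)):
                mask[y][x] = True
    return mask


# One scanline with a far surface (disparity 1) next to a near one (disparity 8).
scanline_mask = discontinuity_mask([[1, 1, 1, 1, 8, 8, 8, 8]], jump=2, radius=1)
print(scanline_mask[0])
```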
Abstract:
A system and process for providing an interactive video tour of a tour site to a user is presented. In general, the system and process provide an image-based rendering system that enables users to explore remote real world locations, such as a house or a garden. The present approach is based directly on filming an environment, and then using image-based rendering techniques to replay the tour in an interactive manner. As such, the resulting experience is referred to as Interactive Video Tours. The experience is interactive in that the user can move freely along a path, choose between different directions of motion at branch points in the path, and look around in any direction. The user experience is additionally enhanced with multimedia elements such as overview maps, video textures, and sound.
Abstract:
In a computerized image processing system, a camera acquires a set of images of a scene while the camera is rotated about an axis passing through its optical center. The images of the set overlap each other. An initial estimate of the focal length of the camera is made; it can be any reasonable focal length. Using the initial estimate of the focal length, the set of images is composited in a memory to determine an initial estimated composited length. A next best estimate of the focal length is derived from the initial estimated composited length. The set of images is recomposited using the next best focal length estimate. This process is iterated until the absolute difference between successive estimates of the focal length is less than a predetermined threshold, at which point the camera is calibrated. In addition, the process of compositing the set of images can use a weighted interpolation scheme to reduce the blurring caused by noise, digitization effects, and an incorrect focal length.
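The iteration described above can be sketched as a fixed-point loop. The sketch assumes a full 360-degree cylindrical panorama, so that the next focal length estimate is the composited length divided by 2π (the circumference of a cylinder whose radius equals the focal length). The abstract does not state this formula, so treat it as an assumption. The compositing step itself is replaced by `toy_composited_length`, a hypothetical stand-in that models equally spaced images and is not the compositing procedure of the described system.

```python
import math


def toy_composited_length(f_est, f_true, n_images):
    """Hypothetical stand-in for compositing: the panorama length obtained
    when n_images equally spaced views, taken with focal length f_true, are
    warped onto a cylinder of radius f_est and laid side by side."""
    theta = 2 * math.pi / n_images       # rotation between neighboring views
    shift = f_true * math.tan(theta)     # image-plane shift between neighbors
    return n_images * f_est * math.atan(shift / f_est)


def calibrate_focal_length(composited_length, f_initial,
                           threshold=1e-6, max_iters=100):
    """Iterate focal length estimates until successive values differ by less
    than threshold. composited_length is any callable mapping a focal length
    estimate to the resulting composited length."""
    f = f_initial
    for _ in range(max_iters):
        # Assumed update rule: full 360-degree panorama, length = 2 * pi * f.
        f_next = composited_length(f) / (2 * math.pi)
        if abs(f_next - f) < threshold:
            return f_next
        f = f_next
    raise RuntimeError("focal length estimate did not converge")


# Start from a deliberately poor guess; the true value in this toy setup is 500.
f_hat = calibrate_focal_length(
    lambda f: toy_composited_length(f, f_true=500.0, n_images=20),
    f_initial=300.0,
)
print(round(f_hat, 3))
```

In this toy model the estimate settles near the true focal length within a few iterations, which is consistent with the abstract's remark that the starting value can be any reasonable focal length.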