Abstract:
A system and process for generating a two-layer, 3D representation of a digital or digitized image from the image and a pixel disparity map of the image is presented. The two-layer representation includes a main layer having pixels exhibiting background colors and background disparities associated with correspondingly located pixels of depth discontinuity areas in the image, as well as pixels exhibiting colors and disparities associated with correspondingly located pixels of the image not found in these depth discontinuity areas. The other layer is a boundary layer made up of pixels exhibiting foreground colors, foreground disparities and alpha values associated with the correspondingly located pixels of the depth discontinuity areas. The depth discontinuity areas correspond to prescribed-size areas surrounding depth discontinuities found in the image using its disparity map.
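
As an illustration of the layer split described above, the following is a minimal Python sketch that locates depth discontinuities in a disparity map, grows them into a strip of prescribed size, and fills main and boundary layers from that strip. The gradient threshold, the strip radius, and the min/max-filter stand-ins for background/foreground estimates are assumptions, not the patented procedure (which recovers colors, disparities, and alphas by matting).

    import numpy as np
    from scipy import ndimage

    def build_two_layers(image, disparity, disc_thresh=4.0, strip_radius=3):
        # Mark depth discontinuities where the disparity gradient is large.
        gy, gx = np.gradient(disparity.astype(np.float32))
        strip = ndimage.binary_dilation(np.hypot(gx, gy) > disc_thresh,
                                        iterations=strip_radius)
        # Stand-ins for background/foreground disparity estimates near the
        # discontinuities (the full system recovers these by matting).
        win = 2 * strip_radius + 1
        bg_disp = ndimage.minimum_filter(disparity, size=win)
        fg_disp = ndimage.maximum_filter(disparity, size=win)
        # Main layer: background disparities inside the strip, original
        # values elsewhere; colors are kept as-is in this sketch.
        main_color = image.copy()
        main_disp = np.where(strip, bg_disp, disparity)
        # Boundary layer: foreground colors/disparities plus alpha, defined
        # only inside the strip (a hard alpha of 1.0 stands in for matting).
        boundary_color = np.where(strip[..., None], image, 0).astype(image.dtype)
        boundary_disp = np.where(strip, fg_disp, 0.0)
        alpha = strip.astype(np.float32)
        return (main_color, main_disp), (boundary_color, boundary_disp, alpha)
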
Abstract:
Methods and systems for processing facial image data for use in animation are described. In one embodiment, a system is provided that illuminates a face with illumination sufficient to enable the simultaneous capture of both structure data, e.g., a range or depth map, and reflectance properties, e.g., the diffuse reflectance of a subject's face. This captured information can then be used for various facial animation operations, including expression recognition and expression transformation.
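
To make the captured quantities concrete, here is a minimal Python sketch that relates the two kinds of data: it estimates surface normals from a range map and divides out Lambertian shading under a known light direction to approximate a diffuse reflectance map. The function name, the single-light Lambertian model, and the light direction are illustrative assumptions, not the patent's capture method.

    import numpy as np

    def diffuse_albedo(depth, image, light_dir=(0.0, 0.0, 1.0)):
        # Surface normals from the range map via finite differences.
        dzdy, dzdx = np.gradient(depth.astype(np.float32))
        n = np.dstack([-dzdx, -dzdy, np.ones_like(depth, dtype=np.float32)])
        n /= np.linalg.norm(n, axis=2, keepdims=True)
        # Lambertian model: I = albedo * max(n . L, 0); invert per pixel.
        L = np.asarray(light_dir, dtype=np.float32)
        L /= np.linalg.norm(L)
        shading = np.clip(np.einsum('hwc,c->hw', n, L), 1e-3, None)
        return image.astype(np.float32) / shading[..., None]
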
Abstract:
A system and process for reconstructing optimal texture maps from multiple views of a scene is described. In essence, this reconstruction is based on the optimal synthesis of textures from multiple sources. This is generally accomplished using basic image processing theory to derive the correct weights for blending the multiple views. Namely, the steps of reconstructing, warping, prefiltering, and resampling are followed in order to warp reference textures to a desired location and to compute spatially variant weights for optimal blending. These weights take into consideration the anisotropy in the texture projection and changes in sampling frequency due to foreshortening. The weights are combined, and the computation of the optimal texture is treated as a restoration problem, which involves solving a linear system of equations. This approach can be incorporated into a variety of applications, such as texturing of 3D models, analysis-by-synthesis methods, super-resolution techniques, and view-dependent texture mapping.
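
As a simplified illustration of the final blending step, the sketch below solves the per-pixel weighted least-squares problem that results when the warping and prefiltering operators are folded into precomputed, spatially variant weight maps; in that reduced case the restoration linear system has the closed-form normal-equation solution shown. The input layout (textures already warped to the target, one weight map per view) is an assumption.

    import numpy as np

    def blend_textures(warped_textures, weight_maps, eps=1e-8):
        # Normal equations of min_T sum_i w_i(x) * (T(x) - t_i(x))^2:
        #   (sum_i w_i(x)) * T(x) = sum_i w_i(x) * t_i(x)
        num = np.zeros(warped_textures[0].shape, dtype=np.float64)
        den = np.zeros(warped_textures[0].shape[:2], dtype=np.float64)
        for t, w in zip(warped_textures, weight_maps):
            num += w[..., None] * t        # weight map broadcast over RGB
            den += w
        return num / (den[..., None] + eps)
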
Abstract:
In general, a “Stereoscopic Video Converter” (SVC) provides various techniques for automatically converting arbitrary 2D video sequences into perceptually plausible stereoscopic or “3D” versions while optionally generating dense depth maps for every frame of the video sequence. In particular, the automated 2D-to-3D conversion process first automatically estimates scene depth for each frame of an input video sequence via a label transfer process that matches features extracted from those frames with features from a database of images and videos having known ground truth depths. The estimated depth distributions for all image frames of the input video sequence are then used by the SVC for automatically generating a “right view” of a corresponding stereoscopic image for each frame (assuming that each original input frame represents the “left view” of the stereoscopic image).
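
A minimal sketch of the view-synthesis step follows: given an input frame (treated as the left view) and its estimated depth, each pixel is forward-warped horizontally by a depth-dependent disparity, with a z-buffer keeping the nearest surface. The linear depth-to-disparity mapping, the max_disparity parameter, and leaving disocclusions unfilled are assumptions, not the SVC's actual renderer.

    import numpy as np

    def render_right_view(left, depth, max_disparity=24):
        h, w = depth.shape
        d = depth.astype(np.float32)
        # Nearer pixels (smaller depth) get larger horizontal shifts.
        near = 1.0 - (d - d.min()) / (d.max() - d.min() + 1e-8)
        disp = (max_disparity * near).astype(int)
        right = np.zeros_like(left)               # disocclusions stay empty
        zbuf = np.full((h, w), np.inf, dtype=np.float32)
        for y in range(h):
            for x in range(w):
                xt = min(max(x - disp[y, x], 0), w - 1)  # shift leftward
                if d[y, x] < zbuf[y, xt]:                # keep nearest surface
                    zbuf[y, xt] = d[y, x]
                    right[y, xt] = left[y, x]
        return right
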
Abstract:
Described is a user interface that displays a representation of a stereo scene, and includes interactive mechanisms for changing parameter values that determine the perceived appearance of that scene. The scene is modeled as if viewed from above, including a representation of a viewer's eyes, a representation of a viewing screen, and an indication simulating what each of the viewer's eyes perceives on the viewing screen. Variable parameters may include a vergence parameter, a dolly parameter, a field-of-view parameter, an interocular parameter and a proscenium arch parameter.
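
The sketch below models the top-down scene the interface draws, assuming a simple geometry: eyes on a baseline, a screen plane at a fixed distance, and a projection of a scene point onto that plane for each eye. The parameter names mirror the abstract, but the dataclass layout and the way each parameter enters the projection are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class StereoParams:
        interocular: float = 6.5     # eye separation, e.g. in cm
        dolly: float = 0.0           # viewer translation toward the scene
        vergence: float = 0.0        # opposite horizontal image shifts
        field_of_view: float = 60.0  # degrees; governs drawn screen extent
        proscenium: float = 0.0      # width of masking arch at the frame edge

    def screen_projection(p, point_x, point_z, screen_z=100.0):
        """Project a scene point (x, z), seen from above, onto the screen
        plane z = screen_z for each eye; eyes sit on the z = 0 baseline."""
        half = p.interocular / 2.0
        z = point_z - p.dolly                    # dolly moves the viewer in
        out = []
        for sign, eye_x in ((+1, -half), (-1, +half)):
            t = screen_z / z                     # ray/screen intersection
            out.append(eye_x + (point_x - eye_x) * t + sign * p.vergence)
        return tuple(out)                        # (left_eye_x, right_eye_x)
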
Abstract:
Methods and systems for generating free viewpoint video using an active infrared (IR) stereo module are provided. The method includes computing a depth map for a scene using an active IR stereo module. The depth map may be computed by projecting an IR dot pattern onto the scene, capturing stereo images from each of two or more synchronized IR cameras, detecting dots within the stereo images, computing feature descriptors corresponding to the dots in the stereo images, computing a disparity map between the stereo images, and generating the depth map using the disparity map. The method also includes generating a point cloud for the scene using the depth map, generating a mesh of the point cloud, and generating a projective texture map for the scene from the mesh of the point cloud. The method further includes generating the video for the scene using the projective texture map.
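
Two of the listed steps lend themselves to a compact sketch: converting the disparity map to a depth map with the standard stereo relation Z = f*B/d, and lifting the depth map into a point cloud by back-projecting pixels through an assumed pinhole model. The calibration constants (focal length, baseline, principal point) are placeholders, not values from the described module.

    import numpy as np

    def depth_from_disparity(disparity, f=580.0, baseline=0.075):
        d = np.where(disparity > 0, disparity, np.nan)  # invalid matches -> NaN
        return (f * baseline) / d                       # Z = f*B/d, in metres

    def point_cloud(depth, f=580.0, cx=None, cy=None):
        h, w = depth.shape
        cx = w / 2.0 if cx is None else cx
        cy = h / 2.0 if cy is None else cy
        v, u = np.mgrid[0:h, 0:w]
        z = depth
        x = (u - cx) * z / f                            # pinhole back-projection
        y = (v - cy) * z / f
        pts = np.dstack([x, y, z]).reshape(-1, 3)
        return pts[~np.isnan(pts[:, 2])]                # drop invalid points
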
Abstract:
Game data is rendered in three dimensions in the GPU of a game console. A left camera view and a right camera view are generated from a single camera view. The left and right camera positions are derived as an offset from a default camera. The focal distance of the left and right cameras is set to infinity, so the two cameras are parallel rather than toed in. A game developer does not have to encode dual images into a specific hardware format. When a viewer sees the two slightly offset images, the viewer's brain combines the two offset images into a single 3D image to give the illusion that objects either pop out from or recede into the display screen. In another embodiment, individual, private video is rendered on a single display screen for different viewers. Rather than rendering two similar offset images, two completely different images are rendered, allowing each player to view only one of the images.
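
A minimal sketch of the camera derivation, under the assumption that the default camera is given as a 4x4 world-to-camera matrix: each eye's view is the default view shifted half the eye separation along the camera's own x axis, with no toe-in, which matches the focal-distance-at-infinity (parallel camera) setup described above.

    import numpy as np

    def stereo_views(view, eye_separation=0.064):
        """view: 4x4 world-to-camera matrix of the default (center) camera."""
        views = []
        for off in (-eye_separation / 2.0, +eye_separation / 2.0):
            shift = np.eye(4)
            # Moving the eye by `off` along camera-space +x shifts every
            # point by -off in that camera's coordinates.
            shift[0, 3] = -off
            views.append(shift @ view)
        return views    # [left_view, right_view], parallel (no toe-in)
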
Abstract:
A flash-based strategy is used to separate foreground information from background information within captured images. In this strategy, a first image is taken without the use of flash. A second image is taken of the same subject matter with the use of flash. The foreground information in the flash image is illuminated by the flash to a much greater extent than the background information. Based on this property, the strategy applies processing to extract the foreground information from the background information. The strategy supplements the flash cue by also taking motion information and color information into consideration.
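
The flash cue alone can be sketched compactly: because flash illumination falls off with distance, pixels whose brightness increases much more in the flash image than in the no-flash image are likely foreground. The ratio threshold and the grayscale conversion below are assumptions; the described strategy additionally folds in the motion and color cues.

    import numpy as np

    def flash_foreground_mask(no_flash, flash, ratio_thresh=1.3):
        def luma(img):
            return img.astype(np.float32) @ np.array([0.299, 0.587, 0.114])
        # +1 in numerator and denominator avoids division by zero.
        ratio = (luma(flash) + 1.0) / (luma(no_flash) + 1.0)
        return ratio > ratio_thresh  # True where flash lit the pixel strongly
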
Abstract:
Two-dimensional (2D) video is converted into multi-view video. The 2D video is segmented to generate a temporally consistent segmented 2D video, which is made up of a sequence of segmented frames. The multi-view video is generated by employing user-guided operations to generate depth assignments for the segments associated with user-assigned regions of the segmented frames, where a user-assigned region is formed from a group of contiguous segments selected by the user.
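
A minimal sketch of the depth-assignment step, assuming the segmentation is available as a per-pixel label image and the user's region assignments arrive as a mapping from segment ids to depths:

    import numpy as np

    def assign_region_depths(segment_labels, region_depths, default_depth=1.0):
        """segment_labels: (H, W) int label image from the temporally
        consistent segmentation; region_depths: {segment_id: depth} covering
        every segment in each user-assigned region."""
        depth = np.full(segment_labels.shape, default_depth, dtype=np.float32)
        for seg_id, d in region_depths.items():
            depth[segment_labels == seg_id] = d
        return depth
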
Abstract:
A system and process for compressing and decompressing multiple video streams depicting substantially the same dynamic scene from different viewpoints. Each frame in each contemporaneous set of video frames of the multiple streams is represented by at least two layers: a main layer and a boundary layer. Compression of the main layers involves first designating one or more of these layers in each set of contemporaneous frames as keyframes. For each set of contemporaneous frames in time sequence order, the main layer of each keyframe is compressed using an inter-frame compression technique. In addition, the main layer of each non-keyframe within the frame set under consideration is compressed using a spatial prediction compression technique. Finally, the boundary layers of each frame in the current frame set are each compressed using an intra-frame compression technique. Decompression is generally the reverse of the compression process.
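
The compression schedule can be sketched as control flow in Python; the three coder calls below are hypothetical stand-ins defined as stubs so the sketch runs, not the patent's actual inter-frame, spatial-prediction, and intra-frame coders.

    # Hypothetical stand-in coders so the sketch runs.
    def inter_frame_code(layer, reference):    return ('inter', reference, layer)
    def spatial_prediction_code(layer, peers): return ('spatial', layer)
    def intra_frame_code(layer):               return ('intra', layer)

    def compress_frame_sets(frame_sets, keyframe_cameras):
        """frame_sets: time-ordered list of contemporaneous frame sets; each
        frame is assumed to expose .camera, .main, and .boundary."""
        coded_sets = []
        for t, frames in enumerate(frame_sets):
            keyframes = [f for f in frames if f.camera in keyframe_cameras]
            coded = []
            for f in frames:
                if f.camera in keyframe_cameras:
                    # Keyframe main layers: temporal (inter-frame) prediction
                    # from the same camera's previous frame set.
                    main = inter_frame_code(f.main, reference=(f.camera, t - 1))
                else:
                    # Non-keyframe main layers: spatial prediction from the
                    # keyframes of the same contemporaneous set.
                    main = spatial_prediction_code(f.main, keyframes)
                # Boundary layers are always coded independently.
                boundary = intra_frame_code(f.boundary)
                coded.append((main, boundary))
            coded_sets.append(coded)
        return coded_sets
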