Abstract:
A system and process for determining the similarity in the shape of objects is presented that generates a novel shape representation called a directional histogram model. This shape representation captures the shape variations of an object with viewing direction, using thickness histograms. The resulting directional histogram model is substantially invariant to scaling and translation. A matrix descriptor can also be derived by applying the spherical harmonic transform to the directional histogram model. The resulting matrix descriptor is substantially invariant to not only scaling and translation, but rotation as well. The matrix descriptor is also robust with respect to local modification or noise, and able to readily distinguish objects with different global shapes. Typical applications of the directional histogram model and matrix descriptor include recognizing 3D solid shapes, measuring the similarity between different objects, and shape-similarity-based object retrieval.
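The thickness-histogram idea can be illustrated on a voxelized object: for a given viewing direction, each ray's thickness is the number of occupied voxels it crosses, and the normalized histogram of those thicknesses is one entry of the directional histogram model. The sketch below (function names and the normalization choice are ours, not the patent's) uses only the three axis-aligned directions, whereas the model samples many directions on a sphere:

```python
import numpy as np

def thickness_histogram(voxels, axis, bins=8):
    """Histogram of object thickness along rays parallel to one axis.

    voxels: 3-D boolean occupancy grid; axis: 0, 1, or 2.
    Thickness of a ray = number of occupied voxels it crosses.
    Dividing by the maximum thickness makes the histogram
    substantially scale-invariant, as the abstract describes.
    """
    thickness = voxels.sum(axis=axis).astype(float)
    t = thickness[thickness > 0]          # ignore rays that miss the object
    t /= t.max()                          # scale invariance
    hist, _ = np.histogram(t, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()              # normalize to a distribution

# Toy object: a solid ball in a 32^3 grid.
n = 32
zz, yy, xx = np.mgrid[:n, :n, :n]
ball = (xx - 16) ** 2 + (yy - 16) ** 2 + (zz - 16) ** 2 <= 10 ** 2

# Directional histogram model: one thickness histogram per direction
# (here only the three axis directions; the model samples a sphere).
model = np.stack([thickness_histogram(ball, a) for a in range(3)])
```

Because the toy object is symmetric, all three directional histograms come out identical; for a real object they differ with direction, which is exactly the variation the model captures.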
Abstract:
The illustrated and described embodiments describe techniques for capturing data that describes 3-dimensional (3-D) aspects of a face, transforming facial motion from one individual to another in a realistic manner, and modeling skin reflectance.
Abstract:
A system and process for generating High Dynamic Range (HDR) video is presented which involves first capturing a video image sequence while varying the exposure so as to alternate between frames having a shorter and longer exposure. The exposure for each frame is set prior to it being captured as a function of the pixel brightness distribution in preceding frames. Next, for each frame of the video, the corresponding pixels between the frame under consideration and both preceding and subsequent frames are identified. For each corresponding pixel set, at least one pixel is identified as representing a trustworthy pixel. The pixel color information associated with the trustworthy pixels is then employed to compute a radiance value for each pixel set to form a radiance map. A tone mapping procedure can then be performed to convert the radiance map into an 8-bit representation of the HDR frame.
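The exposure-setting step can be sketched with a toy auto-exposure rule in the spirit of the abstract: shorten the short exposure while too many pixels in the preceding short frame saturate, and lengthen the long exposure while too many pixels in the preceding long frame are crushed. The thresholds, fractions, and multiplicative step below are our assumptions, not the patent's:

```python
import numpy as np

def update_exposures(short_exp, long_exp, short_frame, long_frame,
                     sat=240, dark=15, frac=0.01, step=1.25):
    """Adjust the alternating short/long exposures using the pixel
    brightness distributions of the preceding 8-bit frame pair
    (illustrative rule; all constants are assumptions)."""
    over = (short_frame >= sat).mean()    # fraction saturated in short frame
    under = (long_frame <= dark).mean()   # fraction crushed in long frame
    if over > frac:                       # short frame still saturates: go shorter
        short_exp /= step
    if under > frac:                      # long frame still crushed: go longer
        long_exp *= step
    return short_exp, long_exp

# Example: a badly saturated short frame and a well-exposed long frame.
short_frame = np.full((4, 4), 250, dtype=np.uint8)
long_frame = np.full((4, 4), 128, dtype=np.uint8)
s, l = update_exposures(1 / 500, 1 / 30, short_frame, long_frame)
```

Here the short exposure is reduced while the long exposure is left alone, widening the dynamic range covered by the alternating pair.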
Abstract:
A system and method for deghosting mosaics provides a novel multiperspective plane sweep approach for generating an image mosaic from a sequence of still images, video images, scanned photographic images, computer generated images, etc. This multiperspective plane sweep approach uses virtual camera positions to compute depth maps for columns of overlapping pixels in adjacent images. Object distortions and ghosting caused by image parallax when generating the image mosaics are then minimized by blending pixel colors, or grey values, for each computed depth to create a common composite area for each of the overlapping images. Further, the multiperspective plane sweep approach described herein is computationally efficient and applicable both to the case of limited overlap between the images used for creating the image mosaics and to the case of extensive image overlap.
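The core of a plane sweep is testing a set of candidate depths (equivalently, disparities) and keeping the one under which the overlapping images agree best, then blending the matched pixels so no ghost remains. The sketch below is a heavy 1-D simplification of the multiperspective sweep, for a single column of the overlap region; the function name and cost metric are ours:

```python
import numpy as np

def sweep_depth_and_blend(col_a, col_b_strip, disparities):
    """Toy 1-D plane sweep for one column of the overlap region.

    col_a: (H, 3) column from image A; col_b_strip: (H, W, 3) strip
    from image B whose columns correspond to candidate disparities
    (i.e. candidate depths). The disparity with the lowest color
    difference wins, and the matched columns are blended so both
    images contribute to the composite without ghosting.
    """
    errs = [np.abs(col_a - col_b_strip[:, d]).sum() for d in disparities]
    best = disparities[int(np.argmin(errs))]
    blended = 0.5 * col_a + 0.5 * col_b_strip[:, best]
    return best, blended

# Synthetic data: the true match sits at disparity 2.
col_a = np.random.default_rng(0).random((8, 3))
strip = np.random.default_rng(1).random((8, 4, 3))
strip[:, 2] = col_a
best, blended = sweep_depth_and_blend(col_a, strip, [0, 1, 2, 3])
```

Because the winning disparity aligns the two images, blending at that depth averages corresponding scene points rather than misaligned ones, which is what suppresses the ghosting.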
Abstract:
Methods and systems for animating facial features and transforming facial expressions are described. In one embodiment, a code book contains data that defines a set of facial expressions of a first person. A training set of facial expressions from a second person and corresponding expressions from the code book are used to derive a transformation function that is then applied to all of the expressions of the code book. In this manner, expressions from the first person can be realistically transformed into expressions of a second person and vice versa. Particularly advantageous aspects of the described embodiments provide a single common generic face model that is used as the basis for a fitting operation for many different faces. Use of the single common generic face model and certain user-defined constraints provide a mechanism by which correspondences between the different faces can be established. These correspondences provide a basis for facial animation operations, among which are included expression transformation.
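One simple way to realize the described transformation function is a linear map fitted by least squares from the training pairs (matched expressions of the second person and the corresponding code-book entries), then applied to every code-book expression. The linear-map assumption and all names below are our simplification, not the patent's stated method:

```python
import numpy as np

def fit_expression_transform(train_b, train_codebook):
    """Fit a linear map taking person-A code-book expressions to
    person-B expressions from matched training pairs.
    Expressions are flattened vectors of facial parameters.
    Solves  train_codebook @ T ~= train_b  in the least-squares sense."""
    T, *_ = np.linalg.lstsq(train_codebook, train_b, rcond=None)
    return T

rng = np.random.default_rng(0)
codebook = rng.random((20, 5))        # 20 expressions, 5 parameters each
true_T = rng.random((5, 5))
train_b = codebook[:8] @ true_T       # 8 training expressions of person B
T = fit_expression_transform(train_b, codebook[:8])

# Apply the fitted transformation to the entire code book, as the
# abstract describes, yielding all of person B's expressions.
transformed_codebook = codebook @ T
```

With eight noise-free training pairs the map is recovered exactly, and the remaining twelve code-book expressions transform for free, which is the point of learning from a small training set.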
Abstract:
A method and a system for self-calibrating a wide field-of-view camera (such as a catadioptric camera) using a sequence of omni-directional images of a scene obtained from the camera. The present invention uses the consistency of pairwise features tracked across at least a portion of the image collection and uses these tracked features to determine unknown calibration parameters based on the characteristics of catadioptric imaging. More specifically, the self-calibration method of the present invention generates a sequence of omni-directional images representing a scene and tracks features across the image sequence. An objective function is defined in terms of the tracked features and an error metric (an image-based error metric in a preferred embodiment). The catadioptric imaging characteristics are defined by calibration parameters, and determination of optimal calibration parameters is accomplished by minimizing the objective function using an optimizing technique. Moreover, the present invention also includes a technique for reformulating a projection equation such that the projection equation is equivalent to that of a rectilinear perspective camera. This technique allows analyses (such as structure from motion) to be applied (subsequent to calibration of the catadioptric camera) in the same direct manner as for rectilinear image sequences.
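The calibration step reduces to minimizing an image-based objective over the unknown parameters. The toy sketch below replaces the catadioptric projection with a single-parameter pinhole projection and the optimizer with a grid search, purely to make the objective-minimization structure concrete; every name and the camera model are our assumptions:

```python
import numpy as np

def image_error(f, points3d, observed):
    """Image-based error metric: squared distance between observed
    feature positions and points projected with candidate focal
    length f (a stand-in for the full catadioptric model)."""
    proj = f * points3d[:, :2] / points3d[:, 2:3]
    return ((proj - observed) ** 2).sum()

def self_calibrate(points3d, observed, f_grid):
    """Pick the calibration parameter minimizing the objective
    (grid search stands in for a proper optimization technique)."""
    errs = [image_error(f, points3d, observed) for f in f_grid]
    return f_grid[int(np.argmin(errs))]

rng = np.random.default_rng(0)
pts = rng.random((30, 3)) + np.array([0, 0, 2.0])   # points in front of camera
obs = 500.0 * pts[:, :2] / pts[:, 2:3]              # features imaged with f = 500
f_est = self_calibrate(pts, obs, np.arange(400.0, 600.0, 1.0))
```

The tracked features play the role of `obs`; in the method they are pairwise feature tracks across the omni-directional sequence rather than known 3-D points.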
Abstract:
A 3-D effect is added to a single image by adding depth to the single image. Depth can be added to the single image by selecting an arbitrary region or a number of pixels. A user interface simultaneously displays the single image and novel views of the single original image taken from virtual camera positions rotated relative to the original field of view. Depths given to the original image allow pixels to be reprojected onto the novel views to allow the user to observe the depth changes as they are being added. Functions are provided to edit gaps or voids generated in the process of adding depth to the single image. The gaps occur because of depth discontinuities between regions to which depth has been added and the voids are due to the uncovering of previously occluded surfaces in the original image.
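Reprojecting depth-annotated pixels into a novel view, and the gaps and voids that result, can be shown in one dimension: each pixel shifts by a parallax inversely proportional to its depth, nearer pixels win ties (occlusion), and destination positions nothing lands on are the voids left by uncovered surfaces. This is a heavy simplification of full 3-D reprojection; the names and the parallax model are ours:

```python
import numpy as np

def reproject_row(colors, depths, baseline):
    """Reproject one image row into a novel view displaced sideways
    by `baseline`: parallax = baseline / depth, so near pixels move
    farther. Positions left unfilled (-1) are the gaps/voids that
    the editing functions in the abstract repair."""
    w = len(colors)
    out = np.full(w, -1, dtype=int)        # -1 marks a gap/void
    out_depth = np.full(w, np.inf)
    for x in range(w):
        nx = x + int(round(baseline / depths[x]))
        if 0 <= nx < w and depths[x] < out_depth[nx]:  # nearer pixel wins
            out[nx] = colors[x]
            out_depth[nx] = depths[x]
    return out

colors = np.array([10, 20, 30, 40, 50])
depths = np.array([1.0, 2.0, 2.0, 2.0, 2.0])   # first pixel is near
view = reproject_row(colors, depths, 2.0)
```

In the result the near pixel (color 10) has moved two positions and occluded a far pixel, while the positions it vacated are voids, illustrating both failure modes the abstract's editing functions address.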
Abstract:
In an encoder for producing a bitstream representative of a sequence of video images, a previous image is registered with a current image using spline-based registration to produce estimated motion vectors. The estimated motion vectors are used to match blocks of the previous image and the current image to produce translation vectors. The translation vectors compensate for motion while encoding the sequence as a bitstream.
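The block-matching step can be sketched with a textbook sum-of-absolute-differences search: the estimated motion vector from registration serves as the initial guess, and the search refines it within a small window to produce the block's translation vector. The function and its parameters are our illustration, not the encoder's exact search:

```python
import numpy as np

def match_block(prev, cur, top, left, size=4, search=2, guess=(0, 0)):
    """Translation vector for one block of `cur`, found by searching
    `prev` around an initial `guess` (e.g. from spline-based
    registration, which keeps the search window small).
    Cost is the sum of absolute differences (SAD)."""
    block = cur[top:top + size, left:left + size]
    best, best_err = guess, np.inf
    for dy in range(guess[0] - search, guess[0] + search + 1):
        for dx in range(guess[1] - search, guess[1] + search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + size <= prev.shape[0] \
                    and x + size <= prev.shape[1]:
                err = np.abs(prev[y:y + size, x:x + size] - block).sum()
                if err < best_err:
                    best, best_err = (dy, dx), err
    return best

rng = np.random.default_rng(0)
prev = rng.random((16, 16))
cur = np.roll(prev, (1, 2), axis=(0, 1))   # scene shifted down 1, right 2
mv = match_block(prev, cur, top=4, left=4)
```

The recovered translation vector points from the current block back to its match in the previous image, which is the quantity the encoder uses for motion compensation.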
Abstract:
A computerized method and related computer system synthesize video from a plurality of sources of image data. The sources include a variety of image data types, such as a collection of image stills, a sequence of video frames, and 3-D models of objects. Each source provides image data associated with an object. One source provides image data associated with a first object, and a second source provides image data associated with a second object. The image data of the first and second objects are combined to generate composite images of the first and second objects. From the composite images, an output image of the first and second objects as viewed from an arbitrary viewpoint is generated. Gaps of pixels with unspecified pixel values may appear in the output image. Accordingly, a pixel value for each of these “missing pixels” is obtained by using an epipolar search process to determine which one of the sources of image data should provide the pixel value for that missing pixel.
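The flavor of an epipolar search for a missing pixel can be shown in a rectified setting, where the epipolar line in a source image is a horizontal row: candidates along that row are compared against an estimate from the gap's neighbors, and the most consistent one fills the hole. This is a toy stand-in for the patent's process; the neighbor-average estimate and all names are our assumptions:

```python
import numpy as np

def fill_missing_pixel(out, y, x, source_row):
    """Fill a gap at (y, x) by searching along a (rectified, hence
    horizontal) epipolar line in a source image for the candidate
    value most consistent with the gap's left/right neighbors."""
    target = 0.5 * (out[y, x - 1] + out[y, x + 1])   # neighbor estimate
    best = source_row[np.argmin(np.abs(source_row - target))]
    out[y, x] = best
    return out

out = np.array([[1.0, 2.0, -1.0, 6.0, 7.0]])   # -1 marks the missing pixel
source_row = np.array([0.0, 3.9, 9.0])         # candidates on the epipolar line
out = fill_missing_pixel(out, 0, 2, source_row)
```

In the full method the search additionally decides *which* source should supply the value, by running this comparison along the epipolar line in each candidate source.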
Abstract:
In a computerized method, the three-dimensional structure of an object is recovered from a closed-loop sequence of two-dimensional images taken by a camera undergoing some arbitrary motion. In one type of motion, the camera is held fixed, while the object completes a full 360° rotation about an arbitrary axis. Alternatively, the camera can make a complete rotation about the object. In the sequence of images, feature tracking points are selected using pair-wise image registration. Ellipses are fitted to the feature tracking points to estimate the tilt of the axis of rotation. A set of variables is set to fixed values while minimizing an image-based objective function to extract a first set of structure and motion parameters. The set of variables is then freed while minimization of the objective function continues, to extract a second set of structure and motion parameters that are substantially the same as the first set of structure and motion parameters.
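The ellipse-fitting step exploits the fact that a point rotating about the axis traces a circle, which projects to an ellipse whose axis ratio encodes the tilt: under an orthographic model, a circle whose plane makes angle t with the viewing direction projects with semi-axis ratio sin(t). The sketch below (names and the PCA-style fit, used in place of a full conic fit, are ours) recovers the tilt from such a track:

```python
import numpy as np

def fit_axis_tilt(pts):
    """Estimate rotation-axis tilt from feature points tracing an
    ellipse. Under an orthographic model, a circle whose plane makes
    angle t with the viewing direction projects with semi-axis ratio
    sin(t), so the ratio of the fitted axes gives the tilt."""
    centered = pts - pts.mean(axis=0)
    # Singular values of the centered point cloud give the semi-axis
    # lengths up to a common scale, so their ratio is sin(tilt).
    s = np.linalg.svd(centered, compute_uv=False)
    return np.arcsin(s[1] / s[0])

# Synthetic track: a circle viewed at 30 degrees of tilt.
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
tilt_true = np.deg2rad(30.0)
pts = np.stack([np.cos(theta), np.sin(tilt_true) * np.sin(theta)], axis=1)
tilt = fit_axis_tilt(pts)
```

The recovered tilt then seeds the two-stage minimization the abstract describes: first with some variables held fixed, then with all variables free.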