Abstract:
A process for compressing and decompressing non-keyframes in sequential sets of contemporaneous video frames making up multiple video streams, where the video frames in a set depict substantially the same scene from different viewpoints. Each set of contemporaneous video frames has a plurality of frames designated as keyframes, with the remaining frames being non-keyframes. In one embodiment, the non-keyframes are compressed using a multi-directional spatial prediction technique. In another embodiment, the non-keyframes of each set of contemporaneous video frames are compressed using a combined chaining and spatial prediction compression technique. The spatial prediction compression technique employed can be a single-direction technique, where just one reference frame, and so one chain, is used to predict each non-keyframe, or it can be a multi-directional technique, where two or more reference frames, and so chains, are used to predict each non-keyframe.
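As a rough illustration of the prediction-plus-residual idea sketched in this abstract (not the patented codec itself), the following Python/numpy snippet, with hypothetical function names, predicts a non-keyframe from one or more already-decoded reference frames and keeps only the residual; with a single reference it reduces to the single-direction case.

```python
import numpy as np

def predict_from_references(references, weights=None):
    """Blend one or more already-decoded reference frames (e.g. keyframes
    warped into the non-keyframe's viewpoint) into a spatial prediction.
    With a single reference this reduces to single-direction prediction."""
    refs = np.stack([np.asarray(r, dtype=np.float32) for r in references])
    if weights is None:
        weights = np.full(len(references), 1.0 / len(references), dtype=np.float32)
    return np.tensordot(weights, refs, axes=1)

def compress_non_keyframe(frame, references):
    """Keep only the residual between the frame and its spatial prediction;
    a real codec would additionally transform-code and quantize this residual."""
    return np.asarray(frame, dtype=np.float32) - predict_from_references(references)

def decompress_non_keyframe(residual, references):
    """Rebuild the non-keyframe by adding the residual back onto the prediction."""
    restored = predict_from_references(references) + residual
    return np.clip(restored, 0, 255).astype(np.uint8)
```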
Abstract:
A system and process for generating a two-layer, 3D representation of a digital or digitized image from the image and a pixel disparity map of the image is presented. The two-layer representation includes a main layer having pixels exhibiting background colors and background disparities associated with correspondingly located pixels of depth discontinuity areas in the image, as well as pixels exhibiting colors and disparities associated with correspondingly located pixels of the image not found in these depth discontinuity areas. The other layer is a boundary layer made up of pixels exhibiting foreground colors, foreground disparities and alpha values associated with the correspondingly located pixels of the depth discontinuity areas. The depth discontinuity areas correspond to areas of a prescribed size surrounding depth discontinuities found in the image using its disparity map.
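As a hedged sketch of how depth discontinuity areas of a prescribed size might be located from a disparity map (the abstract does not specify the exact test), assuming a simple disparity-jump threshold and 4-neighborhood dilation:

```python
import numpy as np

def depth_discontinuity_mask(disparity, jump_threshold=4.0, area_radius=3):
    """Mark pixels whose disparity differs from a 4-neighbor by more than
    jump_threshold, then grow the marks into areas of prescribed size."""
    d = np.asarray(disparity, dtype=np.float32)
    edges = np.zeros(d.shape, dtype=bool)
    edges[:, 1:] |= np.abs(d[:, 1:] - d[:, :-1]) > jump_threshold
    edges[1:, :] |= np.abs(d[1:, :] - d[:-1, :]) > jump_threshold
    # dilate the discontinuity pixels into areas about (2*area_radius+1) wide
    mask = edges.copy()
    for _ in range(area_radius):
        grown = mask.copy()
        grown[1:, :] |= mask[:-1, :]
        grown[:-1, :] |= mask[1:, :]
        grown[:, 1:] |= mask[:, :-1]
        grown[:, :-1] |= mask[:, 1:]
        mask = grown
    return mask  # True inside depth discontinuity areas
```

Under this sketch, pixels inside the returned mask would contribute to the boundary layer (foreground color, disparity and alpha), while all other pixels feed the main layer.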
Abstract:
Multi-pass image resampling technique embodiments are presented that employ a series of one-dimensional filtering, resampling, and shearing stages to achieve good efficiency while maintaining high visual fidelity. In one embodiment, high-quality (multi-tap) image filtering is used inside each one-dimensional resampling stage. Because each stage uses only one-dimensional filtering, the overall computation is efficient and amenable to graphics processing unit (GPU) implementation using pixel shaders. This embodiment also upsamples the image, before the shearing steps, in the direction orthogonal to the shearing to prevent aliasing, and then downsamples the image to its final size with high-quality low-pass filtering. This ensures that none of the stages causes excessive blurring or aliasing.
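The following toy sketch shows one such stage for a purely horizontal shear, assuming hypothetical function names and using linear interpolation where the described embodiment would use high-quality multi-tap and low-pass filters: the image is upsampled in the orthogonal (vertical) direction, each row is sheared with one-dimensional interpolation, and the result is downsampled back to its original height.

```python
import numpy as np

def resample_rows(image, new_width):
    """Resample every row to new_width samples (linear interpolation stands in
    for the multi-tap filtering the embodiment describes)."""
    h, w = image.shape
    xs = np.linspace(0, w - 1, new_width)
    return np.stack([np.interp(xs, np.arange(w), row) for row in image])

def shear_rows(image, shear):
    """Shear horizontally: row y is shifted by shear*y using only 1-D interpolation."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float32)
    xs = np.arange(w, dtype=np.float32)
    for y in range(h):
        out[y] = np.interp(xs - shear * y, xs, image[y], left=0, right=0)
    return out

def sheared_image(image, shear, oversample=2):
    """One multi-pass stage: upsample orthogonally to the shear (add rows),
    shear each row with 1-D filtering, then downsample back to the input height."""
    image = np.asarray(image, dtype=np.float32)
    up = resample_rows(image.T, image.shape[0] * oversample).T   # vertical upsample
    sheared = shear_rows(up, shear / oversample)                 # same geometric shear
    return resample_rows(sheared.T, image.shape[0]).T            # vertical downsample
```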
Abstract:
A system and process for compressing and decompressing multiple video streams depicting substantially the same dynamic scene from different viewpoints that form a grid of viewpoints. Each frame in each contemporaneous set of video frames of the multiple streams is represented by at least two layers: a main layer and a boundary layer. Compression of the main layers involves first designating one or more of these layers in each set of contemporaneous frames as keyframes. For each set of contemporaneous frames in time sequence order, the main layer of each keyframe is compressed using an inter-frame compression technique. In addition, the main layer of each non-keyframe within the frame set under consideration is compressed using a spatial prediction compression technique. Finally, the boundary layers of each frame in the current frame set are each compressed using an intra-frame compression technique. Decompression is generally the reverse of the compression process.
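A minimal sketch of how the per-frame-set dispatch described above might look, assuming numeric view identifiers, simple residual coding in place of real inter-frame, spatial-prediction and intra-frame codecs, and hypothetical names throughout:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LayeredFrame:
    main_layer: np.ndarray
    boundary_layer: np.ndarray

def compress_frame_set(frame_set, keyframe_ids, previous_keyframe_mains):
    """Route each layer of one contemporaneous frame set to the codec family
    named in the abstract; residuals stand in for the real compressed data."""
    def residual(block, prediction):
        return np.asarray(block, dtype=np.float32) - np.asarray(prediction, dtype=np.float32)

    compressed = {}
    for view_id, frame in frame_set.items():
        if view_id in keyframe_ids:
            # keyframe main layer: inter-frame (temporal) prediction from the previous time step
            main = residual(frame.main_layer, previous_keyframe_mains[view_id])
        else:
            # non-keyframe main layer: spatial prediction from the nearest keyframe's main layer
            ref_id = min(keyframe_ids, key=lambda k: abs(k - view_id))
            main = residual(frame.main_layer, frame_set[ref_id].main_layer)
        # boundary layer: intra-frame coding, no prediction from any other frame
        compressed[view_id] = (main, frame.boundary_layer.copy())
    return compressed
```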
Abstract:
A system and process for generating High Dynamic Range (HDR) video is presented which involves first capturing a video image sequence while varying the exposure so as to alternate between frames having a shorter and a longer exposure. The exposure for each frame is set before it is captured, as a function of the pixel brightness distribution in preceding frames. Next, for each frame of the video, the corresponding pixels between the frame under consideration and both preceding and subsequent frames are identified. For each corresponding pixel set, at least one pixel is identified as representing a trustworthy pixel. The pixel color information associated with the trustworthy pixels is then employed to compute a radiance value for each pixel set, forming a radiance map. A tone mapping procedure can then be performed to convert the radiance map into an 8-bit representation of the HDR frame.
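As a hedged illustration of the radiance computation for one corresponding-pixel set, assuming a linear camera response and hypothetical names (the abstract does not give the exact weighting):

```python
import numpy as np

def hdr_radiance(pixels, exposures, trustworthy):
    """Radiance estimate for one corresponding-pixel set.
    pixels:      intensities of the same scene point in neighboring frames
    exposures:   exposure time of each frame (seconds)
    trustworthy: which samples are usable (well registered, neither saturated
                 nor underexposed)"""
    p = np.asarray(pixels, dtype=np.float32)
    t = np.asarray(exposures, dtype=np.float32)
    ok = np.asarray(trustworthy, dtype=bool)
    if not ok.any():
        return float('nan')          # no usable sample for this pixel set
    # each usable sample votes for radiance = intensity / exposure time
    return float(np.mean(p[ok] / t[ok]))

# example: the long exposure is saturated, so only the short exposure contributes
print(hdr_radiance([40, 255], [1/500, 1/30], [True, False]))  # -> 20000.0
```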
Abstract:
Methods and systems for generating free viewpoint video using an active infrared (IR) stereo module are provided. The method includes computing a depth map for a scene using an active IR stereo module. The depth map may be computed by projecting an IR dot pattern onto the scene, capturing stereo images from each of two or more synchronized IR cameras, detecting dots within the stereo images, computing feature descriptors corresponding to the dots in the stereo images, computing a disparity map between the stereo images, and generating the depth map using the disparity map. The method also includes generating a point cloud for the scene using the depth map, generating a mesh of the point cloud, and generating a projective texture map for the scene from the mesh of the point cloud. The method further includes generating the video for the scene using the projective texture map.
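Once the disparity map between the stereo images is available, the depth map follows from the standard stereo relation; a minimal sketch, assuming a rectified IR camera pair with known focal length (in pixels) and baseline (in meters):

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m, min_disparity=0.5):
    """Standard stereo relation: depth = focal_length * baseline / disparity.
    Pixels with near-zero disparity (no reliable match on the projected IR dot
    pattern) are marked invalid with depth 0."""
    d = np.asarray(disparity, dtype=np.float32)
    depth = np.zeros_like(d)
    valid = d > min_disparity
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth
```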
Abstract:
An interest point detection technique is presented. More particularly, for each of possibly multiple image pyramid resolutions, a cornerness image is generated. One or more potential interest point locations are identified in the cornerness image. This involves finding locations associated with a pixel that exhibits a higher corner strength value than the pixels in a surrounding neighborhood of prescribed size. The potential interest point locations are then clustered to identify groups that likely derive from the same 2D structure. The potential interest point locations in one or more of the identified groups are each combined to produce a single location that represents that group. The representative location of each group that has one is then designated as an interest point. An optional location refinement can also be implemented.
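A minimal sketch of the local-maximum test used to find potential interest point locations in a cornerness image, assuming a square neighborhood of prescribed radius and a hypothetical minimum-strength threshold (clustering and refinement are omitted):

```python
import numpy as np

def local_maxima(cornerness, radius=2, min_strength=1e-3):
    """Return (row, col) locations whose corner strength exceeds min_strength
    and every other pixel in the (2*radius+1)^2 surrounding neighborhood."""
    c = np.asarray(cornerness, dtype=np.float32)
    h, w = c.shape
    points = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            patch = c[y - radius:y + radius + 1, x - radius:x + radius + 1]
            if (c[y, x] >= min_strength and c[y, x] == patch.max()
                    and np.count_nonzero(patch == patch.max()) == 1):
                points.append((y, x))
    return points
```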
Abstract:
Cloud based FVV streaming technique embodiments presented herein generally employ a cloud based FVV pipeline to create, render and transmit FVV frames depicting a captured scene as it would be viewed from a current synthetic viewpoint selected by an end user and received from a client computing device. The FVV frames consume a level of bandwidth similar to that of a conventional streaming movie. To change viewpoints, a new viewpoint is sent from the client to the cloud, and a new streaming movie is initiated from that viewpoint. Frames associated with that viewpoint are created, rendered and transmitted to the client until a new viewpoint request is received.
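A minimal sketch of the server-side loop this implies, assuming hypothetical render_frame and send_frame callables and a queue.Queue carrying client viewpoint requests (None ends the stream):

```python
import queue
import time

def fvv_stream_loop(viewpoint_requests, render_frame, send_frame, fps=30):
    """Serve the most recently requested synthetic viewpoint: frames are
    rendered and sent for the current viewpoint until the client queues a new
    one; a request of None stops the stream."""
    viewpoint = viewpoint_requests.get()      # block until the first viewpoint arrives
    while viewpoint is not None:
        send_frame(render_frame(viewpoint))
        time.sleep(1.0 / fps)                 # pace the stream at roughly fps frames/sec
        try:
            viewpoint = viewpoint_requests.get_nowait()
        except queue.Empty:
            pass                              # no new request: keep the current viewpoint
```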
Abstract:
Free viewpoint video of a scene is generated and presented to a user. An arrangement of sensors generates streams of sensor data, each of which represents the scene from a different geometric perspective. The sensor data streams are calibrated. A scene proxy is generated from the calibrated sensor data streams. The scene proxy geometrically describes the scene as a function of time and includes one or more types of geometric proxy data, which is matched to a first set of current pipeline conditions in order to maximize the photo-realism of the free viewpoint video resulting from the scene proxy at each point in time. A current synthetic viewpoint of the scene is generated from the scene proxy. This viewpoint generation maximizes the photo-realism of the current synthetic viewpoint based upon a second set of current pipeline conditions. The current synthetic viewpoint is displayed.