摘要:
Embodiments of the invention disclose a system and a method for determining a disparity search range for a current stereo image of a scene based on a set of stereo images of the scene, comprising steps of: selecting a subset of stereo images from the set of stereo images, the subset includes the current stereo image and at least one neighboring stereo image, wherein the neighboring stereo image is temporally-neighboring to the current stereo image; determining a disparity histogram for each stereo image in the subset of stereo images to form a set of disparity histograms; determining a weighted disparity histogram as a weighted sum of the disparity histograms in the set of disparity histograms; and determining the disparity search range from the weighted disparity histogram.
摘要:
A method randomly accesses multiview videos. Multiview videos are acquired of a scene with corresponding cameras arranged at poses, such that there is view overlap between any pair of cameras. V-frames are generated from the multiview videos. The V-frames are encoded using only spatial prediction. Then, the V-frames are inserted periodically in an encoded bit stream to provide random temporal access to the multiview videos. Additional view dependency information enables the decoding of a reduced number of frames prior to accessing randomly a target frame for a specified view and time, and decoding the target frame.
摘要:
A method reconstructs a depth image encoded as a base layer bitstream, and a set of enhancement layer bitstreams. The base layer bitstream is decoded to produce pixels of a reconstructed base layer image corresponding to the depth image. Each enhancement layer bitstream is decoded in a low to high order to produces a reconstructed residual image. During the decoding of the enhancement layer bitstream, a context model is maintained using an edge map, and each enhancement layer bitstream is entropy decoded using the context model to determine a significance value corresponding to pixels of the reconstructed residual image and a sign bit for each significant pixel, and a pixel value of the reconstructed residual image is reconstructed according to the significance value, sign bit and an uncertainty interval. Then, the reconstructed residual images are added to the reconstructed base layer image to produce the reconstructed depth image.
摘要:
A method processes a multiview videos of a scene, in which each video is acquired by a corresponding camera arranged at a particular pose, and in which a view of each camera overlaps with the view of at least one other camera. Side information for synthesizing a particular view of the multiview video is obtained in either an encoder or decoder. A synthesized multiview video is synthesized from the multiview videos and the side information. A reference picture list is maintained for each current frame of each of the multiview videos, the reference picture indexes temporal reference pictures and spatial reference pictures of the acquired multiview videos and the synthesized reference pictures of the synthesized multiview video. Each current frame of the multiview videos is predicted according to reference pictures indexed by the associated reference picture list with a skip mode and a direct mode, whereby the side information is inferred from the synthesized reference picture.
摘要:
A method filters pixels in a sequence of images. Each image in the sequence is partitioned into blocks of pixels, and the images are processed sequentially. The energy is determined for each block of pixels in each image. The energy of each block is based on variances of intensities of the pixels in the sequence of images. A 3D fuzzy filter is applied to each current pixel in each current block during the sequential processing. The 3D fuzzy filter considers the energy of the block, and the intensities of pixels spatially adjacent and temporally adjacent to the current pixel to remove blocking and ringing artifacts.
摘要:
An image for a virtual view of a scene is generated based on a set of texture images and a corresponding set of depth images acquired of the scene. A set of candidate depths associated with each pixel of a selected image is determined. For each candidate depth, a cost that estimates a synthesis quality of the virtual image is determined. The candidate depth with a least cost is selected to produce an optimal depth for the pixel. Then, the virtual image is synthesized based on the optimal depth of each pixel and the texture images. The method also applies first and second depth enhancement before, and during view synthesis to correct errors or suppress noise due to the estimation or acquisition of the dense depth images and sparse depth features.
摘要:
A method codes pictures in a bitstream, wherein the bitstream includes coded pictures to obtain data for associated TUs and data for generating a transform tree, and a partitioning of coding units (CUs) into Prediction Units (PUs), and data for obtaining prediction modes or directions associated with each PU. One or more mapping tables are defined, wherein each row of each table has an associated index and a first set of transform types to be used for applying an inverse transformation to the data in TU. The first set of transform types is selected according to an index, and then a second set of transform types is applied as the inverse transformation to the data, wherein the second set of transform types is determined according to the first set of transform types and a transform-toggle flag (ttf) to obtain a reconstructed prediction residual.
摘要:
A video encoded as a bit stream is decoded by maintaining a set of dictionaries generated from decoded prediction residual signals, wherein elements of the set of dictionaries have associated indices. A current macroblock is entropy decoded and inverse quantized to produce decoded coefficients. For the current macroblock, a particular dictionary of the set of dictionaries is selected according to a prediction mode signaled in the bit stream, and particular elements of the particular dictionary are selected according to a copy mode signal in the bit stream and the associated index. The particular elements is scaled and combined, using the decoded coefficients, to reconstruct a current decoded macroblock prediction residual signal. Then, the current decoded macroblock prediction residual signal is combined with previously decoded macroblocks to generate an output macroblock of a reconstructed video, wherein the steps are performed in a decoder.
摘要:
A bitstream includes coded pictures, and split-flags for generating a transform tree. The bit stream is a partitioning of coding units (CUs) into Prediction Units (PUs). The transform tree is generated according to the split-flags. Nodes in the transform tree represent transform units (TU) associated with the CUs. The generation splits each TU only if the corresponding split-flag is set. For each PU that includes multiple TUs, the multiple TUs are merged into a larger TU, and the transform tree is modified according to the splitting and merging. Then, data contained in each PU can be decoded using the TUs associated with the PU according to the transform tree.
摘要:
A method for encoding a source image, wherein the source image includes a set of bitplanes of pixels, is disclosed. For each bitplane in a most to least significant order, the method include obtaining a list of significant pixels (LSP), a list of insignificant pixels (LIP), and a list of insignificant sets (LIS) according to a hierarchical ordering of the source image pixels; synchronizing the LSP, LIP and LIS of the source image with the LSP, LIP and LIS of a key image; constructing a temporary list of insignificant sets (TLIS) for the source image; and applying syndrome encoding to the LSP, LIP, and TLIS of the source image to obtain syndromes corresponding to magnitudes and signs of pixels in the source image, wherein the steps are performed in a processor.