Abstract:
A bitstream includes a sequence of frames, and each frame is partitioned into encoded blocks. For each block, a set of paths is determined at a transform angle specified by a transform index in the bitstream. Transform coefficients, including one DC coefficient for each path, are obtained from the bitstream. An inverse transform is applied to the transform coefficients to produce a decoded video.
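The per-block decoding step can be illustrated with a minimal Python sketch. It is not the patented method. The mapping from transform index to path direction (`paths_for_angle`: rows, columns, or 45-degree anti-diagonals) is hypothetical. So is the use of an orthonormal inverse DCT along each path, whose first coefficient serves as that path's DC coefficient. All helper names are invented for illustration.

```python
import math


def paths_for_angle(n, transform_index):
    """Hypothetical angle table: 0 -> rows, 1 -> columns, 2 -> 45-degree anti-diagonals."""
    if transform_index == 0:
        return [[(r, c) for c in range(n)] for r in range(n)]
    if transform_index == 1:
        return [[(r, c) for r in range(n)] for c in range(n)]
    if transform_index == 2:
        # Each anti-diagonal r + c = s is one path; path lengths range from 1 to n.
        return [[(r, s - r) for r in range(n) if 0 <= s - r < n] for s in range(2 * n - 1)]
    raise ValueError(f"unsupported transform index {transform_index}")


def inverse_dct(coeffs):
    """Orthonormal inverse DCT-II (i.e. DCT-III); coeffs[0] is the path's DC coefficient."""
    m = len(coeffs)
    samples = []
    for x in range(m):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / m) if k == 0 else math.sqrt(2.0 / m)
            total += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * m))
        samples.append(total)
    return samples


def decode_block(n, transform_index, coeffs_per_path):
    """Reconstruct an n x n block from one coefficient list per path."""
    paths = paths_for_angle(n, transform_index)
    if len(coeffs_per_path) != len(paths):
        raise ValueError("need exactly one coefficient list per path")
    block = [[0.0] * n for _ in range(n)]
    for path, coeffs in zip(paths, coeffs_per_path):
        if len(coeffs) != len(path):
            raise ValueError("coefficient count must match path length")
        # Scatter the 1-D inverse transform of this path back into the block.
        for (r, c), value in zip(path, inverse_dct(coeffs)):
            block[r][c] = value
    return block
```

With only DC coefficients, every path reconstructs to a constant. A path of length m with DC value d yields samples equal to d/sqrt(m), so `decode_block(4, 0, [[4.0, 0.0, 0.0, 0.0]] * 4)` produces a flat block of 2.0.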
Abstract:
A method acquires a plurality of input videos, and the frames of each input video are acquired at a fixed sampling rate. Joint analysis is applied to all input videos concurrently to determine a variable, non-uniform temporal sampling rate for each input video such that a combined distortion is minimized and a combined frame-rate constraint is satisfied. Each input video is then sampled at its associated variable, non-uniform temporal sampling rate to produce output videos with variable temporal resolutions.
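The following Python sketch shows the joint allocation idea under simplifying assumptions. A per-frame "drop cost" (the distortion incurred if that frame is skipped) is taken as a given, hypothetical input, and the costs are treated as independent. A greedy rule then spends a single combined frame budget across all videos. The real distortion model and optimizer are not specified in the abstract, so this only illustrates how one shared constraint yields different, non-uniform sampling patterns per video.

```python
import heapq


def joint_temporal_sampling(drop_costs, frame_budget):
    """
    drop_costs[v][i]: hypothetical distortion incurred if frame i of video v is dropped.
    frame_budget: combined number of output frames allowed across all videos.
    Returns (kept frame indices per video, combined distortion of the dropped frames).
    """
    num_videos = len(drop_costs)
    if frame_budget < num_videos:
        raise ValueError("budget must cover at least the first frame of every video")
    # Always keep frame 0 of each video so no output video is empty.
    kept = [[0] for _ in range(num_videos)]
    candidates = [
        (cost, v, i)
        for v, costs in enumerate(drop_costs)
        for i, cost in enumerate(costs)
        if i > 0
    ]
    # Joint decision: the remaining budget goes to the costliest-to-drop frames of ANY video.
    chosen = heapq.nlargest(frame_budget - num_videos, candidates)
    for _, v, i in chosen:
        kept[v].append(i)
    combined_distortion = sum(c for c, _, _ in candidates) - sum(c for c, _, _ in chosen)
    return [sorted(k) for k in kept], combined_distortion
```

For example, with `drop_costs = [[0, 5, 1, 4], [0, 0.5, 3]]` and a budget of 5 frames, the first video keeps frames 0, 1 and 3 and the second keeps frames 0 and 2. The combined distortion of the dropped frames is 1.5.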
Abstract:
A method and system process a compressed input video. The compressed input video is processed to produce an interlaced picture and macroblock coding information of the input video. The interlaced picture has a first spatial resolution and consists of a top field and a bottom field. The top field and the bottom field are filtered adaptively according to the macroblock coding information to produce a progressive picture with a second spatial resolution lower than the first spatial resolution.
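A minimal Python sketch of adaptive field filtering is given below. It assumes a hypothetical per-macroblock flag `mb_modes` ("frame" or "field") derived from the coding information, such as frame versus field DCT or motion compensation. It also assumes a fixed 2:1 downscaling in both directions. Frame-coded macroblocks average both fields. Field-coded macroblocks use only the top field, which avoids mixing two time instants. The actual filters are not given in the abstract.

```python
def downscale_interlaced(picture, mb_modes, mb_size=16):
    """
    picture: list of rows; even rows belong to the top field, odd rows to the bottom field.
    mb_modes[my][mx]: hypothetical "frame" or "field" flag per macroblock.
    Returns a progressive picture with half the width and half the height.
    """
    height, width = len(picture), len(picture[0])
    out = []
    for r in range(height // 2):
        row = []
        for c in range(width // 2):
            y, x = 2 * r, 2 * c
            mode = mb_modes[y // mb_size][x // mb_size]
            if mode == "frame":
                # Little inter-field motion assumed: 2x2 box filter across both fields.
                value = (picture[y][x] + picture[y][x + 1]
                         + picture[y + 1][x] + picture[y + 1][x + 1]) / 4
            else:
                # Field-coded macroblock: top field only, to avoid combing artifacts.
                value = (picture[y][x] + picture[y][x + 1]) / 2
            row.append(value)
        out.append(row)
    return out
```

For a 4 x 4 picture whose top-field rows hold 10 and whose bottom-field rows hold 30, a "frame" macroblock produces 20.0 everywhere and a "field" macroblock produces 10.0.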
Abstract:
A method classifies pixels in an image by first partitioning the image into blocks. A variance of an intensity is determined for each pixel, and for each block the pixel with the maximum variance is identified. Then, the blocks are classified into classes according to the maximum variance.
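The classification can be sketched in Python as follows. The abstract does not say over which neighborhood each pixel's intensity variance is computed, or how the classes are delimited. This sketch therefore assumes a 3 x 3 window, clipped at the image border, and a hypothetical list of increasing `thresholds`. A block's class is the number of thresholds its maximum per-pixel variance exceeds.

```python
def local_variance(img, r, c, radius=1):
    """Intensity variance over a (2*radius+1)^2 window centred on (r, c), clipped at borders."""
    values = [
        img[y][x]
        for y in range(max(0, r - radius), min(len(img), r + radius + 1))
        for x in range(max(0, c - radius), min(len(img[0]), c + radius + 1))
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def classify_blocks(img, block_size, thresholds):
    """Label each block by how many (hypothetical) thresholds its maximum pixel variance exceeds."""
    height, width = len(img), len(img[0])
    classes = {}
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            # The pixel with the maximum variance determines the block's class.
            max_var = max(
                local_variance(img, r, c)
                for r in range(by, min(by + block_size, height))
                for c in range(bx, min(bx + block_size, width))
            )
            classes[(by // block_size, bx // block_size)] = sum(max_var > t for t in thresholds)
    return classes
```

Consider an 8 x 8 image that is zero except for a single bright pixel at (1, 1), with `thresholds=[100, 1000]`. The block containing the bright pixel lands in class 2, and the three flat blocks land in class 0.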
Abstract:
A method extracts high-level features from a video that includes a sequence of frames. Low-level features are extracted from each frame of the video. Each frame is labeled according to the extracted low-level features to generate sequences of labels, each of which is associated with one of the extracted low-level features. The sequences of labels are analyzed using machine learning techniques to extract high-level features of the video.
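The labeling stage can be sketched in Python as shown below. Each low-level feature is quantized per frame with hypothetical thresholds, which yields one label sequence per feature. The abstract does not name the learning technique. As a stand-in, this sketch estimates a first-order Markov transition model from each sequence. This is an illustrative substitute, not the method's own analysis.

```python
from collections import Counter


def label_sequence(values, thresholds):
    """Quantise one low-level feature (one value per frame) into discrete labels."""
    return [sum(v > t for t in thresholds) for v in values]


def transition_probabilities(labels):
    """Stand-in learner: first-order Markov transition probabilities P(next | current)."""
    pair_counts = Counter(zip(labels, labels[1:]))
    from_counts = Counter(labels[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


def analyse_video(features, thresholds):
    """features[name]: per-frame values; one label sequence (and one model) per feature."""
    return {
        name: transition_probabilities(label_sequence(values, thresholds[name]))
        for name, values in features.items()
    }
```

For instance, `label_sequence([0.1, 0.6, 0.9], [0.5, 0.8])` gives `[0, 1, 2]`. The label sequence `[0, 0, 1, 0]` gives transition probabilities `{(0, 0): 0.5, (0, 1): 0.5, (1, 0): 1.0}`.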
Abstract:
A method determines distortion in a video by measuring a spatial distortion in coded frames, and by measuring a temporal distortion and spatial distortion in uncoded frames. The spatial distortion of the coded frames is combined with the temporal distortion and the spatial distortion of the uncoded frames to determine a total average distortion in the video.
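A Python sketch of combining the distortion terms follows, under several assumptions that are not in the abstract. Distortion is measured as mean squared error (MSE). An uncoded frame is displayed by repeating the last coded frame. An uncoded frame's distortion is modeled additively, as its temporal distortion against that reference plus the reference's spatial coding distortion.

```python
def mse(a, b):
    """Mean squared error between two frames given as flat lists of samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_average_distortion(originals, decoded, coded):
    """
    originals[i]: original frame i; decoded[i]: reconstruction of coded frame i (None if uncoded);
    coded[i]: True if frame i was coded. Uncoded frames are assumed to repeat the last coded frame.
    """
    total = 0.0
    last_original = last_spatial = None
    for orig, dec, is_coded in zip(originals, decoded, coded):
        if is_coded:
            spatial = mse(orig, dec)
            total += spatial
            last_original, last_spatial = orig, spatial
        else:
            if last_original is None:
                raise ValueError("the first frame must be coded")
            temporal = mse(orig, last_original)
            # Additive model (assumption): temporal error plus the reference's spatial error.
            total += temporal + last_spatial
    return total / len(originals)
```

With two coded frames of spatial distortion 1 and 2 and one uncoded frame in between, suppose the uncoded frame's temporal distortion against its reference is 4. The total average distortion is then (1 + (4 + 1) + 2) / 3 = 8/3.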
Abstract:
A method estimates rate and distortion characteristics of a video object. First and second object shape features are extracted at first and second resolutions of the video object, respectively. First and second rate-distortion characteristics of the video object are then determined from the extracted first and second object shape features according to first and second modeling parameters, respectively. The extracted object shape features can be discrete, such as states of binary shape patterns of the video object, or continuous, such as a set of statistical moments representing a probability density function of the video object.
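The discrete-feature case can be sketched in Python as follows. Shape features are histograms of 2 x 2 binary pattern states, taken at full resolution and after a hypothetical 2:1 downsampling. Rate is then estimated with a linear model whose per-state bit costs (`bits_per_state`) play the role of the modeling parameters. Only the rate side is shown. The pattern definition, downsampling rule and linear model are illustrative assumptions.

```python
from collections import Counter


def pattern_states(mask):
    """Histogram of 2x2 binary shape patterns (16 possible states) in a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    return Counter(
        (mask[r][c] << 3) | (mask[r][c + 1] << 2) | (mask[r + 1][c] << 1) | mask[r + 1][c + 1]
        for r in range(h - 1)
        for c in range(w - 1)
    )


def downsample(mask):
    """Halve the resolution: an output pixel is 1 if any pixel of its 2x2 block is 1."""
    return [
        [int(any((mask[r][c], mask[r][c + 1], mask[r + 1][c], mask[r + 1][c + 1])))
         for c in range(0, len(mask[0]) - 1, 2)]
        for r in range(0, len(mask) - 1, 2)
    ]


def estimate_rate(states, bits_per_state):
    """Linear rate model: bits = sum of count * (hypothetical) bits per state."""
    return sum(n * bits_per_state.get(s, 0.0) for s, n in states.items())
```

A 4 x 4 all-ones mask has nine occurrences of state 15 at full resolution and one occurrence after downsampling. The two resolutions would then be fed to their own `bits_per_state` tables.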
Abstract:
A method determines a surface of an object in a sequence of images. The method begins by estimating a boundary of the object in each image of the sequence using motion information of adjacent images of the sequence. Then, the portions of each image that are exterior to the estimated object boundary are ordered to produce an ordered sequence of images. Edges in each ordered image are filtered using the motion information, and each ordered image of the sequence is searched to locate filtered edges that form a new boundary outside the estimated boundary. The filtering and searching are repeated, while the new object boundaries are projected over the sequence of images, until the new object boundaries converge to a surface of the object.
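The iterative refinement loop can be sketched in Python under heavy simplifications. These are assumptions, not the described method. The exterior portion of each image is taken as already reordered into polar "rays", and `edge_maps[t][a][r]` holds an edge strength. A 3-tap temporal average stands in for motion-compensated filtering. "Projecting" a boundary over the sequence is modeled as taking the outermost boundary among adjacent frames.

```python
def filter_edges(edge_maps):
    """
    edge_maps[t][a][r]: edge strength in frame t along outward ray a at radius r.
    A 3-tap temporal average stands in for motion-compensated filtering (assumption).
    """
    num_frames = len(edge_maps)
    filtered = []
    for t in range(num_frames):
        neighbours = [edge_maps[s] for s in (t - 1, t, t + 1) if 0 <= s < num_frames]
        filtered.append([
            [sum(m[a][r] for m in neighbours) / len(neighbours) for r in range(len(ray))]
            for a, ray in enumerate(edge_maps[t])
        ])
    return filtered


def refine_boundaries(edge_maps, radii, max_iters=20):
    """Grow the per-frame boundary radii outward until they stop changing."""
    filtered = filter_edges(edge_maps)
    for _ in range(max_iters):
        # Search only outside the current boundary for the strongest filtered edge.
        found = [
            [max(range(min(radii[t][a], len(ray) - 1), len(ray)), key=ray.__getitem__)
             for a, ray in enumerate(frame)]
            for t, frame in enumerate(filtered)
        ]
        # Project over the sequence: outermost boundary among adjacent frames.
        projected = [
            [max(found[s][a] for s in (t - 1, t, t + 1) if 0 <= s < len(found))
             for a in range(len(found[t]))]
            for t in range(len(found))
        ]
        if projected == radii:
            break  # converged
        radii = projected
    return radii
```

Because the search never moves inward and projection takes a maximum, the radii are non-decreasing and bounded, so the loop converges. For a single ray with edge profile `[0, 5, 1, 9, 2]` and initial radius 0, the boundary settles at radius 3.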
Abstract:
A bitstream includes coded pictures, split-flags for generating a transform tree, and a partitioning of coding units (CUs) into prediction units (PUs). The transform tree is generated according to the split-flags. Nodes in the transform tree represent transform units (TUs) associated with the CUs, and the generation splits a TU only if the corresponding split-flag is set. For each PU that includes multiple TUs, the multiple TUs are merged into a larger TU, and the transform tree is modified according to the splitting and merging. Then, the data contained in each PU can be decoded using the TUs associated with that PU according to the transform tree.
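A Python sketch of the split-then-merge process is given below. It assumes the split-flags are read in depth-first order, with no flag read at a hypothetical minimum TU size of 4. It also represents the modified transform tree simply as its list of leaf TUs `(x, y, width, height)`. The merge rule replaces all TUs lying entirely inside one PU by a single TU covering that PU, which may be rectangular. TUs that straddle PU boundaries are left untouched. These conventions are illustrative, not taken from a specification.

```python
def build_transform_tree(x, y, size, split_flags, min_size=4):
    """
    Return the leaf TUs (x, y, width, height) of a quadtree rooted at a size x size CU.
    split_flags is an iterator read in depth-first order; no flag is read at min_size.
    """
    if size > min_size and next(split_flags):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += build_transform_tree(x + dx, y + dy, half, split_flags, min_size)
        return leaves
    return [(x, y, size, size)]


def merge_tus_in_pus(tus, pus):
    """Replace the TUs lying entirely inside a PU by one TU covering the whole PU."""
    merged, absorbed = [], set()
    for px, py, pw, ph in pus:
        inside = [
            (x, y, w, h) for x, y, w, h in tus
            if px <= x and py <= y and x + w <= px + pw and y + h <= py + ph
        ]
        if len(inside) > 1:
            merged.append((px, py, pw, ph))
            absorbed.update(inside)
    # TUs that straddle PU boundaries, or are alone in their PU, are kept unchanged.
    return sorted(merged + [tu for tu in tus if tu not in absorbed], key=lambda t: (t[1], t[0]))
```

For a 16 x 16 CU with flags `[1, 0, 0, 0, 1]`, the tree has seven leaves: three 8 x 8 TUs and four 4 x 4 TUs. With two 16 x 8 PUs, each PU's TUs merge into one 16 x 8 TU.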
Abstract:
A method provides random access to multiview videos. The multiview videos of a scene are acquired by corresponding cameras arranged at poses such that there is view overlap between any pair of cameras. V-frames are generated from the multiview videos and encoded using only spatial prediction. The V-frames are then inserted periodically into an encoded bitstream to provide random temporal access to the multiview videos. Additional view dependency information enables decoding a reduced number of frames before randomly accessing a target frame for a specified view and time, and then decoding the target frame.
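The role of the view dependency information can be sketched in Python. Several assumptions are hypothetical. V-frames occur at every time that is a multiple of `period`. `view_deps[v]` lists the views whose V-frames the V-frame of view v references. Between V-frames, each view uses temporal prediction only from its own previous frame. Under these assumptions, the frames needed for a random access are the dependent V-frames at the latest V-frame time, plus the target view's frames up to the target time.

```python
def frames_to_decode(target_view, target_time, period, view_deps):
    """
    Return (view, time) pairs to decode, in decoding order, for random access to one frame.
    view_deps[v]: hypothetical list of views referenced by the V-frame of view v.
    """
    t0 = (target_time // period) * period  # latest V-frame time at or before the target
    needed = set()
    stack = [target_view]
    while stack:
        v = stack.pop()
        if (v, t0) not in needed:
            needed.add((v, t0))
            stack.extend(view_deps.get(v, []))  # follow view dependencies transitively
    # Assumed temporal prediction within the target view after the V-frame.
    needed.update((target_view, t) for t in range(t0 + 1, target_time + 1))
    return sorted(needed, key=lambda f: (f[1], f[0]))
```

With `view_deps = {0: [], 1: [0], 2: [1]}` and a period of 8, accessing view 2 at time 10 requires five frames: the V-frames of views 0, 1 and 2 at time 8, then view 2 at times 9 and 10. Frames of other views and times are skipped.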