Abstract:
A scalable layered video coding scheme that encodes video data frames into multiple layers, including a base layer of comparatively low quality video and multiple enhancement layers of increasingly higher quality video, adds error resilience to the enhancement layer. Unique resynchronization marks are inserted into the enhancement layer bitstream in headers associated with each video packet, headers associated with each bit plane, and headers associated with each video object plane (VOP) segment. Following transmission of the enhancement layer bitstream, the decoder tries to detect errors in the packets. Upon detection, the decoder seeks forward in the bitstream for the next known resynchronization mark. Once this mark is found, the decoder is able to begin decoding the next video packet. With the addition of many resynchronization marks within each frame, the decoder can recover very quickly and with minimal data loss in the event of a packet loss or channel error in the received enhancement layer bitstream. The video coding scheme also facilitates redundant encoding of header information from the higher-level VOP header down into lower-level bit plane headers and video packet headers. Header extension codes are added to the bit plane and video packet headers to identify whether the redundant data is included.
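The seek-forward recovery described in this abstract can be illustrated with a minimal sketch. It is not the bitstream syntax of the scheme itself: the marker value `RESYNC`, the one-byte length field, and the one-byte checksum used for error detection are all hypothetical choices made for this example, and payloads are assumed never to contain the marker.

```python
RESYNC = b"\x00\x00\x01"  # hypothetical marker; a real bitstream defines its own unique codes


def make_packet(payload):
    """Assumed packet layout: marker, 1-byte length, payload, 1-byte checksum."""
    return RESYNC + bytes([len(payload)]) + payload + bytes([sum(payload) % 256])


def decode_stream(stream):
    """Return the payloads that pass the checksum, skipping damaged packets."""
    payloads = []
    pos = stream.find(RESYNC)
    while pos != -1:
        header = pos + len(RESYNC)            # index of the length byte
        resume = header                       # default: packet is treated as damaged
        if header < len(stream):
            end = header + 1 + stream[header]  # index of the checksum byte
            if end < len(stream):
                payload = stream[header + 1:end]
                if sum(payload) % 256 == stream[end]:
                    payloads.append(payload)
                    resume = end + 1          # packet is intact, continue right after it
        # On error, seek forward to the next resynchronization marker.
        pos = stream.find(RESYNC, resume)
    return payloads


stream = b"".join(make_packet(p) for p in (b"abc", b"defg", b"hi"))
corrupted = bytearray(stream)
corrupted[13] ^= 0xFF                         # flip one payload byte in the second packet
damaged = bytes(corrupted)
print(decode_stream(damaged))                 # [b'abc', b'hi']
```

Only the damaged packet is lost; decoding resumes at the next marker, which is why placing marks more densely bounds the amount of data lost per error.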
Abstract:
A video encoding system and method use a three-dimensional (3-D) wavelet transform and entropy coding that exploit motion information to reduce sensitivity to motion. In one implementation, the coding process initially estimates motion trajectories of pixels in a video object from frame to frame in a video sequence to account for motion of the video object throughout the frames. After motion estimation, a 3-D wavelet transform is applied in two parts. First, a temporal 1-D wavelet transform is applied to the corresponding pixels along the motion trajectories in the time direction. The temporal wavelet transform produces decomposed frames of temporal wavelet coefficients, in which the spatial correlation within each frame is well preserved. Second, a spatial 2-D wavelet transform is applied to all frames containing the temporal wavelet coefficients. The wavelet transforms produce coefficients within different sub-bands. The process then codes the wavelet coefficients. In particular, the coefficients are assigned various contexts based on the significance of neighboring samples in the previous, current, and next frames, thereby taking advantage of any motion information between frames. The wavelet coefficients are coded independently for each sub-band to permit easy separation at a decoder, making resolution scalability and temporal scalability natural and easy. During the coding, bits are allocated among sub-bands according to a technique that optimizes rate-distortion characteristics.
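The two-part transform in this abstract can be sketched as follows. The sketch makes several assumptions that are not in the abstract: it uses a one-level Haar filter for brevity (the abstract does not name a filter), it handles only two frames, and the `motion` field is a hypothetical, precomputed one-to-one mapping from each pixel in the first frame to its position in the second.

```python
def temporal_haar(frame0, frame1, motion):
    """One-level temporal Haar along motion trajectories.

    motion[(y, x)] is the (y, x) position in frame1 reached by the pixel
    at (y, x) in frame0. Coefficients are stored at the frame0 position,
    so the spatial layout of each output frame is preserved.
    """
    h, w = len(frame0), len(frame0[0])
    low = [[0.0] * w for _ in range(h)]
    high = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ty, tx = motion[(y, x)]
            a, b = frame0[y][x], frame1[ty][tx]
            low[y][x] = (a + b) / 2
            high[y][x] = (a - b) / 2
    return low, high


def haar_1d(values):
    """One-level 1-D Haar: averages first, then differences (even length assumed)."""
    half = len(values) // 2
    avg = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    diff = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def spatial_haar(frame):
    """One-level 2-D Haar: transform every row, then every column."""
    rows = [haar_1d(row) for row in frame]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Toy input: frame1 is frame0 shifted one pixel to the right (with wrap-around).
frame0 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
frame1 = [[0] * 4 for _ in range(4)]
motion = {}
for y in range(4):
    for x in range(4):
        motion[(y, x)] = (y, (x + 1) % 4)
        frame1[y][(x + 1) % 4] = frame0[y][x]

low, high = temporal_haar(frame0, frame1, motion)
subbands = [spatial_haar(low), spatial_haar(high)]
```

Because the temporal filter follows the trajectories, the pure translation in this toy input leaves the temporal high band entirely zero, which is the sense in which the transform is insensitive to motion.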
Abstract:
A video encoding scheme employs progressive fine-granularity layered coding to encode video data frames into multiple layers, including a base layer of comparatively low quality video and multiple enhancement layers of increasingly higher quality video. Some of the enhancement layers in a current frame are predicted from at least one same or lower quality layer in a reference frame, whereby the lower quality layer is not necessarily the base layer. Use of multiple reference layers of different quality results in occasional fluctuations in the encoded image data. The video encoding scheme efficiently eliminates such fluctuations by predicting higher quality data from the lower quality data encoded in the base layer and a low quality enhancement layer.
Abstract:
A method and apparatus for selecting a quantizer scale for each macroblock within a frame to optimize the coding rate is presented. A quantizer scale is selected for each macroblock within each frame such that the target bit rate for the frame is achieved while maintaining a uniform visual quality over the entire frame.
Abstract:
An image distribution system has a source that encodes digital images and transmits them over an error-prone channel to a destination. The source has an image coder that processes the digital images using vector transformation followed by vector quantization. This produces groups of vectors and quantized values that are representative of the images. The image coder orders the vectors in the codebooks and assigns vector indexes to the vectors such that a bit error occurring at a less significant bit in a vector index results in less distortion than a bit error occurring at a more significant bit. Depending upon the format and the capabilities of the source and destination, the image coder may allocate different numbers of bits to different groups of vectors according to a bit allocation map. The source also has a UEP (Unequal Error Protection) coder that layers the vector indexes according to their significance. Two possible approaches include frequency-based UEP and bit-plane-based UEP. The source transmits a bitstream that includes the image values, a bit allocation map, and the layered vector indexes. The destination receives the bitstream and recovers the vectors using the vector indexes and bit allocation map. The destination then reconstructs the image from the image values and the vectors.
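The index-ordering property in this abstract can be checked numerically with a small sketch. The codebook below and the ordering rule (sorting by the sum of components) are assumptions chosen only to make the effect visible; the abstract does not specify how the ordering is derived. The `bit_plane_layers` helper is likewise a hypothetical illustration of bit-plane-based layering.

```python
def bit_flip_distortion(codebook, bit):
    """Mean squared error caused by flipping `bit` in every vector index."""
    total = 0.0
    for index, vector in enumerate(codebook):
        corrupted = codebook[index ^ (1 << bit)]
        total += sum((a - b) ** 2 for a, b in zip(vector, corrupted))
    return total / len(codebook)


def bit_plane_layers(indexes, nbits):
    """Split indexes into bit planes, most significant plane first."""
    return [[(i >> bit) & 1 for i in indexes] for bit in range(nbits - 1, -1, -1)]


# Hypothetical 8-entry codebook of 2-D vectors, initially in arbitrary order.
codebook = [(5, 5), (0, 0), (7, 7), (2, 2), (1, 1), (6, 6), (3, 3), (4, 4)]
ordered = sorted(codebook, key=sum)           # assumed ordering rule for the sketch

distortions = [bit_flip_distortion(ordered, bit) for bit in range(3)]
print(distortions)                            # [2.0, 8.0, 32.0]

layers = bit_plane_layers([5, 2], nbits=3)    # 5 = 101b, 2 = 010b
print(layers)                                 # [[1, 0], [0, 1], [1, 0]]
```

With the ordered codebook, an error in bit 0 costs far less than an error in bit 2, so the most significant bit plane is the natural candidate for the strongest protection.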
Abstract:
A video encoding scheme employs progressive fine-granularity layered coding to encode video data frames into multiple layers, including a base layer of comparatively low quality video and multiple enhancement layers of increasingly higher quality video. Some of the enhancement layers in a current frame are predicted from at least one lower quality layer in a reference frame, whereby the lower quality layer is not necessarily the base layer.
Abstract:
A three-dimensional (3D) shape-adaptive discrete wavelet transform (SA-DWT) is provided for efficient object-based video coding. In a first stage, a one-dimensional SA-DWT is performed along the temporal direction among pixels that have temporal correspondence. The correspondence can be established by motion estimation or other matching approaches. SA-DWT in the temporal direction is used to treat emerging pixels, terminating pixels or pixels that have colliding correspondence pixels. After the temporal SA-DWT transform, the resulting temporal wavelet coefficients are placed in the spatial positions corresponding to the original pixels to maintain the spatial correlation within each frame. Then, in a second stage, a two-dimensional SA-DWT is applied to the temporal SA-DWT coefficients within each frame. The 3D SA-DWT can handle arbitrarily shaped video objects while providing flexible spatial and temporal scalability as in any wavelet-based coding scheme. The 3D SA-DWT can also track the video object motion and perform the wavelet transform among corresponding pixels for that object while keeping the spatial correlation within a frame.
Abstract:
A method and apparatus for selecting a quantizer scale for each macroblock to maintain the overall quality of the video image while optimizing the coding rate. A quantizer scale is selected for each macroblock such that the target bit rate for the picture is achieved while an optimal quantization scale ratio is maintained for successive macroblocks to produce a uniform visual quality over the entire picture. One embodiment applies the method to the frame level while another embodiment applies the method in conjunction with a wavelet transform.
Abstract:
A method and apparatus for determining an optimal quadtree structure for quadtree-based variable block size (VBS) motion estimation. The method computes the motion vectors for the entire quadtree from the largest block size to the smallest block size. Next, the method may optionally select an optimal quantizer scale for each block. The method then compares, from the "bottom up", the sum of the distortion from encoding all sub-blocks or sub-nodes (children) with the distortion from encoding the block or node (parent) from which the sub-nodes are partitioned. If the sum of the distortion from encoding the children is greater than that of the parent, then the node is "merged". Conversely, if the sum of the distortion from encoding the children is less than that of the parent, then the node is "split" and the Lagrangian cost for the parent node is set as the sum of the Lagrangian costs of its children. This step is repeated for all nodes through every level until an optimal quadtree structure is obtained.
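The bottom-up merge/split pass in this abstract can be sketched as a short recursion. The `Node` class and the example costs are hypothetical. The sketch also assumes that a single per-node `cost` (for instance a Lagrangian cost of the form distortion plus a multiplier times rate) drives the comparison, whereas the abstract speaks of comparing distortion and then propagating the Lagrangian cost.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cost: float                                    # cost of coding this block as one unit
    children: list = field(default_factory=list)   # four sub-blocks, or empty for a leaf
    split: bool = False                            # decision made by prune()


def prune(node):
    """Decide split or merge for every node, children first; return the subtree's best cost."""
    if not node.children:
        return node.cost
    children_cost = sum(prune(child) for child in node.children)
    if children_cost < node.cost:
        node.split = True
        node.cost = children_cost                  # parent inherits the sum of its children
    else:
        node.split = False                         # "merged": code the block as one unit
    return node.cost


# Hypothetical costs for a two-level quadtree.
root = Node(100, [
    Node(20),
    Node(30),
    Node(40, [Node(5), Node(5), Node(5), Node(5)]),
    Node(15),
])
best = prune(root)
print(best, root.split, root.children[2].split)    # 85 True True
```

Visiting children before their parent means each comparison already uses the best achievable cost of every sub-block, so a single pass over the tree is enough.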