Abstract:
Frame packing techniques are disclosed for multi-directional images and video. According to an embodiment, a multi-directional source image is reformatted into a format in which image data from opposing fields of view are represented in respective regions of the packed image as flat image content. Image data from a multi-directional field of view of the source image between the opposing fields of view are represented in another region of the packed image as equirectangular image content. It is expected that use of the formatted frame will lead to coding efficiencies when the formatted image is processed by predictive video coding techniques and the like.
Abstract:
Techniques are disclosed for coding and decoding video data using object recognition and object modeling as a basis of coding and error recovery. A video decoder may decode coded video data received from a channel. The video decoder may perform object recognition on decoded video data obtained therefrom, and, when an object is recognized in the decoded video data, the video decoder may generate a model representing the recognized object. It may store data representing the model locally. The video decoder may communicate the model data to an encoder, where it may form a basis of error mitigation and recovery. The video decoder also may monitor deviation patterns in the object model and associated patterns in audio content; when video decoding is suspended due to operational errors, the video decoder may generate simulated video data by analyzing audio data received during the suspension period and developing video data from the data model and deviation(s) associated with patterns detected from the audio data.
Abstract:
Techniques are disclosed for coding and decoding video captured as cube map images. According to these techniques, padded reference images are generated for use in predicting input data. A reference image is stored in a cube map format. A padded reference image is generated from the reference image in which image data of a first view contained in the reference image is replicated and placed adjacent to a second view contained in the cube map image. When coding a pixel block of an input image, a prediction search may be performed between the input pixel block and content of the padded reference image. When the prediction search identifies a match, the pixel block may be coded with respect to matching data from the padded reference image. Presence of replicated data in the padded reference image is expected to increase the likelihood that adequate prediction matches will be identified for input pixel block data, which will increase overall efficiency of the video coding.
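The padding and search steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: faces are square lists of pixel rows, the neighboring face can be copied across the seam without re-projection or rotation (a real cube map coder would likely need to account for face orientation), and the search is an exhaustive sum-of-absolute-differences match. The names `pad_face_right`, `sad`, and `best_match` are hypothetical and do not come from the abstract.

```python
def pad_face_right(face, neighbor, pad):
    """Return a copy of `face` with `pad` columns replicated from the
    left edge of `neighbor` and placed along its right edge.
    Assumption: both faces share row orientation across the seam."""
    return [row + nrow[:pad] for row, nrow in zip(face, neighbor)]


def sad(ref, block, top, left):
    """Sum of absolute differences between `block` and the region of
    `ref` whose upper-left corner is at (top, left)."""
    return sum(
        abs(ref[top + r][left + c] - block[r][c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def best_match(ref, block):
    """Exhaustive search; returns (cost, top, left) of the lowest-cost
    position, keeping the first one found on ties."""
    bh, bw = len(block), len(block[0])
    best = None
    for top in range(len(ref) - bh + 1):
        for left in range(len(ref[0]) - bw + 1):
            cost = sad(ref, block, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best


# Two hypothetical 4x4 faces with distinct content.
front = [[4 * r + c for c in range(4)] for r in range(4)]
right = [[100 + 4 * r + c for c in range(4)] for r in range(4)]

# An input block that straddles the seam between the two faces.
block = [[7, 104], [11, 108]]

padded = pad_face_right(front, right, pad=2)
match_padded = best_match(padded, block)
match_unpadded = best_match(front, block)
```

In this toy case the block has no good match inside the unpadded face, because half of its content lies on the neighboring face, while the padded reference contains an exact match at the seam.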
Abstract:
Video coding techniques are disclosed that can accommodate low bandwidth events and preserve visual quality, at least in areas of an image that have high significance to a viewer. Region(s) of interest may be identified from content of an input frame that will be coded. Two representations of the input frame may be generated at different resolutions. A low resolution representation of the input frame may be coded according to predictive coding techniques in which a portion outside the region of interest is coded at higher quality than a portion inside the region of interest. A high resolution representation of the input frame may be coded according to predictive coding techniques in which a portion inside the region of interest is coded at higher quality than a portion outside the region of interest. Doing so preserves visual quality, at least in areas of the input image that correspond to the region of interest.
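One way to picture the complementary quality allocation is as a pair of per-block quantization maps, one per representation. The sketch below assumes that quality is controlled by a quantization parameter (QP) where a lower value means higher quality, that the low resolution representation has half the block rows and columns of the high resolution one, and that the region of interest is a rectangle given in block units as half-open bounds `(top, left, bottom, right)`. The function names and the QP values 24 and 40 are hypothetical.

```python
def qp_map(rows, cols, roi, qp_inside, qp_outside):
    """Build a rows x cols grid of QP values; blocks inside the
    half-open rectangle `roi` get qp_inside, all others qp_outside."""
    top, left, bottom, right = roi
    return [
        [qp_inside if top <= r < bottom and left <= c < right else qp_outside
         for c in range(cols)]
        for r in range(rows)
    ]


def plan_layers(rows, cols, roi, qp_fine=24, qp_coarse=40):
    """Return (low_res_map, high_res_map).
    High resolution: ROI gets the finer QP (higher quality).
    Low resolution: the area outside the ROI gets the finer QP."""
    # Scale the ROI to the half-size grid, rounding the far edges up
    # so the scaled ROI still covers the original region.
    roi_low = (roi[0] // 2, roi[1] // 2, (roi[2] + 1) // 2, (roi[3] + 1) // 2)
    high = qp_map(rows, cols, roi, qp_inside=qp_fine, qp_outside=qp_coarse)
    low = qp_map(rows // 2, cols // 2, roi_low,
                 qp_inside=qp_coarse, qp_outside=qp_fine)
    return low, high


# A 4x4 block grid with the ROI in the upper-left 2x2 blocks.
low_map, high_map = plan_layers(4, 4, roi=(0, 0, 2, 2))
```

Taken together, the two maps cover the whole frame at the finer setting: the region of interest in the high resolution representation and the remainder in the low resolution one.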
Abstract:
Chroma deblock filtering of reconstructed video samples may be performed to remove blockiness artifacts and reduce color artifacts without over-smoothing. In a first method, chroma deblocking may be performed for boundary samples of a smallest transform size, regardless of partitions and coding modes. In a second method, chroma deblocking may be performed when a boundary strength is greater than 0. In a third method, chroma deblocking may be performed regardless of boundary strengths. In a fourth method, the type of chroma deblocking to be performed may be signaled in a slice header by a flag. Furthermore, luma deblock filtering techniques may be applied to chroma deblock filtering.
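The first three methods differ only in the condition under which a chroma edge is filtered, which can be sketched as a small decision function. This is an illustrative sketch, not the signaling or filter defined by any codec: the enum `ChromaDeblockMode`, the function names, and the mapping of a one-bit slice header flag onto two of the modes are all assumptions made for the example.

```python
from enum import Enum


class ChromaDeblockMode(Enum):
    MIN_TRANSFORM_EDGES = 1  # first method: edges of the smallest transform size
    BS_POSITIVE = 2          # second method: boundary strength greater than 0
    ALWAYS = 3               # third method: regardless of boundary strength


def should_filter(mode, boundary_strength, on_min_transform_edge):
    """Decide whether a chroma edge is deblocked under the given mode."""
    if mode is ChromaDeblockMode.MIN_TRANSFORM_EDGES:
        # Partitions and coding modes are ignored; only the edge position matters.
        return on_min_transform_edge
    if mode is ChromaDeblockMode.BS_POSITIVE:
        return boundary_strength > 0
    return True  # ALWAYS


def mode_from_slice_flag(flag):
    """Fourth method: a slice header flag selects the deblocking type.
    Hypothetical mapping chosen for this sketch only."""
    return ChromaDeblockMode.ALWAYS if flag else ChromaDeblockMode.BS_POSITIVE


# An edge with boundary strength 0 that lies on a smallest-transform boundary.
decisions = {
    mode.name: should_filter(mode, boundary_strength=0, on_min_transform_edge=True)
    for mode in ChromaDeblockMode
}
```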
Abstract:
An encoding system may include a video source that provides video data to be coded, a video coder, a transmitter, and a controller to manage operation of the system. The controller may control the video coder to code and compress the image information from the video source into video data, based upon one or more motion prediction parameters. The transmitter may transmit the video data. A decoding system may decode the video data based upon the motion prediction parameters.
Abstract:
The invention is directed to an efficient way of encoding and decoding video. Embodiments include identifying different coding units that share a similar characteristic. The characteristic can be, for example: quantization values, modes, block sizes, color space, motion vectors, depth, facial and non-facial regions, and filter values. An encoder may then group the units together as a coherence group. An encoder may similarly create a table or other data structure of the coding units. An encoder may then extract the commonly repeating characteristic or attribute from the coding units. The encoder may transmit the coherence groups along with the data structure, and other coding units which were not part of a coherence group. The decoder may receive the data, utilize the shared characteristic by storing it locally in a cache for faster repeated decoding, and decode the coherence group together.
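The grouping and extraction steps can be sketched as follows. The example assumes coding units are plain dictionaries and that a group is formed whenever at least two units share the same value for one chosen characteristic; the function name `build_coherence_groups`, the `min_size` threshold, and the output layout are hypothetical.

```python
from collections import defaultdict


def build_coherence_groups(units, key, min_size=2):
    """Group coding units that share the same value for `key`.

    Returns (groups, ungrouped). Each group stores the shared value
    once and the member units with that attribute removed; units whose
    value is not shared by at least `min_size` units stay as they are."""
    buckets = defaultdict(list)
    for unit in units:
        buckets[unit[key]].append(unit)

    groups, ungrouped = [], []
    for value, members in buckets.items():
        if len(members) >= min_size:
            # Extract the repeated attribute so it is carried only once.
            stripped = [{k: v for k, v in u.items() if k != key} for u in members]
            groups.append({"shared": {key: value}, "units": stripped})
        else:
            ungrouped.extend(members)
    return groups, ungrouped


# Three hypothetical coding units; two share a quantization value.
units = [
    {"id": 0, "qp": 30},
    {"id": 1, "qp": 26},
    {"id": 2, "qp": 30},
]
groups, ungrouped = build_coherence_groups(units, key="qp")
```

A decoder following the same idea could read the `shared` entry once, keep it in a local cache, and apply it to every unit in the group.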
Abstract:
Embodiments of the present disclosure provide systems and methods for perspective shifting in a video conferencing session. In one exemplary method, a video stream may be generated. A foreground element may be identified in a frame of the video stream and distinguished from a background element of the frame. Data may be received representing a viewing condition at a terminal that will display the generated video stream. The frame of the video stream may be modified based on the received data to shift the foreground element relative to the background element. The modified video stream may be displayed at the displaying terminal.
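The frame modification step might look like the sketch below. It assumes the foreground has already been segmented into a binary mask, that a background plate is available to fill the pixels the foreground vacates, and that the received viewing-condition data has already been reduced to a horizontal offset `dx` in pixels. None of these details come from the abstract, and the function name `shift_foreground` is hypothetical.

```python
def shift_foreground(frame, mask, background, dx):
    """Return a new frame in which pixels marked in `mask` are moved
    `dx` columns to the right (negative dx moves them left).

    Vacated foreground positions are filled from `background`;
    foreground pixels shifted past the frame edge are dropped."""
    height, width = len(frame), len(frame[0])
    # Start from the frame with the foreground erased.
    out = [
        [background[r][c] if mask[r][c] else frame[r][c] for c in range(width)]
        for r in range(height)
    ]
    # Paste the foreground back at its shifted position.
    for r in range(height):
        for c in range(width):
            if mask[r][c] and 0 <= c + dx < width:
                out[r][c + dx] = frame[r][c]
    return out


# One-row example: foreground value 9 over a background of 1.
frame = [[1, 9, 9, 1]]
mask = [[0, 1, 1, 0]]
background = [[1, 1, 1, 1]]

shifted_right = shift_foreground(frame, mask, background, dx=1)
shifted_left = shift_foreground(frame, mask, background, dx=-1)
```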
Abstract:
Methods and systems are disclosed to counteract spatial distortions introduced by imaging processes of multi-directional video frames, where objects may be projected to spherical or equirectangular representations. Techniques are provided to invert the spatial distortions in video frames used as reference picture data in predictive coding, by spatially transforming the image content of the reference picture data before this image content is used for the prediction of input video data in prediction-based coders and decoders.