Abstract:
Methods, systems, and apparatus, including computer programs, for compression and decompression of video data using an ensemble of machine learning models. Methods can include defining, for each frame in a video, a plurality of blocks in the frame. Methods can further include processing the frames of the video in sequential sets, wherein each sequential set comprises at least a current frame (220) of the video and a prior frame (240) of the video in the ordered sequence. Each respective prediction of a block in a frame of the video includes providing, as input to a prediction model, a first and a second border (235, 230) of a current block (225) of the current frame, a first and a second border (250, 255) of a respective current block (245) of the prior frame, and the respective current block (245) of the prior frame.
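As a rough illustration of the input assembly this abstract describes, the sketch below gathers two borders of the current block, two borders of the co-located prior block, and the prior block itself into one feature vector for a prediction model. The block size, the choice of top/left rows as borders, the zero padding at frame edges, and the `model` callable are all assumptions made for illustration, not the patented method itself.

```python
import numpy as np

BLOCK = 16  # hypothetical block size

def block_borders(frame, y, x, size=BLOCK):
    """Top-row and left-column borders of the block at (y, x);
    zeros are assumed where the block touches a frame edge."""
    top = frame[y - 1, x:x + size] if y > 0 else np.zeros(size)
    left = frame[y:y + size, x - 1] if x > 0 else np.zeros(size)
    return top, left

def predict_block(model, current, prior, y, x, size=BLOCK):
    # first and second borders of the current block (cf. 235, 230)
    cur_top, cur_left = block_borders(current, y, x, size)
    # first and second borders of the co-located prior block (cf. 250, 255)
    pri_top, pri_left = block_borders(prior, y, x, size)
    prior_block = prior[y:y + size, x:x + size]  # cf. 245
    features = np.concatenate(
        [cur_top, cur_left, pri_top, pri_left, prior_block.ravel()]
    )
    return model(features)  # predicted pixels for the current block (225)
```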
Abstract:
A portable terminal including an input unit (1001) through which images are entered, an encoding unit (1002) operable to encode the input images, thereby providing encoded data, a transmitting unit (1003) operable to transmit the encoded data to a communication counterpart, a receiving unit (1006) operable to receive counterpart-related information from the communication counterpart, and a mode-determining unit operable to determine an encoding method. The mode-determining unit receives the information on each of the self-terminal and the communication counterpart and thereby determines the encoding method to be used. As a result, even when each of the self-terminal and the communication counterpart retains a sufficient level of remaining battery power, the encoding method is changed as appropriate to suppress unwanted power consumption, whereby communication between the self-terminal and the communication counterpart can continue for a longer period of time.
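A minimal sketch of the mode decision, assuming percentage battery readings exchanged between the terminals; the threshold and mode names are invented for illustration:

```python
def determine_encoding_mode(self_battery_pct, counterpart_battery_pct,
                            low_threshold=30):
    """Pick a lighter-weight encoding method whenever either side's
    remaining battery drops below the (assumed) threshold."""
    if min(self_battery_pct, counterpart_battery_pct) < low_threshold:
        return "low-power-mode"    # cheap to encode and decode
    return "high-quality-mode"     # heavier compression, more power
```

Note the abstract's point that the method may switch modes even at sufficient battery levels purely to extend communication time; a real policy would weigh more inputs than this two-value comparison.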
Abstract:
Methods and apparatus for video processing are disclosed. The processing may include video encoding, video decoding, or video transcoding. One example method includes performing a conversion between a video comprising one or more video layers and a bitstream of the video according to a rule. The rule specifies the use of multiple adaptation parameter set (APS) network abstraction layer (NAL) units for the video, wherein each APS NAL unit has a corresponding adaptation parameter type value, is associated with a corresponding video layer identifier, and is either a prefix unit or a suffix unit. The rule further specifies that, responsive to the multiple APS NAL units sharing a same adaptation parameter type value, the adaptation parameter set identifier values of the multiple APS NAL units belong to a same identifier space.
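The identifier-space constraint can be pictured as the grouping sketched below: units sharing a type value index one ID space, so equal (type, id) pairs must denote the same parameter set. The dataclass fields are assumptions standing in for the actual NAL unit syntax.

```python
from dataclasses import dataclass

@dataclass
class ApsNalUnit:
    aps_params_type: int  # adaptation parameter type value
    aps_id: int           # adaptation parameter set identifier
    layer_id: int         # associated video layer identifier
    is_prefix: bool       # prefix unit (True) or suffix unit (False)

def group_by_id_space(units):
    """Group APS NAL units by type; within one group, aps_id values
    come from a single shared identifier space, regardless of layer
    or prefix/suffix position."""
    spaces = {}
    for u in units:
        spaces.setdefault(u.aps_params_type, {}).setdefault(u.aps_id, []).append(u)
    return spaces
```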
Abstract:
Aspects of the disclosure provide a method and an apparatus including processing circuitry for video decoding. The processing circuitry decodes, from a coded video bitstream, a first syntax element indicating whether a first component in the coded video bitstream is coded based on a second component in the coded video bitstream. The processing circuitry determines whether to decode one or more second syntax elements for a chroma related coding tool based on the first syntax element. The chroma related coding tool is a luma mapping with chroma scaling coding tool or a cross-component adaptive loop filter. The one or more second syntax elements are decoded when the first syntax element indicates that the first component is coded based on the second component. The one or more second syntax elements are not decoded when the first syntax element indicates that the first component is not coded based on the second component.
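A hedged decoder-side sketch of this conditional parsing, with an invented `BitReader` and invented element names; only the ordering (the first flag gates the chroma-tool elements) mirrors the abstract:

```python
class BitReader:
    """Toy bit reader over a list of 0/1 values (illustrative only)."""
    def __init__(self, bits):
        self.bits, self.pos = bits, 0
    def read_flag(self):
        bit = self.bits[self.pos]
        self.pos += 1
        return bool(bit)

def parse_chroma_tool_syntax(reader):
    # first syntax element: is the first component coded based on the second?
    cross_component_coded = reader.read_flag()
    chroma_params = None
    if cross_component_coded:
        # second syntax elements are present only when the flag is set
        chroma_params = {
            "lmcs_enabled": reader.read_flag(),   # luma mapping w/ chroma scaling
            "ccalf_enabled": reader.read_flag(),  # cross-component ALF
        }
    return cross_component_coded, chroma_params
```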
Abstract:
Presented herein are techniques for a low-complexity process of generating an artificial frame that can be used for prediction. At least a first reference frame and a second reference frame of a video signal are obtained. A synthetic reference frame is generated from the first reference frame and the second reference frame. Reference blocks from each of the first reference frame and the second reference frame are combined to derive an interpolated block of the synthetic reference frame.
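A minimal sketch of the block-wise combination, assuming co-located blocks and a plain two-frame average; the patent covers a low-complexity combination generally, so the averaging weights here are an assumption:

```python
import numpy as np

def synthetic_frame(ref0, ref1, size=16):
    """Build a synthetic reference frame by blending co-located
    reference blocks from two reference frames."""
    out = np.empty_like(ref0)
    for y in range(0, ref0.shape[0], size):
        for x in range(0, ref0.shape[1], size):
            b0 = ref0[y:y + size, x:x + size].astype(np.float32)
            b1 = ref1[y:y + size, x:x + size].astype(np.float32)
            # interpolated block derived from both reference blocks
            out[y:y + size, x:x + size] = ((b0 + b1) / 2).astype(ref0.dtype)
    return out
```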
Abstract:
On the transmission side, a background and objects (#1 to #3), which together constitute an image, are each transmitted at a transmission rate of R/4. On the receiving side, the image is displayed at a given spatial resolution and temporal resolution. When the object (#1) is dragged at time t1 on the receiving side, the transmission of the background and the objects (#2 and #3) is stopped on the transmission side and only the object (#1) is transmitted, using the full transmission rate R, as shown in Figure 16(A). Thus, on the receiving side, the image including the dragged object (#1) is displayed with an improved spatial resolution for the object (#1), at the cost of the temporal resolution of the image.
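The rate behavior can be sketched as a simple sender-side allocation: an even R/4 split by default, and the whole rate R devoted to the dragged object on request. The even split and the function shape are assumptions for illustration:

```python
def allocate_rates(total_rate, streams, dragged=None):
    """Return a per-stream rate map; when one stream is dragged,
    it receives the full rate and the others are paused."""
    if dragged is None:
        share = total_rate / len(streams)  # e.g. R/4 for 4 streams
        return {s: share for s in streams}
    return {s: (total_rate if s == dragged else 0.0) for s in streams}

# default: background and objects #1-#3 each at R/4
# after the drag at t1: allocate_rates(R, streams, dragged="object1")
```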
Abstract:
An image transmitting method, wherein compressed image data (Dv), obtained by compression-coding digital image data corresponding to one motion picture, includes an identification flag (Hfd) representing whether or not the compressed image data (Dv) is suitable for random independent reproduction of an arbitrary image, the flag being sent immediately after the synchronizing signal (Hsd) in the first portion of the header (Hv). On the reproducing side, where the compressed image data transmitted by such an image transmitting method is reproduced, during the analysis of the header (Hv) attached to the compressed image data (Dv) corresponding to one motion picture, the suitability of the compressed image data (Dv) for random independent reproduction is detected in a short time by analyzing the identification flag (Hfd).
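A sketch of the fast check on the reproducing side, under an assumed byte-level header layout (the marker value and flag encoding are invented): read the synchronizing signal, then the identification flag placed immediately after it, without parsing the remainder of the header.

```python
SYNC_MARKER = b"\x00\x00\x01"  # hypothetical synchronizing signal Hsd

def supports_random_reproduction(header_bytes):
    """Locate Hsd and read the identification flag Hfd that follows it."""
    idx = header_bytes.find(SYNC_MARKER)
    if idx < 0:
        raise ValueError("synchronizing signal not found")
    flag_byte = header_bytes[idx + len(SYNC_MARKER)]
    return bool(flag_byte & 0x01)  # low bit assumed to carry Hfd
```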
Abstract:
A source quality of a source video and a source content complexity of the source video are identified. Parameter constraints with respect to parameters of an operation are received. The source video quality, the source content complexity, and the parameter constraints are applied to a deep neural network (DNN), producing DNN outputs. In an example, the DNN outputs are combined using domain knowledge to provide the filter parameters, as predicted, to a filter chain, such that applying the filter chain to the input source video results in an output video achieving the full reference video quality score.
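A purely illustrative sketch of feeding the identified quality, complexity, and constraints through a DNN and applying domain knowledge (here, just clipping to the constraint bounds) to produce filter-chain parameters; the feature vector, output layout, and parameter names are all assumptions:

```python
import numpy as np

def predict_filter_params(dnn, source_quality, content_complexity,
                          param_constraints):
    """dnn: callable mapping a feature vector to raw parameter outputs;
    param_constraints: (lower_bounds, upper_bounds) arrays."""
    features = np.array([source_quality, content_complexity], dtype=np.float32)
    raw = dnn(features)  # e.g. [denoise_strength, sharpen_amount, crf]
    lo, hi = param_constraints
    clipped = np.clip(raw, lo, hi)  # domain knowledge: honor constraints
    return {"denoise": clipped[0], "sharpen": clipped[1], "crf": clipped[2]}
```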
Abstract:
A computerized method and system of segment-based video encoding optimization, the method comprising obtaining a set of input segments constituting an input video stream; selecting one or more input segments in accordance with a selection criterion to be one or more representative segments; determining an encoding instruction and obtaining one or more encoded representative segments encoded from the representative segments in accordance with the encoding instruction and using a quality-driven video encoding scheme; determining, based on the encoded representative segments, an encoding configuration, and obtaining one or more encoded input segments encoded from at least a sub-set of the input segments using the encoding configuration; determining an evaluation instruction used to evaluate quality for each encoded input segment; and generating an output video stream, by selecting, for each input segment, a corresponding output segment from a group comprising at least one of the following candidate segments: the input segment, an encoded representative segment, and an encoded input segment.
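The segment pipeline can be summarized by the loop sketched below, with the encoder, quality evaluator, selection criterion, and size-based tie-break all supplied or assumed by the caller; none of these stand-ins are the disclosed quality-driven scheme itself:

```python
def build_output_stream(input_segments, encode, evaluate_quality,
                        quality_floor):
    """For each input segment, choose among the candidate segments
    (input, encoded representative, encoded input) the smallest one
    that still meets the quality floor."""
    representative = input_segments[0]      # selection-criterion stub
    encoded_rep = encode(representative)    # drives the encoding config
    output = []
    for seg in input_segments:
        candidates = [seg, encoded_rep, encode(seg)]
        ok = [c for c in candidates if evaluate_quality(c) >= quality_floor]
        output.append(min(ok, key=len) if ok else seg)
    return output
```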