Abstract:
A video coding server may code a common video sequence into a plurality of coded data streams, each coded data stream representing the video sequence coded using coding parameters tailored for a respective transmission bit rate. The coding may cause a set of transmission units from among the coded data streams to include coded video data from a common point of the video sequence, and the first coded frame of each transmission unit of the set to be a synchronization frame. A manifest file may be built representing an index of transmission units of the respective coded data streams. The coded data streams and manifest file may be stored by the server for delivery to a client device. During download and decode, the transmission units may be decoded efficiently even when switching among streams because the first frame in each transmission unit is a synchronization frame.
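For illustration, the manifest-building step might look like the following Python sketch. The StreamInfo structure, JSON layout, and field names are illustrative assumptions rather than the disclosed format; the sketch only shows an index of transmission units across coded streams whose boundaries are aligned at common points of the sequence, each unit beginning with a synchronization frame.

    import json
    from dataclasses import dataclass

    @dataclass
    class StreamInfo:
        bitrate_kbps: int   # target transmission bit rate of this coded stream
        chunk_urls: list    # one URL per transmission unit

    def build_manifest(chunk_duration_s: float, streams: list) -> str:
        """Index the transmission units of each coded data stream.

        All streams are assumed to be segmented at the same common points of
        the source video, with each transmission unit beginning with a
        synchronization frame, so a client may switch streams at any boundary.
        """
        manifest = {
            "chunk_duration_s": chunk_duration_s,
            "streams": [
                {
                    "bitrate_kbps": s.bitrate_kbps,
                    "chunks": [
                        {"index": i, "start_s": i * chunk_duration_s, "url": url}
                        for i, url in enumerate(s.chunk_urls)
                    ],
                }
                for s in streams
            ],
        }
        return json.dumps(manifest, indent=2)

    # Example: two renditions of one sequence, aligned at 2-second units.
    low = StreamInfo(500, ["low/0.mp4", "low/1.mp4", "low/2.mp4"])
    high = StreamInfo(3000, ["high/0.mp4", "high/1.mp4", "high/2.mp4"])
    print(build_manifest(2.0, [low, high]))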
Abstract:
Video coding systems and methods protect against banding artifacts in decoded image content. According to the method, a video coder may identify, from content of pixel blocks of a frame of video data, which pixel blocks are likely to exhibit banding artifacts from the video coding/decoding processes. The video coder may assemble regions of the frame that are likely to exhibit banding artifacts based on the identified pixel blocks' locations with respect to each other. The video coder may apply anti-banding processing to pixel blocks within one or more of the identified regions and, thereafter, may code the processed frame by a compression operation.
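As a hedged sketch of the detection and anti-banding steps (the block size, variance and gradient thresholds, and the use of dither noise are assumptions, not the claimed method), a detector might flag low-variance, smoothly graded pixel blocks and lightly dither them before compression:

    import numpy as np

    BLOCK = 16
    VAR_MAX = 4.0     # assumed: near-flat blocks are banding-prone
    GRAD_MIN = 0.01   # assumed: provided a shallow gradient is present

    def banding_prone_blocks(frame: np.ndarray) -> np.ndarray:
        """Return a boolean map of BLOCK x BLOCK pixel blocks likely to band."""
        h, w = frame.shape
        mask = np.zeros((h // BLOCK, w // BLOCK), dtype=bool)
        for by in range(h // BLOCK):
            for bx in range(w // BLOCK):
                blk = frame[by*BLOCK:(by+1)*BLOCK, bx*BLOCK:(bx+1)*BLOCK].astype(float)
                gy, gx = np.gradient(blk)
                mask[by, bx] = blk.var() < VAR_MAX and np.hypot(gx, gy).mean() > GRAD_MIN
        return mask

    def anti_band(frame, mask, strength=0.5, seed=0):
        """Apply light dither noise to flagged blocks before coding."""
        rng = np.random.default_rng(seed)
        out = frame.astype(float)
        for by, bx in zip(*np.nonzero(mask)):
            out[by*BLOCK:(by+1)*BLOCK, bx*BLOCK:(bx+1)*BLOCK] += \
                rng.normal(0.0, strength, (BLOCK, BLOCK))
        return np.clip(out, 0, 255).astype(frame.dtype)

    # Example: a shallow synthetic gradient, the classic banding-prone content.
    grad = np.tile(np.linspace(100, 110, 256), (256, 1)).astype(np.uint8)
    processed = anti_band(grad, banding_prone_blocks(grad))

Adjacent flagged blocks could then be merged into regions before the anti-banding pass, per the region-assembly step described above.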
Abstract:
Systems and methods are configured for accessing data representing video content, the data comprising a set of one or more symbols each associated with a syntax element; performing a probability estimation, for encoding the data, comprising: for each symbol, obtaining, based on the syntax element for that symbol, an adaptivity rate parameter value, the adaptivity rate parameter value being a function of a number of symbols in the set of one or more symbols; updating the adaptivity rate parameter value as a function of an adjustment parameter value; and generating, based on the updated adaptivity rate parameter value, a probability value; generating a probability estimation; and encoding, based on a cumulative distribution function (CDF) of the probability estimation, the data comprising the set of one or more symbols for transmission.
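A minimal sketch of per-symbol CDF adaptation in this spirit follows; the specific rate schedule and fixed-point update rule here are assumptions modeled on common adaptive-CDF entropy coders, not the claimed parameter values:

    def adaptivity_rate(update_count: int, alphabet_size: int) -> int:
        """Assumed schedule: adapt faster (smaller shift) for early symbols,
        adjusted by alphabet size, in the spirit of adaptive-CDF coders."""
        base = 3 + (update_count > 15) + (update_count > 31)
        return base + min(alphabet_size // 8, 2)   # hypothetical adjustment term

    def update_cdf(cdf: list, symbol: int, update_count: int) -> list:
        """Move probability mass toward the observed symbol.

        cdf[i] holds P(X <= i) in 15-bit fixed point, with cdf[-1] == 1 << 15.
        """
        rate = adaptivity_rate(update_count, len(cdf))
        top = 1 << 15
        out = list(cdf)
        for i in range(len(cdf) - 1):
            if i < symbol:
                out[i] -= out[i] >> rate            # drain mass below the symbol
            else:
                out[i] += (top - out[i]) >> rate    # push mass toward the symbol
        return out

    # Example: a 4-symbol alphabet starting from a uniform CDF.
    cdf = [8192, 16384, 24576, 32768]
    for n, sym in enumerate([2, 2, 2, 0, 2]):
        cdf = update_cdf(cdf, sym, n)
    print(cdf)   # mass has shifted toward symbol 2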
Abstract:
A system obtains a data set representing immersive video content for display at a display time, including first data representing the content according to a first level of detail, and second data representing the content according to a second, higher level of detail. During one or more first times prior to the display time, the system causes at least a portion of the first data to be stored in a video buffer. During one or more second times prior to the display time, the system generates a prediction of a viewport for displaying the content to a user at the display time, identifies a portion of the second data corresponding to the prediction of the viewport, and causes the identified portion of the second data to be stored in the video buffer. At the display time, the system causes the content to be displayed to the user using the video buffer.
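The two-tier buffering and viewport prediction can be sketched as below; the tile layout, the linear head-motion predictor, and all names are illustrative assumptions:

    from dataclasses import dataclass, field

    @dataclass
    class ImmersiveBuffer:
        """Coarse base data for the whole scene plus fine detail for the
        predicted viewport only."""
        base_tiles: dict = field(default_factory=dict)     # tile -> low detail
        detail_tiles: dict = field(default_factory=dict)   # tile -> high detail

    def predict_viewport(yaw_deg: float, rate_dps: float, latency_s: float) -> float:
        """Assumed linear predictor: extrapolate head yaw to the display time."""
        return (yaw_deg + rate_dps * latency_s) % 360.0

    def tiles_for_viewport(yaw_deg: float, fov_deg=90.0, tile_deg=30.0):
        first = int((yaw_deg - fov_deg / 2) // tile_deg)
        count = int(fov_deg // tile_deg) + 1
        return [(first + k) % int(360 / tile_deg) for k in range(count)]

    def render(buf: ImmersiveBuffer, yaw_deg: float):
        """At display time, prefer detail tiles; fall back to base data."""
        return {t: buf.detail_tiles.get(t, buf.base_tiles.get(t))
                for t in tiles_for_viewport(yaw_deg)}

    # First times: buffer base data for the full scene.
    buf = ImmersiveBuffer(base_tiles={t: f"base{t}" for t in range(12)})
    # Second times: fetch detail only where the viewer is predicted to look.
    pred = predict_viewport(yaw_deg=40.0, rate_dps=30.0, latency_s=0.5)  # 55.0
    for t in tiles_for_viewport(pred):
        buf.detail_tiles[t] = f"detail{t}"
    # Display time: the viewport is served from the video buffer.
    print(render(buf, yaw_deg=55.0))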
Abstract:
Video object and keypoint location detection techniques are presented. The system includes a detection system for generating locations of an object's keypoints along with probabilities associated with those locations, and a stability system for stabilizing the keypoint locations of the detected objects. In some aspects, the generated probabilities form a two-dimensional array corresponding to locations within input images, and the stability system fits the generated probabilities to a two-dimensional probability distribution function.
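One hedged realization of the fitting step is a moment fit of the detector's probability array to a two-dimensional Gaussian, whose mean gives a sub-pixel, stabilized keypoint location and whose spread gives a confidence; the moment-matching approach is an assumption, not necessarily the disclosed fit:

    import numpy as np

    def fit_gaussian_2d(heatmap: np.ndarray):
        """Fit a 2-D probability map to a Gaussian by matching moments."""
        p = np.clip(heatmap.astype(float), 0, None)
        p /= p.sum()
        ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
        mx, my = (p * xs).sum(), (p * ys).sum()    # sub-pixel mean
        sx = np.sqrt((p * (xs - mx) ** 2).sum())   # spread per axis
        sy = np.sqrt((p * (ys - my) ** 2).sum())
        return (mx, my), (sx, sy)

    # Example: a noisy detector response centered near (12.3, 7.6).
    ys, xs = np.mgrid[0:16, 0:24]
    blob = np.exp(-((xs - 12.3) ** 2 + (ys - 7.6) ** 2) / 4.0)
    noise = 0.01 * np.random.default_rng(0).random((16, 24))
    mean, spread = fit_gaussian_2d(blob + noise)
    print(mean)   # approximately (12.3, 7.6), stable under small noise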
Abstract:
Embodiments of the present disclosure provide systems and methods for perspective shifting in a video conferencing session. In one exemplary method, a video stream may be generated. A foreground element may be identified in a frame of the video stream and distinguished from a background element of the frame. Data may be received representing a viewing condition at a terminal that will display the generated video stream. The frame of the video stream may be modified based on the received data to shift the foreground element relative to the background element. The modified video stream may be displayed at the displaying terminal.
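The shift itself can be illustrated with a small parallax sketch; the linear gain from viewer offset to pixel shift and the compositing rule are assumptions for illustration:

    import numpy as np

    def perspective_shift(frame, fg_mask, viewer_dx, viewer_dy, gain=0.2):
        """Translate the foreground against the background to simulate parallax.

        fg_mask marks foreground pixels; viewer_dx/dy describe the viewing
        condition (e.g., head offset) reported by the displaying terminal.
        Disoccluded pixels are left unfilled in this toy sketch.
        """
        sx, sy = int(gain * viewer_dx), int(gain * viewer_dy)
        shifted = np.roll(np.where(fg_mask[..., None], frame, 0),
                          (sy, sx), axis=(0, 1))
        shifted_mask = np.roll(fg_mask, (sy, sx), axis=(0, 1))
        return np.where(shifted_mask[..., None], shifted, frame)

    # Example: a flat background with a square foreground element.
    frame = np.full((120, 160, 3), 40, dtype=np.uint8)
    mask = np.zeros((120, 160), dtype=bool)
    mask[40:80, 60:100] = True
    frame[mask] = 200
    out = perspective_shift(frame, mask, viewer_dx=25.0, viewer_dy=-10.0)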
Abstract:
In an example method, a system receives a plurality of frames of a video, and generates a data structure representing the video and representing a plurality of temporal layers. Generating the data structure includes: (i) determining a plurality of quality levels for presenting the video, where each of the quality levels corresponds to a different respective sampling period for sampling the frames of the video, (ii) assigning, based on the sampling periods, each of the frames to a respective one of the temporal layers of the data structure, and (iii) indicating, in the data structure, one or more relationships between (a) at least one of the frames assigned to at least one of the temporal layers of the data structure, and (b) at least another one of the frames assigned to at least another one of the temporal layers of the data structure. Further, the system outputs the data structure.
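A minimal sketch of the assignment step follows; the period-to-layer rule and the dependency convention are assumptions chosen to make the structure concrete:

    def assign_temporal_layers(num_frames: int, sampling_periods: list) -> dict:
        """Assign each frame to the layer of the coarsest quality level
        (largest sampling period) that samples it.

        sampling_periods is ordered coarsest to finest, e.g. [8, 4, 2, 1]:
        layer 0 keeps every 8th frame, layer 1 adds every 4th, and so on.
        """
        layers = {}
        for f in range(num_frames):
            for layer, period in enumerate(sampling_periods):
                if f % period == 0:
                    layers[f] = layer
                    break
        return layers

    def layer_relationships(layers: dict) -> list:
        """Relate each frame to the nearest earlier frame in a coarser layer,
        one assumed convention for the indicated relationships."""
        rels = []
        for f, layer in sorted(layers.items()):
            refs = [g for g, l in layers.items() if g < f and l < layer]
            if refs:
                rels.append((f, max(refs)))
        return rels

    # Example: 9 frames, four quality levels with periods 8, 4, 2, 1.
    layers = assign_temporal_layers(9, [8, 4, 2, 1])
    print(layers)   # {0: 0, 1: 3, 2: 2, 3: 3, 4: 1, 5: 3, 6: 2, 7: 3, 8: 0}
    print(layer_relationships(layers))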
Abstract:
In an example method, a system accesses first input data and a machine learning architecture. The machine learning architecture includes a first module having a first neural network, a second module having a second neural network, and a third module having a third neural network. The system generates a first feature set representing a first portion of the first input data using the first neural network, and a second feature set representing a second portion of the first input data using the second neural network. The system generates, using the third neural network, first output data based on the first feature set and the second feature set.
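A toy sketch of such an architecture, with tiny one-layer networks standing in for the three modules (all dimensions and the fuse-by-concatenation design are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    def mlp(in_dim, out_dim):
        """A one-layer stand-in for a module's neural network."""
        w, b = rng.normal(0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)
        return lambda x: np.tanh(x @ w + b)

    first_net = mlp(8, 4)    # first module: encodes input portion 1
    second_net = mlp(8, 4)   # second module: encodes input portion 2
    third_net = mlp(8, 2)    # third module: fuses both feature sets

    x = rng.normal(size=16)                # first input data
    portion1, portion2 = x[:8], x[8:]      # two portions of the input
    feats1 = first_net(portion1)           # first feature set
    feats2 = second_net(portion2)          # second feature set
    output = third_net(np.concatenate([feats1, feats2]))   # first output data
    print(output.shape)                    # (2,)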
Abstract:
Techniques are disclosed for coding video data in which frames from a video source are partitioned into a plurality of tiles of common size, and the tiles are coded as a virtual video sequence according to motion-compensated prediction, each tile treated as having a respective temporal location within the virtual video sequence. The coding scheme permits relative allocation of coding resources to tiles that are likely to have greater significance in a video coding session, which may allow certain tiles that have low complexity or low motion content to be skipped during coding of the tiles for select source frames. Moreover, coding of the tiles may be ordered to achieve low coding latencies during a coding session.
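A hedged sketch of the tiling and skip decision (tile size, the mean-absolute-difference skip test, and its threshold are assumptions):

    import numpy as np

    TILE = 64
    SKIP_THRESH = 1.0   # assumed: low-motion tiles below this are skipped

    def tiles(frame: np.ndarray):
        """Partition a frame into TILE x TILE tiles in raster order."""
        h, w = frame.shape
        for ty in range(0, h, TILE):
            for tx in range(0, w, TILE):
                yield (ty, tx), frame[ty:ty+TILE, tx:tx+TILE]

    def virtual_sequence(frames: list):
        """Emit tiles as frames of a virtual video sequence, skipping tiles
        that barely changed since the previous source frame."""
        prev, t = None, 0
        for frame in frames:
            for (ty, tx), tile in tiles(frame):
                if prev is None or np.abs(
                        tile.astype(float)
                        - prev[ty:ty+TILE, tx:tx+TILE]).mean() > SKIP_THRESH:
                    yield {"virtual_time": t, "origin": (ty, tx), "tile": tile}
                    t += 1   # each coded tile occupies its own temporal slot
            prev = frame

    # Example: two 128x128 frames where only the top-left tile changes.
    f0 = np.zeros((128, 128), dtype=np.uint8)
    f1 = f0.copy()
    f1[0:64, 0:64] = 50
    print(len(list(virtual_sequence([f0, f1]))))   # 5: all 4 tiles of f0, 1 of f1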