Abstract:
A video coding server may code a common video sequence into a plurality of coded data streams, each coded data stream representing the video sequence coded using coding parameters tailored for a respective transmission bit rate. The coding may cause a set of transmission units from among the coded data streams to include coded video data from a common point of the video sequence, and a first coded frame of each transmission unit of the set to be a synchronization frame. A manifest file may be built representing an index of transmission units of the respective coded data streams. The coded data streams and manifest file may be stored by the server for delivery to a client device. During download and decoding, the transmission units (chunks) may be decoded efficiently even when switching among streams because the first frame in each chunk is a synchronization frame.
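For illustration only, a minimal Python sketch of the manifest-building step described above, assuming hypothetical Chunk and Stream records with per-chunk byte offsets; it shows only the alignment and synchronization-frame constraints, not any actual server indexing format.

    # Sketch: index aligned transmission units ("chunks") across several coded
    # streams, where each chunk begins with a synchronization (key) frame.
    from dataclasses import dataclass, field

    @dataclass
    class Chunk:
        start_frame: int          # first source-frame index covered by the chunk
        byte_offset: int          # offset of the chunk within its stream file
        byte_length: int
        starts_with_sync: bool    # True if the first coded frame is a key frame

    @dataclass
    class Stream:
        bitrate_kbps: int
        uri: str
        chunks: list = field(default_factory=list)

    def build_manifest(streams):
        """Index chunks of every stream; verify chunk boundaries are aligned
        across streams and that each chunk begins with a synchronization frame,
        so a client can switch streams at any chunk boundary."""
        boundaries = {tuple(c.start_frame for c in s.chunks) for s in streams}
        assert len(boundaries) == 1, "chunk boundaries must be aligned"
        assert all(c.starts_with_sync for s in streams for c in s.chunks)
        return {
            "streams": [
                {"bitrate_kbps": s.bitrate_kbps, "uri": s.uri,
                 "chunks": [vars(c) for c in s.chunks]}
                for s in streams
            ]
        }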
Abstract:
Video coding systems and methods protect against banding artifacts in decoded image content. According to the method, a video coder may identify, from content of pixel blocks of a frame of video data, which pixel blocks are likely to exhibit banding artifacts from the video coding/decoding processes. The video coder may assemble regions of the frame that are likely to exhibit banding artifacts based on the identified pixel blocks' locations with respect to each other. The video coder may apply anti-banding processing to pixel blocks within one or more of the identified regions and, thereafter, may code the processed frame by a compression operation.
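For illustration only, a minimal Python sketch of the detection and anti-banding steps, assuming 16x16 luma blocks, a simple variance test as the banding-risk indicator, and low-amplitude dither noise as the anti-banding processing; grouping flagged blocks into contiguous regions is omitted for brevity.

    import numpy as np

    BLOCK = 16
    VAR_THRESH = 4.0   # assumed threshold: very low variance suggests banding risk

    def banding_risk_map(frame):
        """Flag blocks that are smooth gradients (low variance, nonzero range),
        i.e. content likely to band after quantization."""
        h, w = frame.shape
        risk = np.zeros((h // BLOCK, w // BLOCK), dtype=bool)
        for by in range(h // BLOCK):
            for bx in range(w // BLOCK):
                blk = frame[by*BLOCK:(by+1)*BLOCK, bx*BLOCK:(bx+1)*BLOCK]
                risk[by, bx] = blk.var() < VAR_THRESH and np.ptp(blk) > 0
        return risk

    def apply_anti_banding(frame, risk, amplitude=0.5):
        """Add low-amplitude noise (simple dithering) only inside risky blocks,
        prior to handing the frame to the compression stage."""
        out = frame.astype(np.float32).copy()
        noise = (np.random.rand(*frame.shape) - 0.5) * 2 * amplitude
        for by, bx in zip(*np.nonzero(risk)):
            sl = np.s_[by*BLOCK:(by+1)*BLOCK, bx*BLOCK:(bx+1)*BLOCK]
            out[sl] += noise[sl]
        return np.clip(out, 0, 255).astype(frame.dtype)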
Abstract:
A multi-layer low-pass filter is used to filter a first frame of video data representing at least a portion of an environment of an individual. A first layer of the filter has a first filtering resolution setting for a first subset of the first frame, while a second layer of the filter has a second filtering resolution setting for a second subset. The first subset includes a data element positioned along a direction of a gaze of the individual, and the second subset of the frame surrounds the first subset. A result of the filtering is compressed and transmitted via a network to a video processing engine configured to generate a modified visual representation of the environment.
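For illustration only, a minimal Python sketch of the two-layer arrangement, assuming a single-channel (luma) frame, a circular gaze-centred subset, and Gaussian low-pass filters standing in for the two filter layers.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def foveated_lowpass(frame, gaze_xy, radius=128,
                         sigma_fovea=0.5, sigma_periphery=3.0):
        """Filter the gaze-centred subset lightly and the surrounding subset
        heavily, so the periphery compresses well with little perceived loss."""
        h, w = frame.shape
        yy, xx = np.mgrid[0:h, 0:w]
        fovea = (xx - gaze_xy[0])**2 + (yy - gaze_xy[1])**2 <= radius**2
        fine = gaussian_filter(frame.astype(np.float32), sigma_fovea)
        coarse = gaussian_filter(frame.astype(np.float32), sigma_periphery)
        return np.where(fovea, fine, coarse).astype(frame.dtype)

The filtered result would then be compressed and sent over the network as described above.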
Abstract:
In one implementation, a method includes receiving a warped image representing simulated reality (SR) content (e.g., to be displayed in a display space), the warped image having a plurality of pixels at respective locations uniformly spaced in a grid pattern in a warped space, wherein the plurality of pixels are respectively associated with a plurality of respective pixel values and a plurality of respective scaling factors indicating a plurality of respective resolutions at a plurality of respective locations of the SR content (e.g., in the display space). The method includes processing the warped image in the warped space based on the plurality of respective scaling factors to generate a processed warped image and transmitting the processed warped image.
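For illustration only, a minimal Python sketch of processing in the warped space, under the assumptions (not stated in the abstract) that larger scaling factors mark lower display-space resolution and that the processing is a per-pixel quantization whose step follows the scaling factor.

    import numpy as np

    def process_warped(warped, scale_factors, base_step=1.0):
        """Quantize warped-space pixel values with a step proportional to each
        pixel's scaling factor; regions that map to lower display-space
        resolution tolerate a coarser step before transmission."""
        assert warped.shape == scale_factors.shape
        step = base_step * np.maximum(scale_factors, 1.0)
        return np.round(warped / step) * step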
Abstract:
A video encoding system in which pixel data is decomposed into frequency bands prior to encoding. The frequency bands for a slice of a frame may be buffered so that complexity statistics may be calculated across the frequency bands prior to encoding. The statistics may then be used by a rate control component in determining quantization parameters for the frequency bands for modulating the rate in the encoder for the current slice. The quantization parameters for the frequency bands may be calculated jointly to optimize the quality of the displayed frames after decoder reconstruction and wavelet synthesis on a receiving device. Information about one or more previously processed frames may be used in combination with the statistics for a current slice in determining the quantization parameters for the current slice.
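For illustration only, a minimal Python sketch of the per-slice statistics and rate-control steps, assuming per-band variance as the complexity statistic, a crude proportional rate model, and a single scalar carried over from previously processed frames; the joint optimization in the described encoder would be considerably more involved.

    import numpy as np

    def band_statistics(bands):
        """bands: dict of band name -> 2D array of coefficients for one slice."""
        return {name: float(np.var(b)) for name, b in bands.items()}

    def choose_band_qps(stats, slice_bit_budget, prev_frame_scale=1.0,
                        qp_min=1, qp_max=51):
        """Pick a quantization parameter per frequency band so that busier
        bands (higher variance) receive relatively finer quantization."""
        total = sum(stats.values()) or 1.0
        qps = {}
        for name, var in stats.items():
            share = var / total                     # complexity share of this band
            bits = slice_bit_budget * share * prev_frame_scale
            # crude rate model: more bits -> lower QP (finer quantization)
            qp = qp_max - (qp_max - qp_min) * min(bits / slice_bit_budget, 1.0)
            qps[name] = int(round(np.clip(qp, qp_min, qp_max)))
        return qps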
Abstract:
Video processing techniques and pipelines that support capture, distribution, and display of high dynamic range (HDR) image data to both HDR-enabled display devices and display devices that do not support HDR imaging. A sensor pipeline may generate standard dynamic range (SDR) data from HDR data captured by a sensor using tone mapping, for example local tone mapping. Information used to generate the SDR data may be provided to a display pipeline as metadata with the generated SDR data. If a target display does not support HDR imaging, the SDR data may be directly rendered by the display pipeline. If the target display does support HDR imaging, then an inverse mapping technique may be applied to the SDR data according to the metadata to render HDR data for display. Information used in performing color gamut mapping may also be provided in the metadata and used to recover clipped colors for display.
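For illustration only, a minimal Python sketch of the mapping and inverse-mapping idea, assuming a single global gamma-style tone map stands in for the local tone mapping and that the recorded exponent and peak level are the metadata carried alongside the SDR data.

    import numpy as np

    def hdr_to_sdr(hdr, exponent=0.5, hdr_peak=4.0):
        """Map linear HDR values (0..hdr_peak) into the SDR range 0..1 and
        return the SDR frame plus the metadata needed to invert the mapping."""
        sdr = np.clip(hdr / hdr_peak, 0.0, 1.0) ** exponent
        metadata = {"exponent": exponent, "hdr_peak": hdr_peak}
        return sdr, metadata

    def render(sdr, metadata, display_supports_hdr):
        """SDR displays use the SDR frame directly; HDR displays apply the
        inverse mapping described by the metadata."""
        if not display_supports_hdr:
            return sdr
        return (sdr ** (1.0 / metadata["exponent"])) * metadata["hdr_peak"]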
Abstract:
A device comprises memory, a display characterized by a display characteristic, and processors coupled to the memory. The processors execute instructions causing the processors to receive data indicative of the display characteristic, data indicative of ambient lighting, and data indicative of content characteristics for a content item; determine a tone mapping curve for the content item based on the data indicative of content characteristics; determine a first, so-called “anchor” point along the tone mapping curve; modify a first portion of the tone mapping curve below the anchor point based on the data indicative of ambient lighting; modify a second portion of the tone mapping curve above the anchor point based on the data indicative of the display characteristic; perform tone mapping for the content item based on the modified tone mapping curve to obtain a tone mapped content item; and cause the display to display the tone mapped content item.
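For illustration only, a minimal Python sketch of the curve adaptation, assuming the tone curve is sampled as a lookup table over normalized luminance, that ambient lighting lifts the portion below the anchor, and that the display's peak luminance compresses the portion above it; reference values are assumptions.

    import numpy as np

    def adapted_tone_curve(content_curve, anchor, ambient_lux, display_peak_nits,
                           ref_lux=100.0, ref_peak_nits=1000.0):
        """content_curve: 1D array of tone-mapped output per normalized input.
        anchor: index of the anchor point along the curve."""
        curve = content_curve.astype(np.float32).copy()
        # Below the anchor: brighten shadows as ambient light increases.
        lift = np.clip(ambient_lux / ref_lux, 0.0, 1.0) * 0.1
        curve[:anchor] = curve[:anchor] + lift * (1.0 - curve[:anchor])
        # Above the anchor: compress highlights toward the display's capability.
        headroom = min(display_peak_nits / ref_peak_nits, 1.0)
        curve[anchor:] = curve[anchor] + (curve[anchor:] - curve[anchor]) * headroom
        return np.clip(curve, 0.0, 1.0)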
Abstract:
Methods and systems provide an adaptive quantization parameter (QP) modulation scheme for video coding and compression that is sensitive to user visual perception. In an embodiment, the method includes detecting an eye sensitive region, where a region is considered sensitive based on the noticeability of a visual effect. The method includes estimating encoding parameters for image content in the detected eye sensitive region. The method further includes encoding the detected eye sensitive region with the estimated encoding parameters. The estimating of the encoding parameters may be based on, among other things, a variance, a motion vector, a DC value, an edge value, and external information such as a user command or screen content. The encoding may include storing an average or maximum sum of square differences (SSD) value for a detected eye sensitive area and adjusting a QP value based on a comparison of the SSD value to generated threshold values.
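For illustration only, a minimal Python sketch of the detection and QP-adjustment heuristics, assuming low-variance, low-motion blocks are treated as eye sensitive and that the block's SSD against the previous frame is compared with two assumed thresholds; the threshold values and step sizes are placeholders.

    import numpy as np

    def is_eye_sensitive(block, motion_mag, var_thresh=25.0, motion_thresh=1.0):
        """Smooth, nearly static content is where quantization error is most
        noticeable to the eye."""
        return block.var() < var_thresh and motion_mag < motion_thresh

    def adjust_qp(base_qp, block, prev_block, low_thresh=50.0, high_thresh=500.0):
        """Lower the QP (finer coding) when the block barely changes between
        frames; relax it when the change is large enough to mask error."""
        diff = block.astype(np.float32) - prev_block.astype(np.float32)
        ssd = float(np.sum(diff ** 2))
        if ssd < low_thresh:
            return max(base_qp - 2, 0)
        if ssd > high_thresh:
            return base_qp + 2
        return base_qp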
Abstract:
A mixed reality system that includes a device and a base station that communicate via a wireless connection. The device may include sensors that collect information about the user's environment and about the user. The information collected by the sensors may be transmitted to the base station via the wireless connection. The base station renders frames or slices based at least in part on the sensor information received from the device, encodes the frames or slices, and transmits the compressed frames or slices to the device for decoding and display. The base station may provide more computing power than conventional stand-alone systems, and the wireless connection does not tether the device to the base station as in conventional tethered systems. The system may implement methods and apparatus to maintain a target frame rate through the wireless link and to minimize latency in frame rendering, transmittal, and display.
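For illustration only, a minimal Python sketch of a frame-pacing loop on the base-station side, assuming hypothetical render, encode, send, and link-estimation callables and a simple rule that adapts the encode bitrate to what the wireless link currently sustains; it is a sketch of the pacing idea, not the described apparatus.

    import time

    TARGET_FPS = 90
    FRAME_BUDGET_S = 1.0 / TARGET_FPS

    def pace_frames(render, encode, send, estimate_link_mbps, num_frames=1000):
        bitrate_mbps = 50.0
        for _ in range(num_frames):
            start = time.monotonic()
            frame = render()                       # uses latest received sensor data
            packet = encode(frame, bitrate_mbps)
            send(packet)
            # Adapt: keep the encoded bitrate below what the link sustains.
            bitrate_mbps = min(bitrate_mbps * 1.05, 0.8 * estimate_link_mbps())
            # Sleep only if ahead of the frame budget; never fall behind it.
            elapsed = time.monotonic() - start
            if elapsed < FRAME_BUDGET_S:
                time.sleep(FRAME_BUDGET_S - elapsed)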