Abstract:
Embodiments of the present invention generate estimates of device motion from two data sources on a computing device—a motion sensor and a camera. The device may compare the estimates to each other to determine if they agree. If they agree, the device may confirm that device motion estimates based on the motion sensor are accurate and may output those estimates to an application within the device. If the device motion estimates disagree, the device may alter the motion estimates obtained from the motion sensor before outputting them to the application.
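The agree/disagree check described above can be sketched as follows. This is a minimal illustration, not the patented method: the `MotionEstimate` type, the Euclidean `tolerance` test, and the linear `blend` used to alter the sensor estimate are all assumptions, since the abstract does not say how agreement is measured or how the estimate is altered.

```python
from dataclasses import dataclass


@dataclass
class MotionEstimate:
    """Hypothetical 2-D device motion estimate (units are arbitrary)."""
    dx: float
    dy: float


def reconcile(sensor: MotionEstimate, camera: MotionEstimate,
              tolerance: float = 0.5, blend: float = 0.5) -> MotionEstimate:
    """Return the sensor estimate if it agrees with the camera estimate.

    Agreement is assumed to mean a Euclidean difference within `tolerance`.
    On disagreement the sensor estimate is altered by blending it toward
    the camera estimate (an assumed correction strategy).
    """
    diff = ((sensor.dx - camera.dx) ** 2 + (sensor.dy - camera.dy) ** 2) ** 0.5
    if diff <= tolerance:
        return sensor  # estimates agree: sensor data is confirmed
    return MotionEstimate(
        dx=(1 - blend) * sensor.dx + blend * camera.dx,
        dy=(1 - blend) * sensor.dy + blend * camera.dy,
    )
```

Either way, the application receives a single estimate and does not need to know which source dominated.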
Abstract:
The invention is directed to an efficient way of encoding and decoding video. Embodiments include identifying different coding units that share a similar characteristic. The characteristic can be, for example, quantization values, modes, block sizes, color space, motion vectors, depth, facial and non-facial regions, or filter values. An encoder may then group the units together as a coherence group. An encoder may similarly create a table or other data structure of the coding units. An encoder may then extract the commonly repeating characteristic or attribute from the coding units. The encoder may transmit the coherence groups along with the data structure, and other coding units which were not part of a coherence group. The decoder may receive the data, store the shared characteristic locally in cache for faster repeated decoding, and decode the coherence group together.
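One way to picture the grouping step is the sketch below. It assumes coding units are plain dictionaries and that a group needs at least `min_size` members; the function name, the dictionary layout, and the threshold are hypothetical and are not taken from the abstract.

```python
from collections import defaultdict


def build_coherence_groups(units, key, min_size=2):
    """Group coding units that share the same value for `key`.

    Returns (groups, ungrouped). Each group stores the shared value once
    and lists its member units with that attribute removed, so the common
    characteristic is not repeated per unit.
    """
    buckets = defaultdict(list)
    for unit in units:
        buckets[unit[key]].append(unit)

    groups, ungrouped = [], []
    for value, members in buckets.items():
        if len(members) >= min_size:
            stripped = [{k: v for k, v in u.items() if k != key} for u in members]
            groups.append({"shared": {key: value}, "units": stripped})
        else:
            ungrouped.extend(members)  # sent as ordinary coding units
    return groups, ungrouped
```

A decoder following this layout could read `shared` once, keep it in a local cache, and apply it to every unit in the group.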
Abstract:
Coding techniques for input video may include assigning picture identifiers to input frames in either long-form or short-form formats. If a network error has occurred that results in loss of previously-coded video data, a new input frame may be assigned a picture identifier that is coded in a long-form coding format. If no network error has occurred, the input frame may be assigned a picture identifier that is coded in a short-form coding format. Long-form coding may mitigate loss of synchronization between an encoder and a decoder with respect to picture identifiers.
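The sketch below illustrates why a long-form identifier helps after data loss. The bit widths, the tuple representation, and the decoder's reconstruction rule are assumptions chosen for illustration; the abstract does not define the actual formats.

```python
SHORT_BITS = 4    # hypothetical width of a short-form identifier
LONG_BITS = 16    # hypothetical width of a long-form identifier


def code_picture_id(frame_number, network_error):
    """Encoder side: pick long form after a network error, else short form."""
    if network_error:
        return ("long", frame_number % (1 << LONG_BITS))
    return ("short", frame_number % (1 << SHORT_BITS))


def decode_picture_id(form, value, last_full_id):
    """Decoder side: recover the full identifier.

    A short-form value carries only the low bits, so the decoder infers the
    high bits from the last identifier it knows. That inference fails if
    too many frames were lost in between.
    """
    if form == "long":
        return value
    modulus = 1 << SHORT_BITS
    candidate = (last_full_id & ~(modulus - 1)) | value
    if candidate <= last_full_id:
        candidate += modulus  # low bits wrapped around
    return candidate
```

With these assumed widths, a decoder whose last known identifier is 36 reconstructs frame 37 correctly from the short form, but misreads frame 60 as 44; the long form carries 60 directly.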
Abstract:
Disclosed are a system and method of controlling a video decoder, including reviewing channel data representing coded video data generated by an encoder to identify parameters of a hypothetical reference decoder (HRD) used by the encoder during coding operations. A parameter representing an exit data rate requirement of a coded picture buffer (CPB) of the HRD is compared against exit rate performance of the video decoder. If the exit rate performance of the video decoder matches the exit rate requirement of the HRD, the coded video data is decoded; otherwise, a decoding degradation scheme can be applied, including disabling the decoder from decoding the coded video data.
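The comparison at the heart of this abstract can be sketched in a few lines. The names below are hypothetical, and "matches" is interpreted here as "meets or exceeds", which is an assumption rather than something the abstract states.

```python
from enum import Enum


class Action(Enum):
    DECODE = "decode"
    DEGRADE = "degrade"  # e.g. a fallback scheme, up to disabling decoding


def choose_action(hrd_cpb_exit_rate_bps: int, decoder_exit_rate_bps: int) -> Action:
    """Compare the HRD's CPB exit rate requirement with the decoder's capability."""
    if decoder_exit_rate_bps >= hrd_cpb_exit_rate_bps:
        return Action.DECODE
    return Action.DEGRADE
```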
Abstract:
In video conferencing over a radio network, the radio equipment is a major power consumer, especially in cellular networks such as LTE. To reduce radio power consumption in video conferencing, it is important to introduce sufficient radio inactive time. Several types of data buffering and bundling can be employed within a reasonable range of latency that doesn't significantly disrupt the real-time nature of video conferencing. In addition, the data transmission can be synchronized to the data reception in a controlled manner, which can result in an even longer radio inactive time and thus take advantage of radio power saving modes such as LTE C-DRX.
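The effect of bundling on radio activity can be illustrated with a small scheduling sketch. The policy shown, sending a bundle once its oldest packet has waited `max_delay_ms`, is an assumed example; the abstract does not specify a particular bundling rule.

```python
def bundle_send_times(packet_times_ms, max_delay_ms):
    """Group packets into bundles bounded by a latency budget.

    Each bundle is transmitted `max_delay_ms` after its oldest packet was
    produced, so no packet waits longer than the budget. Returns a list of
    (send_time_ms, [packet_times]) tuples.
    """
    bundles = []
    current = []
    for t in sorted(packet_times_ms):
        if current and t - current[0] > max_delay_ms:
            bundles.append((current[0] + max_delay_ms, current))
            current = []
        current.append(t)
    if current:
        bundles.append((current[0] + max_delay_ms, current))
    return bundles
```

For packets produced at 0, 33, 66, 100, and 133 ms with a 70 ms budget, this yields two transmissions (at 70 ms and 170 ms) instead of five, leaving the radio idle for longer stretches in between.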
Abstract:
Embodiments of the invention provide techniques for upsampling a video sequence for coding. According to the method, an estimate of camera motion may be obtained from motion sensor data. Video data may be analyzed to detect motion within frames output from a camera that is not induced by the camera motion. When non-camera motion falls within a predetermined operational limit, video upsampling processes may be engaged. In another embodiment, video upsampling may be performed by twice estimating image content for a hypothetical new frame using two different sources as inputs. A determination may be made whether the two estimates of the frame match each other sufficiently well. If so, the two estimates may be merged to yield a final estimated frame and the new frame may be integrated into a stream of video data.
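The match-then-merge step in the second embodiment might look like the sketch below. Frames are modeled as flat lists of pixel values, the match test is a mean absolute difference against an assumed threshold, and the merge is a simple average; all three choices are illustrative assumptions.

```python
def merge_if_consistent(est_a, est_b, max_mean_abs_diff=4.0):
    """Merge two estimates of the same frame if they match well enough.

    Returns the averaged frame, or None when the estimates disagree and
    the upsampled frame should not be inserted into the stream.
    """
    if not est_a or len(est_a) != len(est_b):
        raise ValueError("estimates must be non-empty and the same size")
    mean_abs_diff = sum(abs(a - b) for a, b in zip(est_a, est_b)) / len(est_a)
    if mean_abs_diff > max_mean_abs_diff:
        return None
    return [(a + b) / 2 for a, b in zip(est_a, est_b)]
```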
Abstract:
A system and method are presented to mask artifacts with content-adaptive comfort noise. Encoder-side analysis may determine initial comfort noise characteristics. Noise parameters may then be developed for each frame or sequence of frames that define comfort noise patches that mask the artifacts. At the decoder, a comfort noise patch can be fetched from memory or created based on the amplitude and spatial characteristics of the comfort noise specified in the noise parameters. The noise patch may additionally be scaled or otherwise adjusted to accommodate the capabilities and/or limitations of the specific decoder.
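A decoder-side patch generator could be sketched as follows. It covers only the amplitude parameter and a decoder-specific gain; spatial characteristics are left out, and the uniform noise distribution, the seed parameter, and the function names are assumptions made for illustration.

```python
import random


def make_noise_patch(width, height, amplitude, seed=0):
    """Create a comfort noise patch with values in [-amplitude, amplitude].

    A fixed seed makes the patch reproducible, so it could equally be
    generated once and fetched from memory afterwards.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-amplitude, amplitude) for _ in range(width)]
            for _ in range(height)]


def scale_patch(patch, gain):
    """Scale a patch to suit a particular decoder's capabilities."""
    return [[value * gain for value in row] for row in patch]
```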
Abstract:
Systems, methods, and a computer readable medium for performing auto exposure (AE) techniques that are beneficial in variable lighting conditions—and particularly applicable to handheld and/or mobile videoconferencing applications—are disclosed herein. Handheld and/or mobile videoconferencing applications—unlike their fixed camera counterparts—are often exposed to a wide variety of rapidly changing lighting and scene conditions, and thus face a difficult trade-off between adjusting exposure parameter values too frequently or not frequently enough. In personal electronic devices executing such handheld and/or mobile videoconferencing applications, it may be desirable to: use a small, centered, and center-weighted exposure metering region; set a relatively low brightness target value; and adjust the camera's exposure parameter values according to a distance-dependent convergence speed function. The use of such techniques, in conjunction with a relatively large stability region, may also improve the quality of a video encoder's temporal predictions—and thus video quality—in videoconferencing applications.
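A distance-dependent convergence speed combined with a stability region can be sketched as below. Here "distance" is taken to mean the gap between the current and target brightness, and the linear speed ramp and all default values are assumptions; the abstract does not give the actual function.

```python
def next_brightness(current, target, stability_halfwidth=0.1,
                    min_speed=0.05, max_speed=0.5, full_speed_distance=0.5):
    """Move measured brightness toward the target for one frame.

    Inside the stability region no adjustment is made, which keeps
    consecutive frames similar. Outside it, the step size grows with the
    distance from the target, up to `max_speed`.
    """
    distance = abs(target - current)
    if distance <= stability_halfwidth:
        return current
    fraction = min(distance / full_speed_distance, 1.0)
    speed = min_speed + (max_speed - min_speed) * fraction
    return current + speed * (target - current)
```

Large lighting changes are corrected quickly, while small fluctuations are ignored, which is the trade-off the abstract describes.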
Abstract:
Techniques are disclosed for deriving prediction pixel blocks for use in intra-coding video and combined inter- and intra-coding video. In a first aspect, the techniques may include deriving value(s) for pixel location(s) of the prediction pixel block as follows. When a prediction direction vector assigned to the prediction pixel block points to quadrant I or III of a Cartesian plane, the pixel location's value is derived from pixel values in two regions of previously-decoded pixel data intercepted by extending the prediction direction vector in two opposite directions through the pixel location. When the prediction direction vector points toward quadrant II of the Cartesian plane, the pixel location's value is derived from pixel values in one region intercepted by the prediction direction vector through the pixel location, and from a second region intercepted by a vector that is orthogonal to the prediction direction vector.
Abstract:
A system obtains a data set representing immersive video content for display at a display time, including first data representing the content according to a first level of detail, and second data representing the content according to a second, higher level of detail. During one or more first times prior to the display time, the system causes at least a portion of the first data to be stored in a video buffer. During one or more second times prior to the display time, the system generates a prediction of a viewport for displaying the content to a user at the display time, identifies a portion of the second data corresponding to the prediction of the viewport, and causes the identified portion of the second data to be stored in the video buffer. At the display time, the system causes the content to be displayed to the user using the video buffer.
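The two-stage buffering can be sketched with the content split into tiles, which is an assumption; the abstract does not say how portions of the data are identified. The function name and the dictionary-based buffer are likewise hypothetical.

```python
def fill_video_buffer(low_detail, high_detail, predicted_viewport):
    """Build the buffer used at display time.

    low_detail / high_detail map tile ids to data at each level of detail.
    First, the low-detail data is buffered for every tile. Then, for tiles
    inside the predicted viewport, the high-detail data replaces it.
    """
    buffer = dict(low_detail)            # first times: baseline coverage
    for tile in predicted_viewport:      # second times: refine the viewport
        if tile in high_detail:
            buffer[tile] = high_detail[tile]
    return buffer
```

If the prediction turns out wrong, the user still sees the low-detail data for the tiles actually in view, rather than nothing.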