Abstract:
Block processing pipeline methods and apparatus in which reference data are stored to a memory according to tile formats to reduce memory accesses when fetching the data from the memory. When the pipeline stores reference data from a current frame being processed to memory as a reference frame, the reference samples are stored in macroblock sequential order. Each macroblock sample set is stored as a tile. Reference data may be stored in tile formats for luma and chroma. Chroma reference data may be stored in tile formats for chroma 4:2:0, 4:2:2, and/or 4:4:4 formats. A stage of the pipeline may write luma and chroma reference data for macroblocks to memory according to one or more of the macroblock tile formats in a modified knight's order. The stage may delay writing the reference data from the macroblocks until the macroblocks have been fully processed by the pipeline.
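As a concrete illustration of the addressing math (a minimal sketch, not the claimed tile formats; the 16x16 tile size and single-byte luma samples are assumptions), the following shows how storing each macroblock's samples contiguously, with tiles in macroblock sequential order, maps a sample coordinate to a linear offset:

    MB = 16  # assumed macroblock (tile) edge, in samples

    def tiled_offset(x, y, frame_width_mbs):
        """Linear offset of luma sample (x, y) when each macroblock's
        256 samples are stored contiguously as one tile, with tiles
        laid out in macroblock sequential order."""
        mb_x, mb_y = x // MB, y // MB          # which tile
        in_x, in_y = x % MB, y % MB            # position inside the tile
        tile = mb_y * frame_width_mbs + mb_x   # macroblock sequential index
        return tile * MB * MB + in_y * MB + in_x

Under this layout, a 16x16 reference fetch that falls inside one macroblock touches a single contiguous 256-sample region, where a raster-order layout would require 16 separate row-strided accesses.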
Abstract:
A knight's order processing method for block processing pipelines in which the next block input to the pipeline is taken from the row below and one or more columns to the left in the frame. The knight's order method may provide spacing between adjacent blocks in the pipeline to facilitate feedback of data from a downstream stage to an upstream stage. The rows of blocks in the input frame may be divided into sets of rows that constrain the knight's order method to maintain locality of neighbor block data. Invalid blocks may be input to the pipeline at the left of the first set of rows and at the right of the last set of rows, and the sets of rows may be treated as if they are horizontally arranged rather than vertically arranged, to maintain continuity of the knight's order algorithm.
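One traversal consistent with this description (a sketch only: the down-one/left-two step, the four-row sets, and the jump back to the top of a set are illustrative choices, not necessarily those of the claims) can be generated as follows:

    def knights_order_quadrow(width_mbs, q=4):
        """Yield (col, row) block coordinates for one set of q rows.
        A col outside [0, width_mbs) marks an invalid (padding) block
        that the pipeline passes through without processing."""
        col, row = 0, 0
        for _ in range((width_mbs + 2 * (q - 1)) * q):
            yield col, row
            if row == q - 1:
                col, row = col + 2 * q - 1, 0   # bottom of set: jump back to top row
            else:
                col, row = col - 2, row + 1     # knight move: one row down, two columns left

With q = 4, horizontally adjacent blocks of a row enter the pipeline four slots apart, which is the spacing that gives a downstream stage time to feed results back to an upstream stage before the right neighbor arrives.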
Abstract:
A block processing pipeline that includes a software pipeline and a hardware pipeline that run in parallel. The software pipeline runs at least one block ahead of the hardware pipeline. The stages of the pipeline may each include a hardware pipeline component that performs one or more operations on a current block at the stage. At least one stage of the pipeline may also include a software pipeline component that determines a configuration for the hardware component at the stage of the pipeline for processing a next block while the hardware component is processing the current block. The software pipeline component may determine the configuration according to information related to the next block obtained from an upstream stage of the pipeline. The software pipeline component may also obtain and use information related to a block that was previously processed at the stage.
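A sequential model of one such stage (the function names and configuration contents are illustrative assumptions; in hardware the two components run concurrently) might look like:

    def run_stage(blocks, block_infos, hw_op, derive_config):
        """While the hardware component works on block i, the software
        component derives the configuration for block i+1 from upstream
        info and, optionally, from a previously processed block's result.
        Modeled sequentially here; the real components overlap in time."""
        config = derive_config(block_infos[0], prev_result=None)
        results = []
        for i, block in enumerate(blocks):
            result = hw_op(block, config)       # hardware: process current block
            if i + 1 < len(blocks):             # software: prepare next block's config
                config = derive_config(block_infos[i + 1], prev_result=result)
            results.append(result)
        return results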
Abstract:
In the video encoders described herein, blocks of pixels from a video frame may be encoded (e.g., using CAVLC encoding) in a block processing pipeline using wavefront ordering (e.g., in knight's order). Each of the encoded blocks may be written to a particular one of multiple DMA buffers such that the encoded blocks written to each of the buffers represent consecutive blocks of the video frame in scan order. A transcode pipeline may operate in parallel with (or at least overlapping) the operation of the block processing pipeline. The transcode pipeline may read encoded blocks from the buffers in scan order and merge them into a single bit stream (in scan order). A transcoder core of the transcode pipeline may decode the encoded blocks and encode them using a different encoding process (e.g., CABAC). In some cases, the transcoder may be bypassed.
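One plausible buffering scheme matching this description (the four-buffer count and the rotate-per-row mapping are assumptions) routes each encoded block to the buffer for its row, so each buffer receives a row's worth of consecutive scan-order blocks even though blocks finish in wavefront order:

    NUM_BUFFERS = 4  # assumed; rotates once per block row

    def route_to_buffers(encoded_blocks):
        """encoded_blocks: iterable of (row, col, bits) in wavefront order."""
        buffers = [[] for _ in range(NUM_BUFFERS)]
        for row, col, bits in encoded_blocks:
            buffers[row % NUM_BUFFERS].append((row, col, bits))
        return buffers

    def merge_scan_order(buffers, num_rows):
        """Transcode-side merge: visit rows in order, each in its buffer,
        concatenating blocks into a single scan-order bit stream."""
        stream = []
        for row in range(num_rows):
            # Within a row the wavefront already emits blocks left-to-right;
            # the sort is a defensive no-op.
            row_blocks = sorted((b for b in buffers[row % NUM_BUFFERS] if b[0] == row),
                                key=lambda b: b[1])
            stream.extend(bits for _, _, bits in row_blocks)
        return stream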
Abstract:
Apparatuses and methods are disclosed for implementing an inter-processor communication channel that includes power-down functionality. In one embodiment, the apparatus may comprise a first integrated circuit (IC) and a second IC coupled to the first IC via a communication interface, wherein the first IC may operate in one or more low-power states in which it is unable to monitor the communication interface. The apparatus may further comprise an inter-processor communication (IPC) channel coupled between the first and second ICs, wherein the IPC channel is separate from the communication interface and wherein the second IC transmits at least one advisory signal to the first IC via the IPC channel.
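A toy model of the advisory handshake (class and signal names are hypothetical; the embodiment's actual signaling is not specified here), in which the second IC raises a sideband advisory before using the main interface:

    import enum

    class PowerState(enum.Enum):
        ACTIVE = enum.auto()
        LOW_POWER = enum.auto()    # main communication interface not monitored

    class FirstIC:
        def __init__(self):
            self.state = PowerState.LOW_POWER

        def on_advisory(self):     # delivered over the separate IPC channel
            self.state = PowerState.ACTIVE

    class SecondIC:
        def __init__(self, peer):
            self.peer = peer

        def send(self, payload):
            if self.peer.state is PowerState.LOW_POWER:
                self.peer.on_advisory()                  # advisory: wake the peer first
            return ("communication-interface", payload)  # main link now safe to use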
Abstract:
Blocks of pixels from a video frame may be encoded in a block processing pipeline using wavefront ordering, e.g., according to knight's order. Each of the encoded blocks may be written to a particular one of multiple buffers such that the blocks written to each of the buffers represent consecutive blocks of the frame in scan order. Stitching information may be written to the buffers at the end of each row. A stitcher component may read the rows from the buffers in order and generate a scan order output stream for the frame. The stitcher may read the stitching information at the end of each row and apply it to one or more blocks at the beginning of the next row to stitch that row to the previous one. Stitching may involve modifying pixel(s) of the blocks and/or modifying metadata for the blocks.
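A minimal sketch of the stitcher's merge loop (the block and metadata representation is an illustrative assumption):

    def stitch_rows(rows):
        """rows: per-row (blocks, stitch_info) pairs read from the buffers
        in frame order. Applies each row's trailing stitching info to the
        first block of the following row, yielding scan-order output."""
        output, pending = [], None
        for blocks, stitch_info in rows:
            if pending is not None:
                blocks[0] = apply_stitch(blocks[0], pending)
            output.extend(blocks)
            pending = stitch_info    # carried to the start of the next row
        return output

    def apply_stitch(block, info):
        """Placeholder fix-up: merge stitching info into block metadata
        (real stitching may also modify boundary pixels)."""
        return {**block, "meta": {**block.get("meta", {}), **info}}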
Abstract:
Memory latency tolerance methods and apparatus for maintaining an overall level of performance in block processing pipelines that prefetch reference data into a search window. In a general memory latency tolerance method, search window processing in the pipeline may be monitored. If the status of search window processing changes in a way that affects pipeline throughput, then pipeline processing may be modified. The modification may be performed according to no-stall methods, stall-recovery methods, and/or stall-prevention methods. In no-stall methods, a block may be processed using the data already present in the search window, without waiting for the missing reference data. In stall-recovery methods, the pipeline is allowed to stall, and processing is modified for subsequent blocks to speed up the pipeline and catch up in throughput. In stall-prevention methods, processing is adjusted in advance of the pipeline encountering a stall condition.
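The three method families reduce to a policy choice that could be sketched as follows (the inputs, thresholds, and actions are illustrative assumptions):

    def select_policy(window_fraction_present, stall_predicted, stalled):
        """Choose a latency-tolerance response from search-window status."""
        if stalled:
            # Stall recovery: e.g. reduce the search range or skip optional
            # work on subsequent blocks until throughput catches back up.
            return "stall_recovery"
        if stall_predicted:
            # Stall prevention: adjust processing before the stall occurs.
            return "stall_prevention"
        if window_fraction_present < 1.0:
            # No stall: search only the reference data already present,
            # without waiting for the missing data.
            return "no_stall"
        return "nominal"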