摘要:
Two pairs of deblock instructions for performing deblock filtering on a horizontal row of pixels according to the H.264 (MPEG 4 part 10) and VC1 video codec algorithms. The first instruction of each pair has three 128-bit operands comprising the 16-bit components of a horizontal line of 8 pixels crossing a vertical block edge between pixels 4 and 5 in a YUV image, a series of filter threshold parameters, and a 128-bit destination operand for storing the output of the first instruction. The second instruction of each pair accepts the same 16-bit components as its first input, the output of the first instruction as its second input and a destination operand for storing an output of the second instruction as its third input. The instruction pairs are intended for use with the H.264 or VC1 video codecs respectively.
摘要:
Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline. A record instruction including a record start address is sent to the extended pipeline. The extended pipeline thus begins recording the subsequent instruction sequence at the specified address until an end record instruction is encountered. The end record instruction is recorded as the last instruction in the sequence. The main pipeline may then call the instruction sequence by sending a run instruction including the start address for the desired sequence to the extended pipeline. This run instruction causes the extended pipeline to begin autonomously executing the recorded sequence until the end record instruction is encountered. This instruction causes the extended pipeline to cease autonomous execution and to return to executing instructions supplied by the main pipeline.
摘要:
Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline. A record instruction including a record start address is sent to the extended pipeline. The extended pipeline thus begins recording the subsequent instruction sequence at the specified address until an end record instruction is encountered. The end record instruction is recorded as the last instruction in the sequence. The main pipeline may then call the instruction sequence by sending a run instruction including the start address for the desired sequence to the extended pipeline. This run instruction causes the extended pipeline to begin autonomously executing the recorded sequence until the end record instruction is encountered. This instruction causes the extended pipeline to cease autonomous execution and to return to executing instructions supplied by the main pipeline.
摘要:
A parameterizable clip instruction for SIMD microprocessor architecture and method of performing a clip operating the same. A single instruction is provided with three input operands: a destination address, a source address and a controlling parameter. The controlling parameter includes a range type and a range specifier. The range type is a multi-bit integer in the operand that is used to index a table of range types. The range specifier plugs into the range type to define a range. The data input at the source address is clipped according to the controlling parameters. The instruction is particularly suited to video encoding/decoding applications where interpolations or other calculations, lies outside the maximum value and that final result will have to be clipped to saturation value, for example, the maximum pixel value. Signed and unsigned clipping ranges may be used that are not only powers of two.
摘要:
Systems and methods for selectively decoupling a parallel extended processor pipeline. A main processor pipeline and parallel extended pipeline are coupled via an instruction queue. The main pipeline can instruct the parallel pipeline to execute instructions directly or to begin fetching and executing its own instructions autonomously. During autonomous operation of the parallel pipeline, instructions from the main pipeline accumulate in the instruction queue. The parallel pipeline can return to main pipeline controlled execution through a single instruction. A light weight mechanism in the form of a condition code as seen by the main processor is designed to allow intelligent decision maximizing overall performance to be made in run-time if further instructions should be issued to the parallel extended pipeline based on the queue status.
摘要:
Two pairs of deblock instructions for performing deblock filtering on a horizontal row of pixels according to the H.264 (MPEG 4 part 10) and VC1 video codec algorithms. The first instruction of each pair has three 128-bit operands comprising the 16-bit components of a horizontal line of 8 pixels crossing a vertical block edge between pixels 4 and 5 in a YUV image, a series of filter threshold parameters, and a 128-bit destination operand for storing the output of the first instruction. The second instruction of each pair accepts the same 16-bit components as its first input, the output of the first instruction as its second input and a destination operand for storing an output of the second instruction as its third input. The instruction pairs are intended for use with the H.264 or VC1 video codecs respectively.
摘要:
A data path for a SIMD-based microprocessor is used to perform different simultaneous filter sub-operations in parallel data lanes of the SIMD-based microprocessor. Filter operations for sub-pixel interpolation are performed simultaneously on separate lanes of the SIMD processor's data path. Using a dedicated internal data path, precision higher than the native precision of the SIMD unit may be achieved. Through the data path according to this invention, a single instruction may be used to generate the value of two adjacent sub-pixels located diagonally with respect to integer pixel positions.
摘要:
A data path for a SIMD-based microprocessor is used to perform different simultaneous filter sub-operations in parallel data lanes of the SIMD-based microprocessor. Filter operations for sub-pixel interpolation are performed simultaneously on separate lanes of the SIMD processor's data path. Using a dedicated internal data path, precision higher than the native precision of the SIMD unit may be achieved. Through the data path according to this invention, a single instruction may be used to generate the value of two adjacent sub-pixels located diagonally with respect to integer pixel positions.
摘要:
Systems and methods for synchronizing multiple processing engines of a microprocessor. In a microprocessor engine employing processor extension logic, DMA engines are used to permit the processor extension logic to move data into and out of local memory independent of the main instruction pipeline. Synchronization between the extended instruction pipeline and DMA engines is performed to maximize simultaneous operation of these elements. The DMA engines includes a data-in and data-out engine each adapted to buffer at least one instruction in a queue. If, for each DMA engine, the queue is full and a new instruction is trying to enter the buffer, the DMA engine will cause the extended pipeline to pause execution until the current DMA operation is complete. This prevents data overwrites while maximizing simultaneous operation.
摘要:
Methods and apparatus adapted for enhancing the throughput of a digital processor (e.g., microprocessor, CISC device, or RISC device) through use of a direct memory access (DMA) mechanism. In one embodiment, the processor comprises a “soft” RISC-based processor core that is both user-extensible and user-configurable. The core comprises a functional process or unit (DMA assist) that is coupled to the processor's extension logic and which facilitates throughput by, among other things, ensuring that the CPU and processor extension logic can operate on data in parallel in an efficient manner. In one variant, a parallel datapath (including a buffer) is used in conjunction with the aforementioned DMA assist so as to permit the processor extension logic to efficiently operate in parallel with the CPU.