Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
A video hardware engine which support dynamic frame padding is disclosed. The video hardware engine includes an external memory. The external memory stores a reference frame. The reference frame includes a plurality of reference pixels. A motion estimation (ME) engine receives a current LCU (largest coding unit), and defines a search area around the current LCU for motion estimation. The ME engine receives a set of reference pixels corresponding to the current LCU. The set of reference pixels of the plurality of reference pixels are received from the external memory. The ME engine pads a set of duplicate pixels along an edge of the reference frame when a part area of the search area is outside the reference frame.
Abstract:
A control processor for a video encode-decode engine is provided that includes an instruction pipeline. The instruction pipeline includes an instruction fetch stage coupled to an instruction memory to fetch instructions, an instruction decoding stage coupled to the instruction fetch stage to receive the fetched instructions, and an execution stage coupled to the instruction decoding stage to receive and execute decoded instructions. The instruction decoding stage and the instruction execution stage are configured to decode and execute a set of instructions in an instruction set of the control processor that are designed specifically for accelerating video sequence encoding and encoded video bit stream decoding.
Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
A video hardware engine which support dynamic frame padding is disclosed. The video hardware engine includes an external memory. The external memory stores a reference frame. The reference frame includes a plurality of reference pixels. A motion estimation (ME) engine receives a current LCU (largest coding unit), and defines a search area around the current LCU for motion estimation. The ME engine receives a set of reference pixels corresponding to the current LCU. The set of reference pixels of the plurality of reference pixels are received from the external memory. The ME engine pads a set of duplicate pixels along an edge of the reference frame when a part area of the search area is outside the reference frame.
Abstract:
A method is disclosed for efficiently calculating a median value of a high-order array in a Single Instruction Multiple Data (SIMD) processor. Values of the high-order array are sorted vertically in each column followed by sorts on each individual row. After the sort, selective diagonal values of the sorted high-order array are used to form a low-order array to calculate the median of the high-order array. The median calculation using selective diagonal values of the high-order array in a low-order array significantly improves SIMD processor efficiency and throughput.
Abstract:
This invention forms a block sum of picture elements employing a vector dot product instruction to sum packed picture elements and the mask producing a vector of masked horizontal picture element. The block sum is formed from plural horizontal sums via vector single instruction multiple data (SIMD) addition.
Abstract:
A video hardware engine which support dynamic frame padding is disclosed. The video hardware engine includes an external memory. The external memory stores a reference frame. The reference frame includes a plurality of reference pixels. A motion estimation (ME) engine receives a current LCU (largest coding unit), and defines a search area around the current LCU for motion estimation. The ME engine receives a set of reference pixels corresponding to the current LCU. The set of reference pixels of the plurality of reference pixels are received from the external memory. The ME engine pads a set of duplicate pixels along an edge of the reference frame when a part area of the search area is outside the reference frame.