Abstract:
Disclosed techniques relate to forming a block sum of picture elements employing a vector dot product instruction to sum packed picture elements and the mask producing a vector of masked horizontal picture element. The block sum is formed from plural horizontal sums via vector single instruction multiple data (SIMD) addition.
Abstract:
A low power video hardware engine is disclosed. The video hardware engine includes a video hardware accelerator unit. A shared memory is coupled to the video hardware accelerator unit, and a scrambler is coupled to the shared memory. A vDMA (video direct memory access) engine is coupled to the scrambler, and an external memory is coupled to the vDMA engine. The scrambler receives an LCU (largest coding unit) from the vDMA engine. The LCU comprises N×N pixels, and the scrambler scrambles N×N pixels in the LCU to generate a plurality of blocks with M×M pixels. N and M are integers and M is less than N.
Abstract:
This invention deals with the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
An electronic tracing process includes packing both stall (215) and reason (219) data into a single high priority timing information stream. An integrated circuit includes an electronic processor (110), and a tracing circuit (120) operable to pack both stall and events data into a single timing information stream. Other circuits, processes and systems are also disclosed.
Abstract:
An electronic tracing process includes packing both stall (215) and reason (219) data into a single high priority timing information stream. An integrated circuit includes an electronic processor (110), and a tracing circuit (120) operable to pack both stall and events data into a single timing information stream. Other circuits, processes and systems are also disclosed.
Abstract:
A method is disclosed for efficiently calculating a median value of a high-order array in a Single Instruction Multiple Data (SIMD) processor. Values of the high-order array are sorted vertically in each column followed by sorts on each individual row. After the sort, selective diagonal values of the sorted high-order array are used to form a low-order array to calculate the median of the high-order array. The median calculation using selective diagonal values of the high-order array in a low-order array significantly improves SIMD processor efficiency and throughput.
Abstract:
An electronic tracing process includes packing both stall (215) and reason (219) data into a single high priority timing information stream. An integrated circuit includes an electronic processor (110), and a tracing circuit (120) operable to pack both stall and events data into a single timing information stream. Other circuits, processes and systems are also disclosed.
Abstract:
This invention enables effective corner detection of pixels of an image using the FAST algorithm using a vector SIMD processor. This invention loads an 8×8 pixel block that includes four 7×7 pixel blocks including the 16 peripheral pixels to be tested for each of four center pixels. This invention rearranges the 64 pixels of the 8×8 block to form a 16 element array for each center pixel preferably using a vector permutation instruction. This invention uses vector SIMD subtraction and compare and vector SIMD addition and compare to make the FAST algorithm comparisons. The N consecutive pixels determinations of the FAST algorithm are made from the results of plural shift and AND operations. The corresponding center pixel is marked a corner or not a corner dependent upon of the results of plural shift and AND operations.
Abstract:
This invention enables effective corner detection of pixels of an image using the FAST algorithm using a vector SIMD processor. This invention loads an 8×8 pixel block that includes four 7×7 pixel blocks including the 16 peripheral pixels to be tested for each of four center pixels. This invention rearranges the 64 pixels of the 8×8 block to form a 16 element array for each center pixel preferably using a vector permutation instruction. This invention uses vector SIMD subtraction and compare and vector SIMD addition and compare to make the FAST algorithm comparisons. The N consecutive pixels determinations of the FAST algorithm are made from the results of plural shift and AND operations. The corresponding center pixel is marked a corner or not a corner dependent upon of the results of plural shift and AND operations.
Abstract:
An electronic tracing process includes packing both stall (215) and reason (219) data into a single high priority timing information stream. An integrated circuit includes an electronic processor (110), and a tracing circuit (120) operable to pack both stall and events data into a single timing information stream. Other circuits, processes and systems are also disclosed.