Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
This invention enables effective corner detection of pixels of an image using the FAST algorithm using a vector SIMD processor. This invention loads an 8×8 pixel block that includes four 7×7 pixel blocks including the 16 peripheral pixels to be tested for each of four center pixels. This invention rearranges the 64 pixels of the 8×8 block to form a 16 element array for each center pixel preferably using a vector permutation instruction. This invention uses vector SIMD subtraction and compare and vector SIMD addition and compare to make the FAST algorithm comparisons. The N consecutive pixels determinations of the FAST algorithm are made from the results of plural shift and AND operations. The corresponding center pixel is marked a corner or not a corner dependent upon of the results of plural shift and AND operations.
Abstract:
This invention enables effective corner detection of pixels of an image using the FAST algorithm using a vector SIMD processor. This invention loads an 8×8 pixel block that includes four 7×7 pixel blocks including the 16 peripheral pixels to be tested for each of four center pixels. This invention rearranges the 64 pixels of the 8×8 block to form a 16 element array for each center pixel preferably using a vector permutation instruction. This invention uses vector SIMD subtraction and compare and vector SIMD addition and compare to make the FAST algorithm comparisons. The N consecutive pixels determinations of the FAST algorithm are made from the results of plural shift and AND operations. The corresponding center pixel is marked a corner or not a corner dependent upon of the results of plural shift and AND operations.
Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
This invention enables effective corner detection of pixels of an image using the FAST algorithm using a vector SIMD processor. This invention loads an 8×8 pixel block that includes four 7×7 pixel blocks including the 16 peripheral pixels to be tested for each of four center pixels. This invention rearranges the 64 pixels of the 8×8 block to form a 16 element array for each center pixel preferably using a vector permutation instruction. This invention uses vector SIMD subtraction and compare and vector SIMD addition and compare to make the FAST algorithm comparisons. The N consecutive pixels determinations of the FAST algorithm are made from the results of plural shift and AND operations. The corresponding center pixel is marked a corner or not a corner dependent upon of the results of plural shift and AND operations.
Abstract:
This disclosure is directed to the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.
Abstract:
Disclosed techniques relate to forming a block sum of picture elements employing a vector dot product instruction to sum packed picture elements and the mask producing a vector of masked horizontal picture element. The block sum is formed from plural horizontal sums via vector single instruction multiple data (SIMD) addition.
Abstract:
Disclosed techniques relate to forming a block sum of picture elements employing a vector dot product instruction to sum packed picture elements and the mask producing a vector of masked horizontal picture element. The block sum is formed from plural horizontal sums via vector single instruction multiple data (SIMD) addition.
Abstract:
This invention deals with the problem of paralleling random read access within a reasonably sized block of data for a vector SIMD processor. The invention sets up plural parallel look up tables, moves data from main memory to each plural parallel look up table and then employs a look up table read instruction to simultaneously move data from each parallel look up table to a corresponding part a vector destination register. This enables data processing by vector single instruction multiple data (SIMD) operations. This vector destination register load can be repeated if the tables store more used data. New data can be loaded into the original tables if appropriate. A level one memory is preferably partitioned as part data cache and part directly addressable memory. The look up table memory is stored in the directly addressable memory.