Abstract:
A logic circuit in a processor including a plurality of input registers, each for storing a vector containing data elements, a coefficient register for storing a vector containing N coefficients, an output register for storing a result vector, and an arithmetic unit configured to: obtain a pattern for selecting N data elements from the plurality of input registers, select a plurality of groups of N data elements from the plurality of input registers in parallel, wherein each group is selected in accordance with the pattern, and wherein each group is shifted with respect to a previous selected group, perform an arithmetic operation between each of the selected groups and the coefficients in parallel, and store results of the arithmetic operations in the output register.
Abstract:
A system and method for preventing cache contention for a cache including a plurality of ways and a separate port for each way, the method including: obtaining, in a core of a processor, a multidimensional coefficient array of a multidimensional filter, and pointers to data elements from a plurality of rows of a multidimensional data array, and loading the plurality of rows into the cache, where each row is stored in a different way of the cache.