Abstract:
Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
Abstract:
In one disclosed embodiment, a processor includes a first execution unit and a second execution unit, a register file, and a data path including a plurality of lanes. The data path and the register file are arranged so that writing to the register file by the first execution unit and by the second execution unit is allowed over the data path, reading from the register file by the first execution unit is allowed over the data path, and reading from the register file by the second execution unit is not allowed over the data path. The processor also includes a power control circuit configured to, when a transfer of data between the register file and either of the first and second execution units uses less than all of the lanes, power down the lanes of the data path not used for the transfer of the data.
Abstract:
The number of registers required is reduced by overlapping scalar and vector registers. This allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are reduced by restricting read access. Dedicated predicate registers reduce requirements for general registers, and allows reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit.
Abstract:
One embodiment of this invention provides two conditional execution auxiliary instructions directed to disparate subsets of the plural functional units. Depending on the conditional execution desired, only one of the two conditional execution auxiliary instructions may be required for a particular execute packet. Another embodiment of this invention employs only one of two possible register files for the condition registers. In a VLIW processor it may be advantageous to split the functional units into separate sets with corresponding register files. This limits the number of functional units that may simultaneously access the register files. In the preferred embodiment of this invention the functional units are divided into a scalar set which access scalar registers and a vector set which access vector registers. The data registers storing the conditions for both scalar and vector instructions are in the scalar data register file.
Abstract:
This invention is a digital signal processor capable of performing correlation of data with pseudo noise for code division multiple access (CDMA) decoding using clusters. Each cluster includes plural multipliers. The multipliers multiply real and imaginary parts of packed data by corresponding pseudo noise data. Within a cluster the real parts and the imaginary parts of the products are summed separately. This forms plural complex number outputs equal in number to the number of clusters. The pseudo noise data is offset relative to the data input differing amounts for different clusters. The clusters are divided into first half clusters receiving data from even numbered slots and second half clusters receiving data from odd numbered slots. The correlation unit includes a mask input to selectively zero a multiplier product.
Abstract:
This invention is a digital signal processor capable of performing correlation of data with pseudo noise for code division multiple access (CDMA) decoding using clusters. Each cluster includes plural multipliers. The multipliers multiply real and imaginary parts of packed data by corresponding pseudo noise data. Within a cluster the real parts and the imaginary parts of the products are summed separately. This forms plural complex number outputs equal in number to the number of clusters. The pseudo noise data is offset relative to the data input differing amounts for different clusters. The clusters are divided into first half clusters receiving data from even numbered slots and second half clusters receiving data from odd numbered slots. The correlation unit includes a mask input to selectively zero a multiplier product.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed for multi-banked victim cache with dual datapath. An example cache system includes a storage element that includes banks operable to store data, ports operable to receive memory operations in parallel, wherein each of the memory operations has a respective address, and a plurality of comparators coupled such that each of the comparators is coupled to a respective port of the ports and a respective bank of the banks and is operable to determine whether a respective address of a respective memory operation received by the respective port corresponds to the data stored in the respective bank.
Abstract:
Techniques including receiving configuration information for a trigger control channel of the one or more trigger control channels, the configuration information defining a first one or more triggering events, receiving a first memory management command, store the first memory management command, detecting a first one or more triggering events, and triggering the stored first memory management command based on the detected first one or more triggering events.
Abstract:
A system includes a multi-core shared memory controller (MSMC). The MSMC includes a snoop filter bank, a cache tag bank, and a memory bank. The cache tag bank is connected to both the snoop filter bank and the memory bank. The MSMC further includes a first coherent slave interface connected to a data path that is connected to the snoop filter bank. The MSMC further includes a second coherent slave interface connected to the data path that is connected to the snoop filter bank. The MSMC further includes an external memory master interface connected to the cache tag bank and the memory bank. The system further includes a first processor package connected to the first coherent slave interface and a second processor package connected to the second coherent slave interface. The system further includes an external memory device connected to the external memory master interface.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed for read-modify-write support in multi-banked data RAM cache for bank arbitration. An example data cache system includes a store queue including a plurality of bank queues including a first bank queue having a write and read port configured to receive a respective write and read operation, storage coupled to the store queue including a plurality of data banks including a first data bank having a first port configured to receive the write or the read operation, first through third multiplexers, and bank arbitration logic including first arbiters including a first arbiter and second arbiters including a second arbiter, the first arbiter coupled to the second arbiter, the second and third multiplexers, the second arbiter coupled to the first multiplexer.