Abstract:
A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
Abstract:
An embedded megamodule and an embedded CPU enable power-saving through a combination of hardware and software. The CPU configures the power-down controller (PDC) logic within megamodule and can software trigger a low-power state of logic modules during processor IDLE periods. To wake from this power-down state, a system event is asserted to the CPU through the module interrupt controller. Thus the entry into a low-power state is software-driven during periods of inactivity and power restoration is on system activity that demands the attention of the CPU.
Abstract:
Methods, apparatus, systems and articles of manufacture to reduce bank pressure using aggressive write merging are disclosed. An example apparatus includes a first cache storage; a second cache storage; a store queue coupled to at least one of the first cache storage and the second cache storage and operable to: receive a first memory operation; process the first memory operation for storing the first set of data in at least one of the first cache storage and the second cache storage; receive a second memory operation; and prior to storing the first set of data in the at least one of the first cache storage and the second cache storage, merge the first memory operation and the second memory operation.
Abstract:
A method of storing register data elements to interleave with data elements of a different register, a processor thereof, and a system thereof, wherein each non-consecutive data elements of a register is retrieved to be stored to interleave with each non-consecutive data elements of a different register upon an executive of an interleaving store instruction, wherein a mask instruction directing a lane of a storage space in which the non-consecutive data elements are stored is executed in conjunction with the interleaving store instruction, and wherein a processor of a second type is configured to emulate a processor of a first type to store the non-consecutive data elements the same as non-consecutive data elements stored in the first type processor.
Abstract:
Methods, apparatus, systems and articles of manufacture to facilitate atomic compare and swap in cache for a coherent level 1 data cache system are disclosed. An example system includes a cache storage; a cache controller coupled to the cache storage wherein the cache controller is operable to: receive a memory operation that specifies a key, a memory address, and a first set of data; retrieve a second set of data corresponding to the memory address; compare the second set of data to the key; based on the second set of data corresponding to the key, cause the first set of data to be stored at the memory address; and based on the second set of data not corresponding to the key, complete the memory operation without causing the first set of data to be stored at the memory address.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to forward and invalidate inflight data in a store queue. An example apparatus includes a cache storage, a cache controller coupled to the cache storage and operable to receive a first memory operation, determine that the first memory operation corresponds to a read miss in the cache storage, determine a victim address in the cache storage to evict in response to the read miss, issue a read-invalidate command that specifies the victim address, compare the victim address to a set of addresses associated with a set of memory operations being processed by the cache controller, and in response to the victim address matching a first address of the set of addresses corresponding to a second memory operation of the set of memory operations, provide data associated with the second memory operation.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding. An example apparatus includes a first storage, a second storage, a store queue coupled to the first storage and the second storage, the store queue operable to receive a first memory operation specifying a first set of data, process the first memory operation for storing the first set of data in at least one of the first storage and the second storage, receive a second memory operation, and prior to storing the first set of data in the at least one of the first storage and the second storage, feedback the first set of data for use in the second memory operation.
Abstract:
A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache management operation is performed on a cache in the memory using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
Abstract:
Software instructions are executed on a processor within a computer system to configure a steaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
Abstract:
A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and a n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.