Abstract:
A stream of data is accessed from a memory system by an autonomous memory access engine, converted on the fly by the memory access engine, and then presented to a processor for data processing. A portion of a lookup table (LUT) containing converted data elements is preloaded into a lookaside buffer associated with the memory access engine. As the stream of data elements is fetched from the memory system, each data element in the stream is replaced with a respective converted data element obtained from the LUT in the lookaside buffer according to the content of that data element, thereby forming a stream of converted data elements. The stream of converted data elements is then propagated from the memory access engine to a data processor.
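The conversion step amounts to a table substitution keyed on each element's own content. A minimal C sketch of that step (the patent performs it in the memory access engine hardware; the names and the byte-wide element size here are illustrative assumptions):

```c
#include <stddef.h>
#include <stdint.h>

/* Each fetched element's content indexes the preloaded LUT portion;
 * the looked-up value replaces the element in the output stream. */
void convert_stream(const uint8_t *fetched, uint8_t *converted,
                    size_t count, const uint8_t lut[256])
{
    for (size_t i = 0; i < count; i++)
        converted[i] = lut[fetched[i]];
}
```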
Abstract:
A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory system using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
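A software analogy of the two modes, assuming a 64-byte cache line and using the GCC/Clang __builtin_prefetch hint to stand in for the hardware preload; the streaming engine performs both modes in hardware from its stream instructions, so all names below are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* First mode: generate one address per element and deliver the data. */
int64_t stream_sum(const int32_t *base, size_t count)
{
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += base[i];                 /* address stream feeds the CPU */
    return sum;
}

/* Second mode: the same address generation walks cache-line addresses
 * and only preloads them; no data is delivered to the processor. */
void block_preload(const char *base, size_t bytes)
{
    const size_t LINE = 64;             /* assumed cache line size */
    for (size_t off = 0; off < bytes; off += LINE)
        __builtin_prefetch(base + off); /* GCC/Clang prefetch hint */
}
```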
Abstract:
A queuing requester for access to a memory system is provided. Transaction requests are received from two or more requesters for access to the memory system. A request queue of the received transaction requests is formed in the queuing requester; each transaction request includes an associated priority value. A highest priority value of all pending transaction requests within the request queue is determined. An elevated priority value is selected when the highest priority value is higher than the priority value of the oldest transaction request in the request queue; otherwise the priority value of the oldest transaction request is selected. The oldest transaction request in the request queue is then provided to the memory system with the selected priority value. An arbitration contest with other requesters for access to the memory system is performed using the selected priority value.
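The priority-elevation rule can be sketched directly in C. A minimal model, with hypothetical names, of choosing the priority value used in the oldest request's arbitration contest:

```c
#include <stddef.h>

/* One pending transaction request; payload fields omitted. */
typedef struct {
    int priority;
} request;

/* Queue is ordered oldest-first. Returns the priority value to use
 * for queue[0], the oldest request: its own priority, elevated to
 * the highest priority pending anywhere in the queue if higher. */
int select_priority(const request *queue, size_t pending)
{
    int highest = queue[0].priority;
    for (size_t i = 1; i < pending; i++)
        if (queue[i].priority > highest)
            highest = queue[i].priority;
    /* Mirrors the abstract: elevate only when a younger request
     * behind the oldest one is more urgent. */
    return (highest > queue[0].priority) ? highest : queue[0].priority;
}
```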
Abstract:
This invention optimizes DMA writes to directly addressable level two memory whose contents are cached in level one when the cached line is valid and dirty. When the level two controller detects that a line is valid and dirty in level one, the level two memory need not update its copy of the data; level one memory will replace the level two copy with a victim writeback at a future time. Thus the level two memory need not store a copy of the write data. This limits the number of DMA writes to level two directly addressable memory, which improves performance and minimizes dynamic power. It also frees the level two memory for other masters/requestors.
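A minimal C sketch of the level two controller's decision, assuming the snooped level one line state is available to it; names are illustrative, and the actual logic is hardware snoop circuitry:

```c
#include <stdbool.h>

/* Snooped state of the addressed line in level one memory. */
typedef struct {
    bool valid;
    bool dirty;
} l1_line_state;

/* Level two skips storing the DMA write data when level one holds
 * the line valid and dirty: level one's eventual victim writeback
 * will supply level two's copy anyway. */
bool l2_must_store_dma_write(l1_line_state l1)
{
    return !(l1.valid && l1.dirty);
}
```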
Abstract:
In a very long instruction word (VLIW) central processing unit, instructions are grouped into execute packets that execute in parallel. A constant may be specified or extended by bits in a constant extension instruction in the same execute packet. If an instruction includes an indication of constant extension, the decoder employs bits of a constant extension instruction to extend the constant of an immediate field. Two or more constant extension slots are permitted in each execute packet, each extending constants for a different predetermined subset of functional unit instructions. In an alternative embodiment, more than one functional unit may have constants extended from the same constant extension instruction employing the same extended bits. A long extended constant may be formed using the extension bits of two constant extension instructions.
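A bit-level C sketch of the decoder's extension step; the field widths below (a 5-bit immediate field extended by 27 bits into a 32-bit constant) are illustrative assumptions, not the patent's encodings:

```c
#include <stdint.h>

/* Decoder step: concatenate the constant extension instruction's
 * bits above the instruction's own immediate field. */
static inline int32_t extend_constant(uint32_t ext27, uint32_t imm5)
{
    return (int32_t)((ext27 << 5) | (imm5 & 0x1Fu));
}
```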
Abstract:
In some examples, a circuit includes an interface configured to couple to a memory that includes a set of outputs to provide a set of data from the memory. The circuit further includes a rotator coupled to the interface that includes a first set of multiplexors that each include a set of inputs coupled to the set of outputs of the interface and an output. The circuit further includes a storage circuit coupled to the rotator that includes a register file coupled to the outputs of the first set of multiplexors and an alignment network. The alignment network includes a second set of multiplexors that each include a set of inputs coupled to the register file and an output.
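A C model of the rotator's multiplexor selection, assuming an illustrative 16-byte interface width; each output lane's multiplexor picks one interface lane, shown here as a uniform byte rotation. The alignment network's second multiplexor stage selects from the register file in the same manner:

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* illustrative interface width in bytes */

/* Rotator stage: out[i] selects interface lane (i + rot) mod LANES. */
void rotate_lanes(const uint8_t in[LANES], uint8_t out[LANES],
                  unsigned rot)
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = in[(i + rot) % LANES];
}
```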
Abstract:
A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache management operation is performed on a cache in the memory system using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
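Where the preceding abstract's second mode preloads lines, this one performs management operations over the block. A minimal C model of that walk, assuming an illustrative direct-mapped cache with 64-byte lines; the operation names and cache structure are placeholders, not the patent's encoding:

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE  64u   /* assumed line size  */
#define NSETS 256u  /* assumed set count  */

typedef struct { uintptr_t line_addr; bool valid, dirty; } cache_line;
static cache_line cache[NSETS];

enum cache_op { CACHE_INVALIDATE, CACHE_WRITEBACK_INVALIDATE };

/* Walk the block of line addresses the second mode generates and
 * apply one maintenance operation per line that hits in the cache. */
void block_cache_maintain(uintptr_t base, uintptr_t size, enum cache_op op)
{
    for (uintptr_t a = base & ~(uintptr_t)(LINE - 1);
         a < base + size; a += LINE) {
        cache_line *cl = &cache[(a / LINE) % NSETS];
        if (!cl->valid || cl->line_addr != a)
            continue;                   /* line not cached: nothing to do */
        if (op == CACHE_WRITEBACK_INVALIDATE && cl->dirty) {
            /* the hardware would write the line back to memory here */
            cl->dirty = false;
        }
        cl->valid = false;
    }
}
```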
Abstract:
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element, and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
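A scalar C sketch of the loop structure, assuming an illustrative 8-element vector with a one-bit-per-element predicate; the stream producer is modeled as an array terminated by an all-zero predicate:

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 8 /* illustrative vector width */

typedef struct {
    int32_t elem[VLEN];
    uint8_t pred; /* one valid bit per element */
} pred_vector;

/* Loop repeats while the predicate marks at least one element valid
 * and exits on the first vector whose predicate marks none. */
int64_t sum_valid(const pred_vector *stream)
{
    int64_t sum = 0;
    for (const pred_vector *v = stream; v->pred != 0; v++)
        for (size_t i = 0; i < VLEN; i++)
            if (v->pred & (1u << i))
                sum += v->elem[i];
    return sum;
}
```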
Abstract:
Devices and methods are provided for identifying a debug event associated with a data element of a data stream, and performing debugging when a processor executes a software program in connection with the data stream. The debug event is tracked through a data pipeline to the processor. In an embodiment, the debug event is acted on only when the processor is ready to consume the data element associated with the debug event. In an embodiment, the debug event is determined by monitoring iteration counts of loop counters associated with an address generator and comparing the iteration counts to respective stored count values.
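A minimal C sketch of the counter-comparison trigger from the second embodiment, assuming six nested address-generator loops; the loop count and names are illustrative assumptions:

```c
#include <stdbool.h>
#include <stddef.h>

#define NLOOPS 6 /* illustrative number of nested address-generator loops */

/* The address generator's per-loop iteration counts are compared
 * against stored count values; a debug event fires when every
 * level matches. */
bool debug_event_hit(const unsigned counts[NLOOPS],
                     const unsigned stored[NLOOPS])
{
    for (size_t i = 0; i < NLOOPS; i++)
        if (counts[i] != stored[i])
            return false;
    return true;
}
```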