Abstract:
In one particular example, this disclosure provides an efficient mechanism to determine the degree of parallelization possible for a loop in the presence of possible memory aliases that cannot be resolved at compile-time. Hardware instructions are provided that test memory addresses at run-time and set a mode or register that enables a single instance of a loop to run the maximum number of SIMD (Single Instruction, Multiple Data) lanes to run in parallel that obey the semantics of the original scalar loop. Other hardware features that extend applicability or performance of such instructions are enumerated.
Abstract:
According to an example embodiment, a processor such as a digital signal processor (DSP), is provided with a register acting as a predicate counter. The predicate counter may include more than two useful values, and in addition to acting as a condition for executing an instruction, may also keep track of nesting levels within a loop or conditional branch. In some cases, the predicate counter may be configured to operate in single-instruction, multiple data (SIMD) mode, or SIMD-within-a-register (SWAR) mode.
Abstract:
In an example, a system and method are provided for predicting in which way a requested memory address is most likely to be held in a multi-way cache, based on the last way accessed by the specified address register if available. If not available, then the system may determine that no best prediction is available. In that case, each way is read, and the superfluous values are disregarded, or a cache fill is performed as necessary. In certain embodiments, only a portion of the least significant bits of an add operation are used for way prediction in base-plus-offset addressing modes. This enables the decision to be made before the full-width add is complete, so that the clock cycle length is not unnecessarily lengthened by the prediction operation.
Abstract:
In one particular example, this disclosure provides an efficient mechanism to determine the degree of parallelization possible for a loop in the presence of possible memory aliases that cannot be resolved at compile-time. Hardware instructions are provided that test memory addresses at run-time and set a mode or register that enables a single instance of a loop to run the maximum number of SIMD (Single Instruction, Multiple Data) lanes to run in parallel that obey the semantics of the original scalar loop. Other hardware features that extend applicability or performance of such instructions are enumerated.
Abstract:
According to an example embodiment, a processor such as a digital signal processor (DSP), is provided with a register acting as a predicate counter. The predicate counter may include more than two useful values, and in addition to acting as a condition for executing an instruction, may also keep track of nesting levels within a loop or conditional branch. In some cases, the predicate counter may be configured to operate in single-instruction, multiple data (SIMD) mode, or SIMD-within-a-register (SWAR) mode.