Abstract:
For instruction clusters for which no significant performance penalty is incurred, such as execution of hardware loops, a processor automatically and dynamically switches to a pipelined two-cycle access to an associated associative cache rather than a single-cycle access. An access involving more than one cycle uses less power because only the hit way within the cache memory is accessed rather than all ways within the indexed cache line. To maintain performance, the single-cycle cache access is utilized in all remaining instructions. In addition, where instruction clusters within a hardware loop fit entirely within a pre-fetch buffer, the cache sub-system is idled for any remaining iterations of the hardware loop to further reduce power consumption.
Abstract:
A processor architecture supports an electrical interface for coupling the processor core to one or more coprocessor extension units executing computational instructions, with a split-instruction transaction employed to provide operands and instructions to an extension unit and retrieve results from the extension unit. The generic instructions for sending an operation and data to the extension unit and/or retrieving data from the extension unit allow new computational instructions to be introduced without regeneration of the processor architecture. Support for multiple extension units and/or multiple execution pipes within each extension unit, multi-cycle execution latencies and different execution latencies between or within extension units, extension unit instruction predicates, and for handling processor core stalls and result save/restore on interrupt is included.
Abstract:
A ROM patching apparatus for use in a data processing system that executes instruction code stored the ROM. The ROM patching apparatus comprises: 1) a patch buffer for storing a first replacement cache line containing a first new instruction suitable for replacing at least a portion of the code in the ROM; 2) a lockable cache; 3) core processor logic operable to read from an associated memory a patch table containing a first table entry, the first table entry containing 1) the first new instruction and 2) a first patch address identifying a first patched ROM address of the at least a portion of the code in the ROM. The core processor logic loads the first new instruction from the patch table into the patch buffer, stores the first replacement cache line from the patch buffer into the lockable cache, and locks the first replacement cache line into the lockable cache.
Abstract:
A flexible Galois Field multiplier is provided which implements multiplication of two elements within a finite field defined by a degree and generator polynomial. One preferred embodiment provides a method for multiplying two elements of a finite field. According to the method, two input operands are mapped into a composite finite field, an initial KOA processing is performed upon the two operands in order to prepare the two operands for a multiplication in the ground field, the multiplication in the ground field is performed through the use of a triangular basis multiplier, and final KOA3 processing and optional modulo reduction processing is performed to produce the result. This design allows rapid redefinition of the degree and generator polynomial used for the ground field and the extension field.