Abstract:
A system and a method are disclosed to process instructions in an execution unit (EU) that includes an operand cache (OC). The OC stores a copy of at least one frequently used operand stored in a physical register file (PRF). The EU may process instructions using operands obtained from either the PRF or the OC. In a first mode, an OC renaming unit (OC-REN) directs the EU to process instructions using operands obtained from the OC if doing so uses less power than using operands obtained from the PRF. In a second mode, the OC-REN directs the EU to process the instructions using operands obtained from the PRF if doing so uses less power than using operands obtained from the OC.
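The mode decision described above can be illustrated with a minimal sketch. The abstract does not say how power is estimated, so the per-access energy figures, the `OperandCacheRenamer` class, and the hit-count window are all hypothetical and serve only to show the comparison between the two operand sources.

```python
from dataclasses import dataclass


@dataclass
class OperandCacheRenamer:
    """Toy model of the OC-REN mode decision (all energy figures are assumed)."""

    prf_read_energy: float = 1.0  # assumed energy per PRF operand read
    oc_read_energy: float = 0.3   # assumed energy per OC operand read
    oc_fill_energy: float = 0.5   # assumed energy to copy an operand into the OC

    def choose_source(self, operand_reads: int, oc_hits: int) -> str:
        """Return "OC" (first mode) or "PRF" (second mode) for a window of reads."""
        misses = operand_reads - oc_hits
        # With the OC in use, hits are cheap; misses still read the PRF
        # and additionally pay to fill the OC.
        oc_cost = (oc_hits * self.oc_read_energy
                   + misses * (self.prf_read_energy + self.oc_fill_energy))
        # Without the OC, every operand read goes to the PRF.
        prf_cost = operand_reads * self.prf_read_energy
        return "OC" if oc_cost < prf_cost else "PRF"


renamer = OperandCacheRenamer()
print(renamer.choose_source(operand_reads=100, oc_hits=90))
print(renamer.choose_source(operand_reads=100, oc_hits=10))
```

Under these assumed costs, a window with many OC hits selects the OC, while a window dominated by misses falls back to the PRF.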
Abstract:
A system and a method are disclosed to cascade execution of instructions in a load-store unit (LSU) of a central processing unit (CPU) to reduce latency associated with the instructions. First data stored in a cache is read by the LSU in response to a first memory load instruction of two immediately consecutive memory load instructions. Alignment, sign extension and/or endian operations are performed on the first data read from the cache in response to the first memory load instruction, and, in parallel, a memory-load address-forwarded result is selected based on a corrected alignment of the first data read in response to the first memory load instruction to provide a next address for a second of the two immediately consecutive memory load instructions. Second data stored in the cache is read by the LSU in response to the second memory load instruction based on the selected memory-load address-forwarded result.
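The cascaded flow can be sketched in software as follows. This is an illustrative model only: the 16-byte line size, the 4-byte little-endian operands, and every function name are assumptions, and the two paths that hardware would evaluate in parallel are necessarily run one after the other here. Loads are assumed not to cross a line boundary.

```python
LINE_SIZE = 16  # assumed cache line size in bytes


def read_line(cache: bytes, addr: int) -> bytes:
    """Return the raw cache line containing addr."""
    base = addr - addr % LINE_SIZE
    return cache[base:base + LINE_SIZE]


def align_and_extend(line: bytes, offset: int, size: int) -> int:
    """Normal result path: alignment, endian handling and sign extension."""
    return int.from_bytes(line[offset:offset + size], "little", signed=True)


def forward_address(line: bytes, offset: int, size: int) -> int:
    """Address-forwarding path: select the aligned bytes only, without
    waiting for sign extension, and treat them as the next address."""
    return int.from_bytes(line[offset:offset + size], "little", signed=False)


def cascaded_loads(cache: bytes, addr: int, size: int = 4) -> tuple[int, int]:
    """Execute two dependent loads, the second addressed by the first."""
    line = read_line(cache, addr)
    offset = addr % LINE_SIZE
    # In hardware these two steps would proceed in parallel.
    first_result = align_and_extend(line, offset, size)
    next_addr = forward_address(line, offset, size)
    # The second load starts from the forwarded address.
    second_line = read_line(cache, next_addr)
    second_result = align_and_extend(second_line, next_addr % LINE_SIZE, size)
    return first_result, second_result


cache = bytearray(64)
cache[4:8] = (32).to_bytes(4, "little")                  # pointer stored at address 4
cache[32:36] = (-7).to_bytes(4, "little", signed=True)   # value stored at address 32
print(cascaded_loads(bytes(cache), 4))
```

The point of the sketch is that `next_addr` depends only on byte selection, so the second cache read need not wait for the full formatting of the first result.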
Abstract:
According to one general aspect, a load unit may include a load circuit configured to load at least one piece of data from a memory. The load unit may include an alignment circuit configured to align the data to generate aligned data. The load unit may also include a mathematical operation execution circuit configured to generate a resultant of a predetermined mathematical operation with the at least one piece of data as an operand. The load unit is configured to, if an active instruction is associated with the predetermined mathematical operation, bypass the alignment circuit and input the piece of data directly to the mathematical operation execution circuit.
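A minimal sketch of the bypass follows. The abstract does not name the predetermined mathematical operation, so a population count is assumed here purely for illustration; the 8-byte load width and all function names are likewise hypothetical.

```python
WIDTH = 8  # assumed width of a raw load in bytes


def load_raw(memory: bytes, addr: int) -> int:
    """Load circuit: fetch the raw word that contains addr."""
    base = addr - addr % WIDTH
    return int.from_bytes(memory[base:base + WIDTH], "little")


def align(raw: int, addr: int, size: int) -> int:
    """Alignment circuit: shift the requested bytes down and mask them."""
    shift = (addr % WIDTH) * 8
    return (raw >> shift) & ((1 << (size * 8)) - 1)


def math_op(raw: int) -> int:
    """Assumed predetermined operation: count the set bits of the raw word."""
    return bin(raw).count("1")


def execute_load(memory: bytes, addr: int, size: int, fused_math: bool) -> int:
    raw = load_raw(memory, addr)
    if fused_math:
        # The active instruction uses the predetermined operation:
        # skip alignment and feed the raw data straight to the math circuit.
        return math_op(raw)
    return align(raw, addr, size)


memory = bytes([0x0F, 0x01, 0, 0, 0, 0, 0, 0])
print(execute_load(memory, addr=1, size=1, fused_math=False))
print(execute_load(memory, addr=0, size=8, fused_math=True))
```

A population count was chosen because its result does not depend on byte position within the word, which makes skipping the alignment stage plausible in this toy model.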
Abstract:
According to one general aspect, an apparatus may include an instruction fetch unit, an execution unit, and a cache resynchronization predictor. The instruction fetch unit may be configured to issue a first memory read operation to a memory address, and a first memory write operation to the memory address, wherein the first memory read operation is stored at an instruction address. The execution unit may be configured to execute the first memory read operation, wherein the execution of the first memory read operation causes a resynchronization exception. The cache resynchronization predictor may be configured to associate the instruction address with a resynchronization exception, and determine whether a memory read operation stored at the instruction address comprises a resynchronization predicted store.
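The predictor's bookkeeping might be modeled as a table keyed by instruction address. This is a sketch under stated assumptions: the `ResyncPredictor` class, its counter table, and the threshold of one prior exception are hypothetical, since the abstract does not describe the predictor's internal structure.

```python
class ResyncPredictor:
    """Toy predictor that remembers which instruction addresses have
    caused a resynchronization exception."""

    def __init__(self, threshold: int = 1) -> None:
        self.threshold = threshold            # assumed: exceptions needed to predict
        self.counts: dict[int, int] = {}      # instruction address -> exception count

    def record_exception(self, instruction_address: int) -> None:
        """Associate the instruction address with a resynchronization exception."""
        self.counts[instruction_address] = self.counts.get(instruction_address, 0) + 1

    def predicts_resync(self, instruction_address: int) -> bool:
        """Report whether a read at this address is predicted to resynchronize."""
        return self.counts.get(instruction_address, 0) >= self.threshold


predictor = ResyncPredictor()
predictor.record_exception(0x400)         # the read at 0x400 caused an exception
print(predictor.predicts_resync(0x400))   # a later fetch of the same address
print(predictor.predicts_resync(0x404))   # an address with no recorded exception
```

A real design would bound the table size and age out entries; those details are omitted here.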