摘要:
An apparatus for integer exception register (XER) renaming and methods of using the same are implemented. In a central processing unit (CPU) having a pipelined architecture, integer instructions that use or update the XER may be executed out-of-order using the XER renaming mechanism. Any instruction that updates the XER has an associated instruction identifier (IID) stored in a register. Subsequent instructions that use data in the XER use the stored IID to determine when the XER data has been updated by the execution of the instruction corresponding to the stored IID. As each instruction that updates XER data is executed, the data is stored in an XER rename buffer. Instructions using XER data then obtain the updated, valid, XER data from the rename buffer. In this way, these instructions can obtain valid XER data prior to completion of the preceding instructions. The XER data is retrieved from the XER rename buffer by indexing into the buffer by using an index derived from the stored IID. Because the updated XER data is available in the rename buffer before the updating instruction completes, out-of-order execution of instructions using or updating XER data is thereby realized.
摘要:
A circuit and method provide rename register reallocation for simultaneous multi-threaded (SMT) processors that redistributes rename (mapped) resources between one thread during single-threaded (ST) execution and multiple threads during multi-threaded execution. The processor receives an instruction specifying a transition from a single-threaded to a multi-threaded mode or vice-versa and halts execution of all threads executing on the processor. The internal control logic then signals the resources to reallocate the resources. Rename resources are reallocated by directing an action at the rename mapper. When switching from SMT to ST mode, the mapper is directed to drop entries for the dying thread, but on a switch from ST to SMT mode, “dummy” instruction group dispatch indications are sent to the mapper that indicate use of all architected registers for each thread.
摘要:
An information handling system includes a processor that may perform issue queue virtual load/store instruction operations. The issue queue maintains load and store instructions with a real/virtual dependency flag. The issue queue provides storage resources for real and virtual load/store instructions. Real load/store instructions execute in a load store unit LSU. Virtual load/store instructions are pending execution in the LSU. The LSU may keep track of each virtual load/store instruction within the issue queue by thread, type, and pointer data. Provided that all dependencies are clear for a pending virtual load/store instruction, the LSU marks the pending virtual load/store instruction as real. The pending virtual load/store instruction may then issue to the LSU as a real load/store instruction.
摘要:
An information handling system includes a processor that may perform issue queue virtual load/store instruction operations. The issue queue maintains load and store instructions with a real/virtual dependency flag. The issue queue provides storage resources for real and virtual load/store instructions. Real load/store instructions execute in a load store unit LSU. Virtual load/store instructions are pending execution in the LSU. The LSU may keep track of each virtual load/store instruction within the issue queue by thread, type, and pointer data. Provided that all dependencies are clear for a pending virtual load/store instruction, the LSU marks the pending virtual load/store instruction as real. The pending virtual load/store instruction may then issue to the LSU as a real load/store instruction.
摘要:
A method, apparatus, and computer program product are disclosed for ensuring processing fairness in simultaneous multi-threading (SMT) microprocessors. A clock cycle priority is assigned to a first thread and to a second thread during a standard selection state that lasts for an expected number of clock cycles by selecting the first thread to be a primary thread and the second thread to be a secondary thread. If a condition exists that requires overriding, an override state is executed by selecting the second thread to be the primary thread and the first thread to be the secondary thread. The override state is forced to be executed for an override period of time which equals the expected number of clock cycles plus a forced number of clock cycles. The forced number of clock cycles is granted to the first thread in response to the first thread again becoming the primary thread.
摘要:
An apparatus and method are provided for reading a plurality of consecutive entries and writing a plurality of consecutive entries with only one read address and one write address using a 2Read/2Write register file. In one exemplary embodiment, a 64 entry register file array is partitioned into four sub-arrays. Each sub-array contains sixteen entries having one or more 2Read/2Write SRAM cells. The apparatus and method provide a mechanism to write the consecutive entries by only having a 4 to 16 decode of one address. In addition, the apparatus and method provide a mechanism for reading data from the register file array using a starting read word address and two read word lines generated based on the starting read word address. The two read word lines are used to access the two read ports of the entries in the sub-arrays.
摘要:
Recovery circuits react to errors in a processor core by waiting for an error-free completion of any pending store-conditional instruction or a cache-inhibited load before ceasing to checkpoint or backup progress of a processor core. Recovery circuits remove the processor core from the logical configuration of the symmetric multiprocessor system, potentially reducing propagation of errors to other parts of the system. The processor core is reset and the checkpointed values may be restored to registers of the processor core. The core processor is allowed not just to resume execution just prior to the instructions that failed to execute correctly the first time, but is allowed to operate in a reduced execution mode for a preprogrammed number of groups. If the preprogrammed number of instruction groups execute without error, the processor core is allowed to resume normal execution.
摘要:
A method and related apparatus is provided for a processor having a number of registers, wherein instructions are sequentially issued to move through a sequence of execution stages, from an initial stage to a final write back stage. As a method, an embodiment includes the step of issuing a first instruction, such as an FMA instruction, to move through the sequence of execution stages, the first instruction being directed to a specified one of the registers. The method further includes issuing a second instruction to move through the execution stages, the second instruction being issued after the first instruction has issued, but before the first instruction reaches the final write back stage. The second instruction is likewise directed to the specified register, and comprises either a store instruction or a load instruction, selectively. R and W bits corresponding to the specified register are used to ensure that a store instruction does not read data from, and that a load instruction does not write data to the specified register, respectively, before the first instruction is moved to the final write back stage.
摘要:
Recovery circuits react to errors in a processor core by waiting for an error-free completion of any pending store-conditional instruction or a cache-inhibited load before ceasing to checkpoint or backup progress of a processor core. Recovery circuits remove the processor core from the logical configuration of the symmetric multiprocessor system, potentially reducing propagation of errors to other parts of the system. The processor core is reset and the checkpointed values may be restored to registers of the processor core. The core processor is allowed not just to resume execution just prior to the instructions that failed to execute correctly the first time, but is allowed to operate in a reduced execution mode for a preprogrammed number of groups. If the preprogrammed number of instruction groups execute without error, the processor core is allowed to resume normal execution.
摘要:
A system and method for predictive early allocation of stores in a microprocessor is presented. During instruction dispatch, an instruction dispatch unit retrieves an instruction from an instruction cache (Icache). When the retrieved instruction is an interruptible instruction, the instruction dispatch unit loads the interruptible instruction's instruction tag (IITAG) into an interruptible instruction tag register. A load store unit loads subsequent instruction information (instruction tag and store data) along with the interruptible instruction tag in a store data queue entry. Comparison logic receives a completing instruction tag from completion logic, and compares the completing instruction tag with the interruptible instruction tags included in the store data queue entries. In turn, deallocation logic deallocates those store data queue entries that include an interruptible instruction tag that matches the completing instruction tag.