摘要:
A multi-mode register rename mechanism which allows a simultaneous multi-threaded processor to support full out-of-order thread execution when the number of threads is low and in-order thread execution when the number of threads increases. Responsive to changing an execution mode of a processor to operate in in-order thread execution mode, the illustrative embodiments switch a physical register in the data processing system to an architected facility, thereby forming a switched physical register. When an instruction is issued to an execution unit, wherein the issued instruction comprises a thread bit, the thread bit is examined to determine if the instruction accesses an architected facility. If the issued instruction accesses an architected facility, the instruction is executed, and the results of the executed instruction are written to the switched physical register.
摘要:
Recovery circuits react to errors in a processor core by waiting for an error-free completion of any pending store-conditional instruction or a cache-inhibited load before ceasing to checkpoint or backup progress of a processor core. Recovery circuits remove the processor core from the logical configuration of the symmetric multiprocessor system, potentially reducing propagation of errors to other parts of the system. The processor core is reset and the checkpointed values may be restored to registers of the processor core. The core processor is allowed not just to resume execution just prior to the instructions that failed to execute correctly the first time, but is allowed to operate in a reduced execution mode for a preprogrammed number of groups. If the preprogrammed number of instruction groups execute without error, the processor core is allowed to resume normal execution.
摘要:
A method and apparatus for steering instructions dynamically, at issue time, so as to maximize the efficiency of use of execution units being shared by multiple threads being processed by an SMT processor. Resource vectors are used at issue time to redirect instructions, from threads being processed simultaneously, to shared resources for which the multiple threads are competing. The existing resource vectors for instructions that are queued for issuance are analyzed and, where appropriate, dynamically recalculated and modified for maximum efficiency.
摘要:
A method and apparatus are provided for dispatch group checkpointing in a microprocessor, including provisions for handling partially completed dispatch groups and instructions which modify system coherent state prior to completion. An instruction checkpoint retry mechanism is implemented to recover from soft errors in logic. The processor is able to dispatch fixed point unit (FXU), load/store unit (LSU), and floating point unit (FPU) or vector multimedia extension (VMX) instructions on the same cycle. Store data is written to a store queue when a store instruction finishes executing. The data is held in the store queue until the store instruction is checkpointed, at which point it can be released to the coherently shared level 2 (L2) cache.
摘要:
The present invention allows a microprocessor to identify and speculatively execute future instructions during a stall condition. This allows forward progress to be made through the instruction stream during the stall condition which would otherwise cause the microprocessor or thread of execution to be idle. The execution of such future instructions can initiate a prefetch of data or instructions from a distant cache or main memory, or otherwise make forward progress through the instruction stream. In this manner, when the instructions are re-executed (non speculatively executed) after the stall condition expires, they will execute with a reduced execution latency; e.g. by accessing data prefetched into the L1 cache, or enroute to the processor, or by executing the target instructions following a speculatively resolved mispredicted branch. In speculative mode, instruction operands may be invalid due to source loads that miss the L1 cache, facilities not available in speculative execution mode, or due to speculative instruction results that are not available. Dependency and dirty (i.e. invalid result) bits are tracked and used to determine which speculative instructions are valid for execution. A modified value register storage and bit vector are used to improve the availability of speculative results that would otherwise be discarded once they leave the execution pipeline because they cannot be written to the architected registers. The modified general purpose registers are used to store speculative results when the corresponding instruction reaches writeback and the modified bit vector tracks the results that have been stored there. Younger speculative instructions that do not bypass directly from older instructions will then use this modified data when the corresponding bit in the modified bit vector indicates the data has been modified. Otherwise, data from the architected registers will be used.
摘要:
An apparatus, system and method for quickly determining an oldest instruction in a non-moving instruction queue of a processor are provided. Particularly, instructions are stored, one at a time at a clock cycle, in the non-moving queue. At every clock cycle, a present status of the instructions in the queue is recorded. Using the present status of the instructions in the queue in conjunction with previously recorded statuses of the instructions, the oldest instruction in the queue is determined. The status of the instructions in the queue includes whether or not the instruction has been issued for execution as well as whether or not it is known that the issued instruction has been accepted for execution.
摘要:
A more efficient method of handling instructions in a computer processor, by associating resource fields with respective program instructions wherein the resource fields indicate which of the processor hardware resources are required to carry out the program instructions, calculating resource requirements for merging two or more program instructions based on their resource fields, and determining resource availability for simultaneously executing the merged program instructions based on the calculated resource requirements. Resource vectors indicative of the required resource may be encoded into the resource fields, and the resource fields decoded at a later stage to derive the resource vectors. The resource fields can be stored in the instruction cache associated with the respective program instructions. The processor may operate in a simultaneous multithreading mode with different program instructions being part of different hardware threads. When the resource availability equals or exceeds the resource requirements for a group of instructions, those instructions can be dispatched simultaneously to the hardware resources. A start bit may be inserted in one of the program instructions to define the instruction group. The hardware resources may in particular be execution units such as a fixed-point unit, a load/store unit, a floating-point unit, or a branch processing unit.
摘要:
A method, apparatus, and computer program product are disclosed for ensuring processing fairness in simultaneous multi-threading (SMT) microprocessors. A clock cycle priority is assigned to a first thread and to a second thread during a standard selection state that lasts for an expected number of clock cycles by selecting the first thread to be a primary thread and the second thread to be a secondary thread. If a condition exists that requires overriding, an override state is executed by selecting the second thread to be the primary thread and the first thread to be the secondary thread. The override state is forced to be executed for an override period of time which equals the expected number of clock cycles plus a forced number of clock cycles. The forced number of clock cycles is granted to the first thread in response to the first thread again becoming the primary thread.
摘要:
Recovery circuits react to errors in a processor core by waiting for an error-free completion of any pending store-conditional instruction or a cache-inhibited load before ceasing to checkpoint or backup progress of a processor core. Recovery circuits remove the processor core from the logical configuration of the symmetric multiprocessor system, potentially reducing propagation of errors to other parts of the system. The processor core is reset and the checkpointed values may be restored to registers of the processor core. The core processor is allowed not just to resume execution just prior to the instructions that failed to execute correctly the first time, but is allowed to operate in a reduced execution mode for a preprogrammed number of groups. If the preprogrammed number of instruction groups execute without error, the processor core is allowed to resume normal execution.
摘要:
A method and related apparatus is provided for a processor having a number of registers, wherein instructions are sequentially issued to move through a sequence of execution stages, from an initial stage to a final write back stage. As a method, an embodiment includes the step of issuing a first instruction, such as an FMA instruction, to move through the sequence of execution stages, the first instruction being directed to a specified one of the registers. The method further includes issuing a second instruction to move through the execution stages, the second instruction being issued after the first instruction has issued, but before the first instruction reaches the final write back stage. The second instruction is likewise directed to the specified register, and comprises either a store instruction or a load instruction, selectively. R and W bits corresponding to the specified register are used to ensure that a store instruction does not read data from, and that a load instruction does not write data to the specified register, respectively, before the first instruction is moved to the final write back stage.