摘要:
An information handling system includes a processor that issues instructions out of program order. The processor includes an issue queue that may advance instructions toward issue even though some instructions in the queue are not ready-to-issue. The issue queue includes a main array of storage cells and an auxiliary array of storage cells coupled thereto. When a particular row of the main array includes an instruction that is not ready-to-issue, a stall condition occurs for that instruction. However, to prevent the entire issue queue and processor from stalling, a ready-to-issue instruction in another row of the main array may bypass the row including the stalled or not-ready-to-issue instruction. To effect this bypass, the issue queue moves the ready-to-issue instruction to an issue row of the auxiliary array for issuance to an appropriate execution unit. Out-of-order issuance of instructions to the execution units thus continues despite the stalled instruction.
摘要:
An information handling system includes a processor that issues instructions out of program order. The processor includes an issue queue that may advance instructions toward issue even though some instructions in the queue are not ready-to-issue. The issue queue includes a matrix of storage cells configured in rows and columns including a first row that couples to execution units. Instructions advance toward issuance from row to row as unoccupied storage cells appear. Unoccupied cells appear when instructions advance toward the first row and upon issuance. When a particular row includes an instruction that is not ready-to-issue a stall condition occurs for that instruction. However, to prevent the entire issue queue and processor from stalling, a ready-to-issue instruction in another row may bypass the row including the stalled or not-ready-to-issue instruction. Out-of-order issuance of instructions to the execution units thus continues.
摘要:
A method and apparatus are provided for detecting and handling an instruction flush in a microprocessor system. A flush mechanism is provided that is distributed across all of the execution units in a data processing system. The flush mechanism does not require a central collection point to re-distribute the flush signals to the execution units. Each unit generates a flush vector to all other execution units which is used to block register updates for the flushed instructions
摘要:
A system and method for dynamic power management in a processor design is presented. A pipeline stage's stall detection logic detects a stall condition, and sends a signal to idle detection logic to gate off the pipeline's register clocks. The stall detection logic also monitors a downstream pipeline stage's stall condition, and instructs the idle detection logic to gate off the pipeline stage's registers when the downstream pipeline stage is in a stall condition as well. In addition, when the pipeline stage's stall detection logic detects a stall condition, either from the downstream pipeline stage or from its own pipeline units, the pipeline stage's stall detection logic informs an upstream pipeline stage to gate off its clocks and thus, conserve more power.
摘要:
A system and method for dynamic switching between performance schemes is presented. The software program uses an instruction to indicate whether a pacing performance scheme or a flushing performance scheme is to be used. The selection by the software program is stored in a hardware register that the processor uses to determine whether the pacing or flushing performance scheme is used. After setting the performance scheme, subsequent instructions of the software program will be executed using the selected performance scheme. The pacing performance scheme preemptively stalls an instruction that might overload the queue that stores instructions for the Load/Store Unit (LSU). The flushing performance scheme flushes instructions when the LSU storage queue is overloaded and holds the thread that caused the overflow dormant until the queue is no longer full.
摘要:
A method, an apparatus and a computer program product are provided for the managing of SIMD instructions and GP instructions within an instruction pipeline of a processor. The SIMD instructions and the GP instructions share the same “front-end” pipelines within an Instruction Unit. Within the shared pipelines the Instruction Unit checks the GP instructions for dependencies and resolves these dependencies. At the dispatch point within the pipelines the Instruction Unit sends valid GP instructions to the GP Unit and SIMD instructions to an SIMD issue queue. In the SIMD issue queue the Instruction Unit checks the SIMD instructions for dependencies and resolves these dependencies. Then the SIMD issue queue dispatches the SIMD instructions to the SIMD Unit. Accordingly, dependencies involving SIMD instructions do not affect GP instructions because the SIMD dependencies are checked and resolved independently.
摘要:
The present invention provides a method, a computer program product, and an apparatus for blocking a thread at dispatch in a multi-thread processor for fine-grained control of thread performance. Multiple threads share a pipeline within a processor. Therefore, a long latency condition for an instruction on one thread can stall all of the threads that share the pipeline. A dispatch-block signaling instruction blocks the thread containing the long latency condition at dispatch. The length of the block matches the length of the latency, so the pipeline can dispatch instructions from the blocked thread after the long latency condition is resolved. In one embodiment the dispatch-block signaling instruction is a modified OR instruction and in another embodiment it is a Nop instruction. By blocking one thread at dispatch, the processor can dispatch instructions from the other threads during the block.
摘要:
A system and method for placing a processor into a gradual slow down mode of operation are provided. The gradual slow down mode of operation comprises a plurality of stages of slow down operation of an issue unit in a processor in which the issuance of instructions is slowed in accordance with a staging scheme. The gradual slow down of the processor allows the processor to break out of livelock conditions. Moreover, since the slow down is gradual, the processor may flexibly avoid various degrees of livelock conditions. The mechanisms of the illustrative embodiments impact the overall processor performance based on the severity of the livelock condition by taking a small performance impact on less severe livelock conditions and only increasing the processor performance impact when the livelock condition is more severe.
摘要:
An issue unit for placing a processor into a gradual slow down mode of operation is provided. The gradual slow down mode of operation comprises a plurality of stages of slow down operation of an issue unit in a processor in which the issuance of instructions is slowed in accordance with a staging scheme. The gradual slow down of the processor allows the processor to break out of livelock conditions. Moreover, since the slow down is gradual, the processor may flexibly avoid various degrees of livelock conditions. The mechanisms of the illustrative embodiments impact the overall processor performance based on the severity of the livelock condition by taking a small performance impact on less severe livelock conditions and only increasing the processor performance impact when the livelock condition is more severe.
摘要:
A method, system and processor for improving the performance of an in-order processor. A processor may include an execution unit with an execution pipeline that includes a backup pipeline and a regular pipeline. The backup pipeline may store a copy of the instructions issued to the regular pipeline. The execution pipeline may include logic for allowing instructions to flow from the backup pipeline to the regular pipeline following the flushing of the instructions younger than the exception detected in the regular pipeline. By maintaining a backup copy of the instructions issued to the regular pipeline, instructions may not need to be flushed from separate execution pipelines and re-fetched. As a result, one may complete the results of the execution units to the architected state out of order thereby allowing the completion point to vary among the different execution pipelines.