摘要:
An apparatus is provided for sampling instructions in a processor pipeline of a system. The pipeline has a plurality of processing stages. The apparatus includes a fetch unit for fetching instructions into a first stage of the pipeline. Certain randomly selected instructions are identified, and state information of the system is sampled while a particular selected instruction is in any stage of the pipeline. Software is informed when the particular selected instruction leaves the pipeline so that the software can read any of the sampled state information.
摘要:
One embodiment of the present invention provides a system that selects instructions to be executed in a computer system that supports out-of-order execution of program instructions. The system receives dependency information for a first instruction. This dependency information identifies preceding instructions in the execution stream of a program that need to complete before the first instruction can be executed. The system divides this dependency information into a recent set and a less recent set. The recent set includes dependency information for a block of instructions immediately preceding the first instruction that need to complete before the first instruction can be executed. The less recent set includes dependency information for instructions not in the block of instructions immediately preceding the first instruction that need to complete before the first instruction can be executed. The system stores the recent set of dependency information in a first store, and stores the less recent set of dependency information in a second store. The first store is smaller and faster than the second store so that an update to dependency information takes less time to propagate through the first store than the second store. In one embodiment of the present invention, the system receives the dependency information for the first instruction from the first store and the second store, and determines from the dependency information if the first instruction is available to be executed by determining whether all preceding dependencies related to the first instruction have been satisfied. In one embodiment of the present invention, the system selects a second instruction from instructions that are available to be executed, and executes the second instruction.
摘要:
The present invention generally relates to synchronization of multiple threads in an out-of-order microprocessor utilizing the insertion of a trap. In one embodiment, while synchronizing multiple running threads, an instruction within a first running thread is identified. Upon identification of this instruction, a trap is inserted into a second running thread. All instructions within the instructional pipeline that are scheduled for execution prior to this trapped instruction must retire before the subsequent execution of the synchronizing instruction. Following the execution of the synchronizing instruction, all instructions within the instruction pipeline slated for execution after the trapped instruction in the remaining threads are flushed and refetched.
摘要:
An apparatus is provided for sampling multiple concurretly executing instructions in a processor pipeline of a system. The pipeline has a plurality of processing stages. The apparatus identifies multiple selected when the instructions are fetched into a first stage of the pipeline. A subset of the the multiple selected instructions to execute concurrently in the pipeline. State information of the system is sampled while any of the multiple selected instructions are in any stage of the pipeline. Software is informed whenever all of the selected instructions leave the pipeline so that the software can read any of the state information.
摘要:
A method of compacting an instruction queue in an out of order processor includes determining the number of invalid instructions below and including each row in the queue, by counting invalid bits or validity indicators associated with rows below and up to the current row. For each row, multiplexor select signals are generated from the flat vector counts for the N rows above and including the present row, and from the validity indicators associated with the N rows, where N is a predetermined value. A multiplexor associated with a particular row selects one of the N rows according to the select value, and moves or passes the instruction held in the selected row to the present row. A row's select value is determined by forming a diagonal from the N count vectors corresponding to the N rows above and including the present row, and logically ANDing, each diagonal bit with the valid bit associated with the same row. Each row's count vector is determined in two stages. In the first stage, a local count is determined for each row in a local group of rows, and a global count is determined for the entire local group. Each local count is determined by counting the validity indicators associated with rows in the local group. In the second stage, a final count is determined for each row in the queue, by combining the local and global counts generated for the local group in the first stage, with global counts generated in local groups below the local group. The N rows can extend to the queue's input pipeline.
摘要:
An apparatus is provided for sampling instructions in a processor pipeline of a computer system. The pipeline has a plurality of processing stages. Instructions are fetched into a first stage of the pipeline. A subset of the fetched instructions are identified as selected instructions. Event, latency, and state information of the system is sampled while any of the selected instructions are in any stage of the pipeline. Software is informed whenever any of the selected instructions leaves the pipeline to read the event and latency information.