摘要:
One embodiment of the present invention provides a system that prevents data hazards during simultaneous speculative threading. The system starts by executing instructions in an execute-ahead mode using a first thread. While executing instructions in the execute-ahead mode, the system maintains dependency information for each register indicating whether the register is subject to an unresolved data dependency. Upon the resolution of a data dependency during execute-ahead mode, the system copies dependency information to a speculative copy of the dependency information. The system then commences execution of the deferred instructions in a deferred mode using a second thread. While executing instructions in the deferred mode, if the speculative copy of the dependency information for a destination register indicates that a write-after-write (WAW) hazard exists with a subsequent non-deferred instruction executed by the first thread in execute-ahead mode, the system uses the second thread to execute the deferred instruction to produce a result and forwards the result to be used by subsequent deferred instructions without committing the result to the architectural state of the destination register. Hence, the system makes the result available to the subsequent deferred instructions without overwriting the result produced by a following non-deferred instruction.
摘要:
One embodiment of the present invention provides a system which creates multiple checkpoints in a processor that supports speculative-execution. The system starts by issuing instructions for execution in program order during execution of a program in a normal-execution mode. Upon encountering a launch condition during an instruction which causes a processor to enter execute-ahead mode, the system performs an initial checkpoint and commences execution of instructions in execute-ahead mode. Upon encountering a predefined condition during execute-ahead mode, the system generates an additional checkpoint and continues to execute instructions in execute-ahead mode. Generating the additional checkpoint allows the processor to return to the additional checkpoint, instead of the previous checkpoint, if the processor subsequently encounters a condition that requires the processor to return to a checkpoint.
摘要:
One embodiment of the present invention provides a system that synchronizes threads on a multi-threaded processor. The system starts by executing instructions from a multi-threaded program using a first thread and a second thread. When the first thread reaches a predetermined location in the multi-threaded program, the first thread executes a Start-Transactional-Execution (STE) instruction to commence transactional execution, wherein the STE instruction specifies a location to branch to if transactional execution fails. During the subsequent transactional execution, the first thread accesses a mailbox location in memory (which is also accessible by the second thread) and then executes instructions that cause the first thread to wait. When the second thread reaches a second predetermined location in the multi-threaded program, the second thread signals the first thread by accessing the mailbox location, which causes the transactional execution of the first thread to fail, thereby causing the first thread to resume non-transactional execution from the location specified in the STE instruction. In this way, the second thread can signal to the first thread without the first thread having to poll a shared variable.
摘要:
One embodiment of the present invention provides a system which supports out-of-order issue in a processor that normally executes instructions in-order. The system starts by issuing instructions from an issue queue in program order during a normal-execution mode. While issuing the instructions, the system determines if any instruction in the issue queue has an unresolved short-latency data dependency which depends on a short-latency operation. If so, the system generates a checkpoint and enters an out-of-order-issue mode, wherein instructions in the issue queue with unresolved short-latency data dependencies are held and not issued, and wherein other instructions in the issue queue without unresolved data dependencies are allowed to issue out-of-order.
摘要:
One embodiment of the present invention provides a system that facilitates executing a memory barrier (membar) instruction in an execute-ahead processor, wherein the membar instruction forces buffered loads and stores to complete before allowing a following instruction to be issued. During operation in a normal-execution mode, the processor issues instructions for execution in program order. Upon encountering a membar instruction, the processor determines if the load buffer and store buffer contain unresolved loads and stores. If so, the processor defers the membar instruction and executes subsequent program instructions in execute-ahead mode. In execute-ahead mode, instructions that cannot be executed because of an unresolved data dependency are deferred, and other non-deferred instructions are executed in program order. When all stores and loads that precede the membar instruction have been committed to memory from the store buffer and the load buffer, the processor enters a deferred mode and executes the deferred instructions, including the membar instruction, in program order. If all deferred instructions have been executed, the processor returns to the normal-execution mode and resumes execution from the point where the execute-ahead mode left off.
摘要:
One embodiment of the present invention provides a system that facilitates deferring execution of instructions with unresolved data dependencies as they are issued for execution in program order. During a normal execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system generates a checkpoint that can subsequently be used to return execution of the program to the point of the instruction. Next, the system executes the instruction and subsequent instructions in an execute-ahead mode, wherein instructions that cannot be executed because of an unresolved data dependency are deferred, and wherein other non-deferred instructions are executed in program order. Upon encountering a store during the execute-ahead mode, the system determines if the store buffer is full. If so, the system prefetches a cache line for the store, and defers execution of the store. If the number of stores that are encountered during execute-ahead mode exceeds the capacity of the store buffer, which means that the store buffer will never have additional space to accept additional stores during the execute-ahead mode because the store buffer is gated, the system directly enters the scout mode, without waiting for the deferred queue to eventually fill.
摘要:
One embodiment of the present invention provides a system that prevents data hazards during simultaneous speculative threading. The system starts by executing instructions in an execute-ahead mode using a first thread. While executing instructions in the execute-ahead mode, the system maintains dependency information for each register indicating whether the register is subject to an unresolved data dependency. Upon the resolution of a data dependency during execute-ahead mode, the system copies dependency information to a speculative copy of the dependency information. The system then commences execution of the deferred instructions in a deferred mode using a second thread. While executing instructions in the deferred mode, if the speculative copy of the dependency information for a destination register indicates that a write-after-write (WAW) hazard exists with a subsequent non-deferred instruction executed by the first thread in execute-ahead mode, the system uses the second thread to execute the deferred instruction to produce a result and forwards the result to be used by subsequent deferred instructions without committing the result to the architectural state of the destination register. Hence, the system makes the result available to the subsequent deferred instructions without overwriting the result produced by a following non-deferred instruction.
摘要:
Mechanisms have been developed for providing great flexibility in processor instruction handling, sequencing and execution. In particular, it has been discovered that a configurable predecode mechanism can be employed to select, for respective instruction patterns, between fixed decode and programmable decode paths provided by a processor. In this way, a patchable and/or programmable decode mechanism can be efficiently provided. In some realizations, either (or both) predecode or (and) decode may be configured or reconfigured post-manufacture. In some realizations, either (or both) predecode or (and) decode may be configured at (or about) initialization. In some realizations, either (or both) predecode or (and) decode may be configured at run-time.
摘要:
Mechanisms have been developed for providing great flexibility in processor instruction handling, sequencing and execution. In particular, it has been discovered that a programmable pre-decode mechanism can be employed to alter the behavior of a processor. For example, pre-decode hints for sequencing, synchronization or speculation control may altered or mappings of ISA instructions to native instructions or operation sequences may be altered. Such techniques may be employed to adapt a processor implementation (in the field) to varying memory models, implementations or interfaces or to varying memory latencies or timing characteristics. Similarly, such techniques may be employed to adapt a processor implementation to correspond to an extended/adapted instruction set architecture. In some realizations, instruction pre-decode functionality may be adapted at processor run-time to handle or mitigate a timing, concurrency or speculation issue. In some realizations, operation of pre-decode may be reprogrammed post-manufacture, at (or about) initialization, or at run-time.
摘要:
One embodiment of the present invention provides a system which selectively executes deferred instructions following a return of a long-latency operation in a processor that supports speculative-execution. During normal-execution mode, the processor issues instructions for execution in program order. When the processor encounters a long-latency operation, such as a load miss, the processor records the long-latency operation in a long-latency scoreboard, wherein each entry in the long-latency scoreboard includes a deferred buffer start index. Upon encountering an unresolved data dependency during execution of an instruction, the processor performs a checkpointing operation and executes subsequent instructions in an execute-ahead mode, wherein instructions that cannot be executed because of the unresolved data dependency are deferred into a deferred buffer, and wherein other non-deferred instructions are executed in program order. Upon encountering a deferred instruction that depends on a long-latency operation within the long-latency scoreboard, the processor updates a deferred buffer start index associated with the long-latency operation to point to position in the deferred buffer occupied by the deferred instruction. When a long-latency operation returns, the processor executes instructions in the deferred buffer starting at the deferred buffer start index for the returning long-latency operation.