Abstract:
One embodiment of the present invention provides a system that enforces memory-reference ordering requirements at an L2 cache. During operation, the system receives a load at the L2 cache, wherein the load previously caused a miss at an L1 cache. Upon receiving the load, the system performs a lookup for the load in reflections of store buffers associated with other L1 caches. These reflections are located at the L2 cache, and each reflection contains addresses for stores in a corresponding store buffer associated with an L1 cache, and possibly contains data that was overwritten by the stores. If the lookup generates a hit, which indicates that the load may potentially interfere with a store, the system causes the load to wait to execute until the store commits.
Abstract:
One embodiment of the present invention provides a system which supports out-of-order issue in a processor that normally executes instructions in-order. The system starts by issuing instructions from an issue queue in program order during a normal-execution mode. While issuing the instructions, the system determines if any instruction in the issue queue has an unresolved short-latency data dependency which depends on a short-latency operation. If so, the system generates a checkpoint and enters an out-of-order-issue mode, wherein instructions in the issue queue with unresolved short-latency data dependencies are held and not issued, and wherein other instructions in the issue queue without unresolved data dependencies are allowed to issue out-of-order.
Abstract:
Explicit software control is used for data speculations. The explicit software control is applied at selected locations in a computer program to provide the benefit of data speculation while eliminating the need for hardware to perform data speculation. A computer-based method first determines, via explicit software control, whether data speculation for an item, a variable, a pointer, an address, etc., is needed. Upon determining that data speculation for the item is needed, the data speculation is performed under explicit software control. Conversely, if the explicit software control determines that data speculation is not needed, e.g., the value of the item typically obtained by execution of a long latency instruction, is available, an original code segment is executed using an actual value of the item.
Abstract:
One embodiment of the present invention provides a system that facilitates deferring execution of instructions with unresolved data dependencies as they are issued for execution in program order. During a normal execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system generates a checkpoint that can subsequently be used to return execution of the program to the point of the instruction. Next, the system executes subsequent instructions in an execute-ahead mode, wherein instructions that cannot be executed because of an unresolved data dependency are deferred, and wherein other non-deferred instructions are executed in program order.
Abstract:
One embodiment of the present invention supports execution of a start transactional execution (STE) instruction, which marks the beginning of a block of instructions to be executed transactionally. Upon encountering the STE instruction during execution of a program, the system commences transactional execution of the block of instructions following the STE instruction. Changes made during this transactional execution are not committed to the architectural state of the processor until the transactional execution successfully completes.
Abstract:
One embodiment of the present invention provides a system that facilitates executing a memory barrier (membar) instruction in an execute-ahead processor, wherein the membar instruction forces buffered loads and stores to complete before allowing a following instruction to be issued. During operation in a normal-execution mode, the processor issues instructions for execution in program order. Upon encountering a membar instruction, the processor determines if the load buffer and store buffer contain unresolved loads and stores. If so, the processor defers the membar instruction and executes subsequent program instructions in execute-ahead mode. In execute-ahead mode, instructions that cannot be executed because of an unresolved data dependency are deferred, and other non-deferred instructions are executed in program order. When all stores and loads that precede the membar instruction have been committed to memory from the store buffer and the load buffer, the processor enters a deferred mode and executes the deferred instructions, including the membar instruction, in program order. If all deferred instructions have been executed, the processor returns to the normal-execution mode and resumes execution from the point where the execute-ahead mode left off.
Abstract:
One embodiment of the present invention provides a processor that facilitates rapid progress while speculatively executing instructions in scout mode. During normal operation, the processor executes instructions in a normal execution mode. Upon encountering a stall condition, the processor executes the instructions in a scout mode, wherein the instructions are speculatively executed to prefetch future loads, but wherein results are not committed to the architectural state of the processor. While speculatively executing the instructions in scout mode, the processor maintains dependency information for each register indicating whether or not a value in the register depends on an unresolved data-dependency. If an instruction to be executed in scout mode depends on an unresolved data dependency, the processor executes the instruction as a NOOP so that the instruction executes rapidly without tying up computational resources. The processor also propagates dependency information indicating an unresolved data dependency to a destination register for the instruction.
Abstract:
A user is provided with means to sample memory hierarchy via software. This allows a user to enhance memory-level parallelism via software. A status of information needed for execution of a second computer program instruction is read in response to execution of a first computer program instruction. Execution continues with execution of the second computer program instruction upon the status being a first status. Alternatively, a third computer program instruction is executed upon the status being a second status different from the first status. Thus, execution of the first computer program instruction allows control of the memory hierarchy, which in turn give the user control of the memory hierarchy.
Abstract:
One embodiment of the present invention provides a system that marks memory elements based upon how information retrieved from the memory elements affects speculative program execution. This system operates by allowing a programmer to examine source code that is to be compiled into executable code for a head thread that executes program instructions, and for a speculative thread that executes program instructions in advance of the head thread. During read operations to memory elements by the speculative thread, this executable code generally causes the speculative thread to update status information associated with the memory elements to indicate that the memory elements have been read by the speculative thread. Next, the system allows the programmer to identify a given read operation directed to a given memory element, wherein a given value retrieved from the given memory element during the given read operation does not affect subsequent execution of the speculative thread. The programmer is then allowed to insert a hint into the source code specifying that the speculative thread is not to update status information during the given read operation directed to the given memory element. Next, the system compiles the source code, including the hint, into the executable code, so that during the given read operation, the executable code does not cause the speculative thread to update status information associated with the given memory element to indicate that the given memory element has been read by the speculative thread.
Abstract:
One embodiment of the present invention provides a system that supports exception handling through use of a conditional trap instruction. The system supports a head thread that executes program instructions and a speculative thread that speculatively executes program instructions in advance of the head thread. During operation, the system uses the speculative thread to execute code, which includes an instruction that can cause an exception condition. After the instruction is executed, the system determines if the instruction caused the exception condition. If so, the system writes an exception condition indicator to a register. At some time in the future, the system executes a conditional trap instruction which examines a value in the register. If the value in the register is an exception condition indicator, the system executes a trap handling routine to handle the exception condition. Otherwise, the system proceeds with execution of the code. In one embodiment of the present invention, prior to executing the instruction, the system allows a compiler to optimize a program containing the instruction. This optimization process includes scheduling an exception testing instruction associated with the instruction to occupy a free instruction slot following the instruction. This exception testing instruction determines if the instruction causes the exception condition. In one embodiment of the present invention, the trap handling routine triggers a rollback operation to undo operations performed by the speculative thread.