摘要:
One embodiment of the present invention supports execution of a start transactional execution (STE) instruction, which marks the beginning of a block of instructions to be executed transactionally. Upon encountering the STE instruction during execution of a program, the system commences transactional execution of the block of instructions following the STE instruction. Changes made during this transactional execution are not committed to the architectural state of the processor until the transactional execution successfully completes.
摘要:
One embodiment of the present invention provides a system that facilitates executing a memory barrier (membar) instruction in an execute-ahead processor, wherein the membar instruction forces buffered loads and stores to complete before allowing a following instruction to be issued. During operation in a normal-execution mode, the processor issues instructions for execution in program order. Upon encountering a membar instruction, the processor determines if the load buffer and store buffer contain unresolved loads and stores. If so, the processor defers the membar instruction and executes subsequent program instructions in execute-ahead mode. In execute-ahead mode, instructions that cannot be executed because of an unresolved data dependency are deferred, and other non-deferred instructions are executed in program order. When all stores and loads that precede the membar instruction have been committed to memory from the store buffer and the load buffer, the processor enters a deferred mode and executes the deferred instructions, including the membar instruction, in program order. If all deferred instructions have been executed, the processor returns to the normal-execution mode and resumes execution from the point where the execute-ahead mode left off.
摘要:
One embodiment of the present invention provides a processor that facilitates rapid progress while speculatively executing instructions in scout mode. During normal operation, the processor executes instructions in a normal execution mode. Upon encountering a stall condition, the processor executes the instructions in a scout mode, wherein the instructions are speculatively executed to prefetch future loads, but wherein results are not committed to the architectural state of the processor. While speculatively executing the instructions in scout mode, the processor maintains dependency information for each register indicating whether or not a value in the register depends on an unresolved data-dependency. If an instruction to be executed in scout mode depends on an unresolved data dependency, the processor executes the instruction as a NOOP so that the instruction executes rapidly without tying up computational resources. The processor also propagates dependency information indicating an unresolved data dependency to a destination register for the instruction.
摘要:
A user is provided with means to sample memory hierarchy via software. This allows a user to enhance memory-level parallelism via software. A status of information needed for execution of a second computer program instruction is read in response to execution of a first computer program instruction. Execution continues with execution of the second computer program instruction upon the status being a first status. Alternatively, a third computer program instruction is executed upon the status being a second status different from the first status. Thus, execution of the first computer program instruction allows control of the memory hierarchy, which in turn give the user control of the memory hierarchy.
摘要:
One embodiment of the present invention provides a system that marks memory elements based upon how information retrieved from the memory elements affects speculative program execution. This system operates by allowing a programmer to examine source code that is to be compiled into executable code for a head thread that executes program instructions, and for a speculative thread that executes program instructions in advance of the head thread. During read operations to memory elements by the speculative thread, this executable code generally causes the speculative thread to update status information associated with the memory elements to indicate that the memory elements have been read by the speculative thread. Next, the system allows the programmer to identify a given read operation directed to a given memory element, wherein a given value retrieved from the given memory element during the given read operation does not affect subsequent execution of the speculative thread. The programmer is then allowed to insert a hint into the source code specifying that the speculative thread is not to update status information during the given read operation directed to the given memory element. Next, the system compiles the source code, including the hint, into the executable code, so that during the given read operation, the executable code does not cause the speculative thread to update status information associated with the given memory element to indicate that the given memory element has been read by the speculative thread.
摘要:
One embodiment of the present invention provides a system that supports exception handling through use of a conditional trap instruction. The system supports a head thread that executes program instructions and a speculative thread that speculatively executes program instructions in advance of the head thread. During operation, the system uses the speculative thread to execute code, which includes an instruction that can cause an exception condition. After the instruction is executed, the system determines if the instruction caused the exception condition. If so, the system writes an exception condition indicator to a register. At some time in the future, the system executes a conditional trap instruction which examines a value in the register. If the value in the register is an exception condition indicator, the system executes a trap handling routine to handle the exception condition. Otherwise, the system proceeds with execution of the code. In one embodiment of the present invention, prior to executing the instruction, the system allows a compiler to optimize a program containing the instruction. This optimization process includes scheduling an exception testing instruction associated with the instruction to occupy a free instruction slot following the instruction. This exception testing instruction determines if the instruction causes the exception condition. In one embodiment of the present invention, the trap handling routine triggers a rollback operation to undo operations performed by the speculative thread.
摘要:
One embodiment of the present invention provides a system that enforces dependencies between memory references within a load store unit (LSU) in a processor. When a write request is received in the load store unit, the write request is loaded into a store buffer in the LSU. The write request may include a “watch address” specifying that a subsequent load from the watch address cannot occur before the write request completes. Note that the watch address is not necessarily the same as the destination address of the write operation. When a read request is received in the load store unit, the read request is loaded into a load buffer of the LSU. The system determines if the read request is directed to the same address as a matching watch address in the store buffer. If so, the system waits for the write request associated with the matching watch address to complete before completing the read request. In one embodiment of the present invention, if the read request is directed to the same address as a matching write request in the store buffer, the system completes the read request by returning a data value contained in the matching write request without going out to memory. In one embodiment of the present invention, the system provides an executable code write instruction that specifies the watch address.
摘要:
One embodiment of the present invention provides a processor that supports multiple-issue execution. This processor includes a register file, which contains an array of memory cells, wherein the memory cells contain bits for architectural registers of the processor. The register file also includes multiple read ports and multiple write ports to support multiple-issue execution. During operation, if multiple read ports simultaneously read from a given register, the register file is configured to: read each bit of the given register out of the array of memory cells through a single bitline associated with the bit; and to use a driver located outside of the array of memory cells to drive the bit to the multiple read ports. In this way, each memory cell only has to drive a single bitline (instead of multiple bitlines) during a multiple-port read operation, thereby allowing memory cells to use smaller and more power-efficient drivers for read operations.
摘要:
One embodiment of the present invention provides a system that synchronizes threads on a multi-threaded processor. The system starts by executing instructions from a multi-threaded program using a first thread and a second thread. When the first thread reaches a predetermined location in the multi-threaded program, the first thread executes a Start-Transactional-Execution (STE) instruction to commence transactional execution, wherein the STE instruction specifies a location to branch to if transactional execution fails. During the subsequent transactional execution, the first thread accesses a mailbox location in memory (which is also accessible by the second thread) and then executes instructions that cause the first thread to wait. When the second thread reaches a second predetermined location in the multi-threaded program, the second thread signals the first thread by accessing the mailbox location, which causes the transactional execution of the first thread to fail, thereby causing the first thread to resume non-transactional execution from the location specified in the STE instruction. In this way, the second thread can signal to the first thread without the first thread having to poll a shared variable.
摘要:
Embodiments of the present invention provide a system that executes program code in a processor. The system starts by executing the program code in a normal mode using a primary strand while concurrently executing the program code ahead of the primary strand using a subordinate strand in a scout mode. Upon resolving a branch using the subordinate strand, the system records a resolution for the branch in a speculative branch resolution table. Upon subsequently encountering the branch using the primary strand, the system uses the recorded resolution from the speculative branch resolution table to predict a resolution for the branch for the primary strand. Upon determining that the resolution of the branch was mispredicted for the primary strand, the system determines that the subordinate strand mispredicted the branch. The system then recovers the subordinate strand to the branch and restarts the subordinate strand executing the program code.