摘要:
One embodiment of the present invention provides a system that selectively monitors store instructions to support transactional execution of a process, wherein changes made during the transactional execution are not committed to the architectural state of a processor until the transactional execution successfully completes. Upon encountering a store instruction during transactional execution of a block of instructions, the system determines whether the store instruction is a monitored store instruction or an unmonitored store instruction. If the store instruction is a monitored store instruction, the system performs the store operation, and store-marks a cache line associated with the store instruction to facilitate subsequent detection of an interfering data access to the cache line from another process. If the store instruction is an unmonitored store instruction, the system performs the store operation without store-marking the cache line.
摘要:
Mechanisms have been developed for providing great flexibility in processor instruction handling, sequencing and execution. In particular, it has been discovered that a programmable pre-decode mechanism can be employed to alter the behavior of a processor. For example, pre-decode hints for sequencing, synchronization or speculation control may altered or mappings of ISA instructions to native instructions or operation sequences may be altered. Such techniques may be employed to adapt a processor implementation (in the field) to varying memory models, implementations or interfaces or to varying memory latencies or timing characteristics. Similarly, such techniques may be employed to adapt a processor implementation to correspond to an extended/adapted instruction set architecture. In some realizations, instruction pre-decode functionality may be adapted at processor run-time to handle or mitigate a timing, concurrency or speculation issue. In some realizations, operation of pre-decode may be reprogrammed post-manufacture, at (or about) initialization, or at run-time.
摘要:
One embodiment of the present invention provides a system that facilitates efficient join operations between a head thread and a speculative thread during speculative program execution, wherein the head thread executes program instructions and the speculative thread executes program instructions in advance of the head thread. The system operates by executing a primary version of a program using the head thread, and by executing a speculative version of the program using the speculative thread. When the head thread reaches a point in the program where the speculative thread began executing, the system performs a join operation between the head thread and the speculative thread. This join operation causes the speculative thread to act as a new head thread by switching from executing the speculative version of the program to executing the primary version of the program. To facilitate this switching operation, the system performs a lookup to determine where the new head thread is to commence executing within the primary version of the program based upon where the speculative thread is currently executing within the speculative version of the program.
摘要:
One embodiment of the present invention provides a system that facilitates inter-processor communication and synchronization through a hardware message buffer, which includes a plurality of physical channels that are structured as queues for communicating between processors in a multiprocessor system. The system operates by receiving an instruction to perform a data transfer operation through the hardware message buffer, wherein the instruction specifies a virtual channel to which the data transfer operation is directed. Next, the system translates the virtual channel into a physical channel, and then performs the data transfer operation on the physical channel within the hardware message buffer. In one embodiment of the present invention, if the data transfer operation is a store operation and the physical channel is already full, the system returns status information indicating that the physical channel is too full to perform the store operation. In one embodiment of the present invention, if the data transfer operation is a load operation and the physical channel is empty, the system returns status information indicating that the physical channel is empty and the load operation cannot be completed.
摘要:
One embodiment of the present invention provides a system that facilitates selectively unmarking load-marked cache lines during transactional program execution, wherein load-marked cache lines are monitored during transactional execution to detect interfering accesses from other threads. During operation, the system encounters a release instruction during transactional execution of a block of instructions. In response to the release instruction, the system modifies the state of cache lines, which are specially load-marked to indicate they can be released from monitoring, to account for the release instruction being encountered. In doing so, the system can potentially cause the specially load-marked cache lines to become unmarked. In a variation on this embodiment, upon encountering a commit-and-start-new-transaction instruction, the system modifies load-marked cache lines to account for the commit-and-start-new-transaction instruction being encountered. In doing so, the system causes normally load-marked cache lines to become unmarked, while other specially load-marked cache lines may remain load-marked past the commit-and-start-new-transaction instruction.
摘要:
One embodiment of the present invention provides a system which selectively executes deferred instructions following a return of a long-latency operation in a processor that supports speculative-execution. During normal-execution mode, the processor issues instructions for execution in program order. When the processor encounters a long-latency operation, such as a load miss, the processor records the long-latency operation in a long-latency scoreboard, wherein each entry in the long-latency scoreboard includes a deferred buffer start index. Upon encountering an unresolved data dependency during execution of an instruction, the processor performs a checkpointing operation and executes subsequent instructions in an execute-ahead mode, wherein instructions that cannot be executed because of the unresolved data dependency are deferred into a deferred buffer, and wherein other non-deferred instructions are executed in program order. Upon encountering a deferred instruction that depends on a long-latency operation within the long-latency scoreboard, the processor updates a deferred buffer start index associated with the long-latency operation to point to position in the deferred buffer occupied by the deferred instruction. When a long-latency operation returns, the processor executes instructions in the deferred buffer starting at the deferred buffer start index for the returning long-latency operation.
摘要:
One embodiment of the present invention provides a system that facilitates delaying interfering memory accesses from other threads during transactional execution. During transactional execution of a block of instructions, the system receives a request from another thread (or processor) to perform a memory access involving a cache line. If performing the memory access on the cache line will interfere with the transactional execution and if it is possible to delay the memory access, the system delays the memory access and stores copy-back information for the cache line to enable the cache line to be copied back to the requesting thread. At a later time, when the memory access will no longer interfere with the transactional execution, the system performs the memory access and copies the cache line back to the requesting thread.
摘要:
One embodiment of the present invention provides a system that avoids read-after-write (RAW) hazards while speculatively executing instructions on a processor. The system starts in a normal execution mode, wherein the system issues instructions for execution in program order. Upon encountering a stall condition during execution of an instruction, the system generates a checkpoint, and executes the instruction and subsequent instructions in a speculative-execution mode. The system also maintains dependency information for each register indicating whether or not a value in the register depends on an unresolved data-dependency. The system uses this dependency information to avoid RAW hazards during the speculative-execution mode.
摘要:
Techniques have been developed whereby likely pointer values are identified at runtime and contents of corresponding storage location can be prefetched into a cache hierarchy to reduce effective memory access latencies. In some realizations, one or more writable stores are defined in a processor architecture to delimit a portion or portions of a memory address space.
摘要:
One embodiment of the present invention provides a system that facilitates deferring execution of instructions with unresolved data dependencies as they are issued for execution in program order. During a normal execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system generates a checkpoint that can subsequently be used to return execution of the program to the point of the instruction. Next, the system executes subsequent instructions in an execute-ahead mode, wherein instructions that cannot be executed because of an unresolved data dependency are deferred, and wherein other non-deferred instructions are executed in program order.