Abstract:
Embodiments of the present invention provide a system that executes a transaction on a simultaneous speculative threading (SST) processor. In these embodiments, the processor includes a primary strand and a subordinate strand. Upon encountering a transaction with the primary strand while executing instructions non-transactionally, the processor checkpoints the primary strand and executes the transaction with the primary strand while continuing to non-transactionally execute deferred instructions with the subordinate strand. When the subordinate strand non-transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a first strand ID. When the primary strand transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a second strand ID.
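As a rough illustration (not taken from the patent itself), the C sketch below models the per-cache-line strand-ID records described above; the record layout, the strand-ID values, and the function names are hypothetical.

/* Minimal sketch of tagging cache-line access records with a strand ID during
 * a transaction, assuming two hypothetical strand IDs and a flat array
 * standing in for the cache's record storage. */
#include <stdio.h>
#include <stdint.h>

#define NUM_LINES 8
#define STRAND_SUBORDINATE 1   /* first strand ID: non-transactional accesses */
#define STRAND_PRIMARY     2   /* second strand ID: transactional accesses */

typedef struct {
    uint8_t accessed_by;       /* strand ID recorded for this cache line */
} line_record_t;

static line_record_t records[NUM_LINES];

/* Record a non-transactional access by the subordinate strand. */
static void subordinate_access(unsigned line) {
    records[line].accessed_by = STRAND_SUBORDINATE;
}

/* Record a transactional access by the primary strand. */
static void primary_transactional_access(unsigned line) {
    records[line].accessed_by = STRAND_PRIMARY;
}

int main(void) {
    subordinate_access(3);              /* subordinate strand touches line 3 */
    primary_transactional_access(5);    /* primary strand touches line 5 inside the transaction */
    for (unsigned i = 0; i < NUM_LINES; i++)
        if (records[i].accessed_by)
            printf("line %u tagged with strand ID %u\n", i, (unsigned)records[i].accessed_by);
    return 0;
}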
Abstract:
A method for determining whether to store binary information in a fast way or a slow way of a cache is disclosed. The method includes receiving a block of binary information to be stored in a cache memory having a plurality of ways. The plurality of ways includes a first subset of ways and a second subset of ways, wherein a cache access by a first execution core from one of the first subset of ways has a lower latency than a cache access from one of the second subset of ways. The method further includes determining, based on a predetermined access latency and one or more parameters associated with the block of binary information, whether to store the block of binary information into one of the first subset of ways or one of the second subset of ways.
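The placement decision above can be illustrated with a small C sketch; the block parameters (expected_access_latency, frequently_reused) and the threshold rule are hypothetical stand-ins for the "one or more parameters" the abstract leaves unspecified.

/* Minimal sketch of choosing between the low-latency and higher-latency
 * subsets of ways, under the assumptions stated above. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    unsigned expected_access_latency;  /* cycles the requester can tolerate (hypothetical) */
    bool     frequently_reused;        /* hypothetical reuse hint associated with the block */
} block_params_t;

/* Returns true if the block should go into the first (low-latency) subset of ways. */
static bool use_fast_way(const block_params_t *p, unsigned fast_way_latency) {
    /* Place the block close to the first execution core when the predetermined
     * latency budget is tight or the block is expected to be reused often. */
    return p->expected_access_latency <= fast_way_latency || p->frequently_reused;
}

int main(void) {
    block_params_t hot  = { .expected_access_latency = 2,  .frequently_reused = true  };
    block_params_t cold = { .expected_access_latency = 20, .frequently_reused = false };
    printf("hot block  -> %s ways\n", use_fast_way(&hot, 4)  ? "fast" : "slow");
    printf("cold block -> %s ways\n", use_fast_way(&cold, 4) ? "fast" : "slow");
    return 0;
}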
Abstract:
A technique for coordinating execution of instructions in a processor that allows instructions to execute out-of-order includes decoding a particular instruction that is defined in accordance with an instruction set of the processor. A helper sequence of instructions that corresponds to the particular instruction is then introduced into a stream of executable operations. The corresponding helper sequence includes a first artificial dependency instruction that encodes a dependency on a register that is not actually employed as a register source or target for an operation performed by the particular instruction.
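A rough C sketch of injecting such a helper sequence into an op stream follows; the uop format, the ordering-register number, and the contents of the helper body are hypothetical and are not the processor's actual microcode.

/* Minimal sketch of expanding one decoded instruction into a helper sequence
 * whose first op carries an artificial dependency on a register the original
 * instruction never reads or writes. */
#include <stdio.h>

typedef enum { OP_ARTIFICIAL_DEP, OP_HELPER_BODY, OP_ORIGINAL } op_kind_t;

typedef struct {
    op_kind_t kind;
    int       dep_reg;   /* register the op waits on; not a real source/target of the original */
} uop_t;

/* Emit the helper sequence for a decoded instruction into the op stream. */
static int expand_to_helper(uop_t *stream, int pos, int ordering_reg) {
    stream[pos++] = (uop_t){ OP_ARTIFICIAL_DEP, ordering_reg }; /* serializes against ordering_reg */
    stream[pos++] = (uop_t){ OP_HELPER_BODY, -1 };
    stream[pos++] = (uop_t){ OP_ORIGINAL, -1 };
    return pos;
}

int main(void) {
    uop_t stream[8];
    int n = expand_to_helper(stream, 0, /*ordering_reg=*/17);
    for (int i = 0; i < n; i++)
        printf("uop %d: kind=%d dep_reg=%d\n", i, (int)stream[i].kind, stream[i].dep_reg);
    return 0;
}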
Abstract:
Embodiments of the present invention execute an anti-prefetch instruction. These embodiments start by decoding instructions in a decode unit in a processor to prepare the instructions for execution. Upon decoding an anti-prefetch instruction, these embodiments stall the decode unit to prevent the decoding of subsequent instructions. These embodiments then execute the anti-prefetch instruction, wherein executing the anti-prefetch instruction involves: (1) sending a prefetch request for a cache line in an L1 cache; (2) determining if the prefetch request hits in the L1 cache; (3) if the prefetch request hits in the L1 cache, determining if the cache line contains a predetermined value; and (4) conditionally performing subsequent operations based on whether the prefetch request hits in the L1 cache or on the value of the data in the cache line.
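The four numbered steps can be sketched in C as follows; the L1 model, the predetermined value, and the follow-on actions are hypothetical stand-ins, not the claimed hardware behavior.

/* Minimal sketch of the anti-prefetch steps from the abstract. */
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>

#define PREDETERMINED_VALUE 0   /* hypothetical value the cache line is compared against */

typedef struct { bool present; uint64_t data; } l1_line_t;

/* Stand-in for sending a prefetch request and reporting whether it hit in L1. */
static bool prefetch_hits(const l1_line_t *line) { return line->present; }

static void execute_anti_prefetch(const l1_line_t *line) {
    /* (1) send the prefetch request; (2) check for an L1 hit */
    if (!prefetch_hits(line)) {
        printf("miss: take the miss path\n");
        return;
    }
    /* (3) on a hit, compare the line against the predetermined value */
    if (line->data == PREDETERMINED_VALUE)
        printf("hit, value matched: continue past the anti-prefetch\n");
    else
        printf("hit, value differed: take the alternative path\n"); /* (4) conditional action */
}

int main(void) {
    l1_line_t hit_line  = { .present = true,  .data = 0 };
    l1_line_t miss_line = { .present = false, .data = 0 };
    execute_anti_prefetch(&hit_line);
    execute_anti_prefetch(&miss_line);
    return 0;
}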
Abstract:
Embodiments of the present invention provide a system that executes transactions on a processor that supports transactional memory. The system starts by executing the transaction on the processor. During execution of the transaction, the system places stores in a store buffer. In addition, the system sets a stores_encountered indicator when a first store is placed in the store buffer during the transaction. Upon completing the transaction, the system determines if the stores_encountered indicator is set. If so, the system signals a cache to commit the stores placed in the store buffer during the transaction to the cache and then resumes execution of program code following the transaction when the stores have been committed. Otherwise, the system resumes execution of program code following the transaction without signaling the cache.
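A minimal C sketch of the stores_encountered flow follows, assuming a hypothetical fixed-size store buffer and a print statement standing in for the cache-commit signal.

/* Minimal sketch: buffer stores during a transaction, set stores_encountered
 * on the first one, and only signal the cache at commit if it is set. */
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>

#define STORE_BUF_ENTRIES 16

typedef struct {
    uint64_t addr[STORE_BUF_ENTRIES];
    uint64_t data[STORE_BUF_ENTRIES];
    int      count;
    bool     stores_encountered;   /* set when the first store enters the buffer */
} txn_state_t;

static void txn_store(txn_state_t *t, uint64_t addr, uint64_t data) {
    if (!t->stores_encountered)
        t->stores_encountered = true;           /* first store in this transaction */
    t->addr[t->count] = addr;
    t->data[t->count] = data;
    t->count++;
}

static void txn_commit(txn_state_t *t) {
    if (t->stores_encountered) {
        printf("signaling cache to commit %d buffered stores\n", t->count);
        t->count = 0;                            /* resume only after the drain completes */
    } else {
        printf("no stores buffered: resume without signaling the cache\n");
    }
    t->stores_encountered = false;
}

int main(void) {
    txn_state_t t = {0};
    txn_commit(&t);                  /* read-only transaction: skips the cache signal */
    txn_store(&t, 0x1000, 42);
    txn_commit(&t);                  /* transaction with stores: drains the buffer */
    return 0;
}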
Abstract:
One embodiment of the present invention provides a system that counts speculatively-executed instructions for performance analysis purposes. During operation, the system counts the instructions that are executed during a normal-execution mode. Next, the system enters a speculative-execution mode wherein instructions are speculatively executed without being committed to the architectural state of the processor. During the speculative-execution mode, the system counts the speculatively-executed instructions in a manner that enables the count of speculatively-executed instructions to be reset if the speculative execution fails.
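One way to model the resettable count is with a separate speculative counter that is folded into the committed count on success and cleared on failure, as in the C sketch below; the counter names and the fold/clear points are assumptions, not the patented mechanism.

/* Minimal sketch of a resettable speculative-instruction counter. */
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t committed;    /* instructions counted in normal-execution mode */
    uint64_t speculative;  /* instructions counted since entering speculative-execution mode */
} instr_counters_t;

static void count_normal(instr_counters_t *c)      { c->committed++; }
static void count_speculative(instr_counters_t *c) { c->speculative++; }

/* Speculation succeeded: keep the speculative count. */
static void speculation_commit(instr_counters_t *c) {
    c->committed += c->speculative;
    c->speculative = 0;
}

/* Speculation failed: discard the speculative count. */
static void speculation_fail(instr_counters_t *c) {
    c->speculative = 0;
}

int main(void) {
    instr_counters_t c = {0};
    count_normal(&c);
    count_speculative(&c);
    count_speculative(&c);
    speculation_fail(&c);              /* the two speculative instructions are not counted */
    count_speculative(&c);
    speculation_commit(&c);            /* this one is */
    printf("committed count = %llu\n", (unsigned long long)c.committed);
    return 0;
}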
Abstract:
One embodiment of the present invention provides a system that avoids register read-after-write (RAW) hazards upon returning from a speculative-execution mode. This system operates within a processor with an in-order architecture, wherein the processor includes a short-latency scoreboard that delays issuance of instructions that depend upon uncompleted short-latency instructions. During operation, the system issues instructions for execution in program order during execution of a program in a normal-execution mode. Upon encountering a condition (a launch condition) during an instruction (a launch-point instruction), which causes the processor to enter the speculative-execution mode, the system generates a checkpoint that can subsequently be used to return execution of the program to the launch-point instruction, and commences execution in the speculative-execution mode. Upon encountering a condition that causes the processor to leave the speculative-execution mode and return to the launch-point instruction, the system uses the checkpoint to resume execution in the normal-execution mode from the launch-point instruction. In doing so, the system ensures that entries that were in the short-latency scoreboard prior to entering the speculative-execution mode, and which are not yet resolved, are accounted for in order to prevent register RAW hazards when resuming execution from the launch-point instruction.
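The requirement that pre-existing scoreboard entries still gate issue after the return can be sketched as below; the bitmask scoreboard, the register numbering, and the helper names are hypothetical.

/* Minimal sketch: unresolved short-latency scoreboard entries recorded before
 * the speculative episode continue to stall dependent reads afterwards. */
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t pending;      /* bit r set means register r awaits a short-latency result */
} scoreboard_t;

static void mark_pending(scoreboard_t *sb, int reg)     { sb->pending |=  (1ULL << reg); }
static void mark_resolved(scoreboard_t *sb, int reg)    { sb->pending &= ~(1ULL << reg); }
static bool must_stall(const scoreboard_t *sb, int src) { return (sb->pending >> src) & 1ULL; }

int main(void) {
    scoreboard_t sb = {0};
    mark_pending(&sb, 5);      /* short-latency load to r5 issued before speculation begins */

    /* ... enter and later leave speculative-execution mode via the checkpoint ... */

    /* Back at the launch-point instruction: entries from before the episode that
     * are still unresolved must continue to delay dependent issues. */
    printf("instruction reading r5 %s\n", must_stall(&sb, 5) ? "stalls" : "issues");
    mark_resolved(&sb, 5);     /* the short-latency result finally completes */
    printf("instruction reading r5 %s\n", must_stall(&sb, 5) ? "stalls" : "issues");
    return 0;
}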
Abstract:
Techniques and structures are disclosed for a processor supporting checkpointing to operate effectively in scouting mode while a maximum number of supported checkpoints are active. Operation in scouting mode may include using bypass logic and a set of register storage locations to store and/or forward in-flight instruction results that were calculated during scouting mode. These forwarded results may be used during scouting mode to calculate memory load addresses for yet other in-flight instructions, and the processor may accordingly cause data to be prefetched from these calculated memory load addresses. The set of register storage locations may comprise a working register file or an active portion of a multiported register file.
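A rough C sketch of the scouting-mode address calculation follows; the working-register array, bypass_write, and scout_load are hypothetical stand-ins for the bypass logic and register storage locations described above.

/* Minimal sketch: forward in-flight results through a working-register array
 * so dependent load addresses can be computed and prefetched during scouting. */
#include <stdio.h>
#include <stdint.h>

#define NUM_WORK_REGS 8

static uint64_t work_regs[NUM_WORK_REGS];   /* stand-in for the working register file */

/* Forward an in-flight result computed during scouting mode. */
static void bypass_write(int reg, uint64_t value) { work_regs[reg] = value; }

/* Use forwarded results to form a load address and issue a prefetch for it. */
static void scout_load(int base_reg, int64_t offset) {
    uint64_t addr = work_regs[base_reg] + (uint64_t)offset;
    printf("prefetching line for address 0x%llx\n", (unsigned long long)addr);
}

int main(void) {
    bypass_write(2, 0x4000);   /* result of an in-flight add during scouting */
    scout_load(2, 0x40);       /* the dependent load's address is still computable */
    return 0;
}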
Abstract:
One embodiment of the present invention provides a system that merges checkpoints on a processor. The system starts by executing instructions speculatively during a speculative-execution episode. The system then generates a first checkpoint and a second checkpoint during the speculative-execution episode. Next, the system merges the first checkpoint with the second checkpoint during the speculative-execution episode, wherein merging the first and second checkpoints conserves processor resources.
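A minimal C sketch of merging two checkpoints follows; the checkpoint contents and the merge rule (release the later checkpoint and recover to the earlier program point) are hypothetical assumptions about how the merge conserves resources.

/* Minimal sketch: merging two checkpoints taken in one speculative episode so
 * only one set of checkpoint resources remains allocated. */
#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>

#define NUM_REGS 4

typedef struct {
    bool     valid;
    uint64_t regs[NUM_REGS];   /* architectural register values at the checkpoint */
    uint64_t pc;               /* instruction to restart from on failure */
} checkpoint_t;

/* Merge the later checkpoint into the earlier one: recovery falls back to the
 * earlier program point, and the later checkpoint's resources are released. */
static void merge_checkpoints(checkpoint_t *earlier, checkpoint_t *later) {
    later->valid = false;      /* free the second checkpoint's storage */
    (void)earlier;             /* earlier checkpoint already holds the restart state */
}

int main(void) {
    checkpoint_t a = { .valid = true, .pc = 0x100 };
    checkpoint_t b = { .valid = true, .pc = 0x140 };
    merge_checkpoints(&a, &b);
    printf("checkpoint a valid=%d pc=0x%llx, checkpoint b valid=%d\n",
           (int)a.valid, (unsigned long long)a.pc, (int)b.valid);
    return 0;
}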