摘要:
Embodiments of the present invention relate to a system and method for comparatively increasing processor throughput and relieving pressure on the processor's scheduler and register file by diverting instructions dependent on long-latency operations from a flow of the processor pipeline and re-introducing the instructions into the flow when the long-latency operations are completed. In this way, the instructions do not tie up resources and overall instruction throughput in the pipeline is comparatively increased. Before the instructions are diverted from the pipeline, they may undergo a conventional process to map logical registers of the instructions to physical registers. Before the instructions are re-introduced into the pipeline, the physical registers mapped according to the conventional process may be re-mapped to other physical registers, thereby efficiently preserving correct program sequence information.
摘要:
One embodiment of the present invention provides a system that facilitates avoiding locks by speculatively executing critical sections of code. During operation, the system allows a process to speculatively execute a critical section of code within a program without first acquiring a lock associated with the critical section. If the process subsequently completes the critical section without encountering an interfering data access from another process, the system commits changes made during the speculative execution, and resumes normal non-speculative execution of the program past the critical section. Otherwise, if an interfering data access from another process is encountered during execution of the critical section, the system discards changes made during the speculative execution, and attempts to re-execute the critical section.
摘要:
Methods and apparatus to provide unbounded transactional memory systems are described. In one embodiment, an operation corresponding to a software transactional memory (STM) access may be executed if a preceding hardware transactional memory (HTM) access operation fails.
摘要:
A method and apparatus for setting aside a long-latency micro-operation from a reorder buffer is disclosed. In one embodiment, a long-latency micro-operation would conventionally stall a reorder buffer. Therefore a secondary buffer may be used to temporarily store that long-latency micro-operation, and other micro-operations depending from it, until that long-latency micro-operation is ready to execute. These micro-operations may then be reintroduced into the reorder buffer for execution. The use of poisoned bits may be used to ensure correct retirement of register values merged from both pre- and post-execution of the micro-operations which were set aside in the secondary buffer.
摘要:
Critical sections of a program, providing exclusive access to shared data in a multi-processor architecture may be inferred from standard instructions and used to invoke a cache protocol that delays the response of requests of other cache and thus counter intuitively improving performance of the system. During this delay, read-only copies of data may be provided and the delay may recognize two priorities of requests, one of which is not delayed so as to improve the release of locks held by different processors.
摘要:
A processor includes a core and a prefetcher. The prefetcher includes logic to issue a request for data including a requested prefetch. The core includes logic to receive an indication of the request, determine whether the request is for a restricted region of memory, and, based upon whether the request is for the restricted region of memory, allow or deny the request.
摘要:
Methods and apparatus for throttling an interface that is integrated on the same die as a processor are described. In one embodiment, a signal from an Integrated Input/Output hub (e.g., integrated on the same die as a processor) causes throttling of a link coupled between the IIO and an Input/Output (IO) device. Other embodiments are also disclosed.
摘要:
Methods and apparatus for throttling an interface that is integrated on the same die as a processor are described. In one embodiment, a signal from an Integrated Input/Output hub (e.g., integrated on the same die as a processor) causes throttling of a link coupled between the IIO and an Input/Output (IO) device. Other embodiments are also disclosed.
摘要:
A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.
摘要:
Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.