Abstract:
In one embodiment, the present invention includes a prefetching engine to detect when data access strides in a memory fall into a range, to compute a predicted next stride, to selectively prefetch a cache line using the predicted next stride, and to dynamically control prefetching. Other embodiments are also described and claimed.
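As an illustration of the stride-detection idea (not taken from the claimed hardware), a minimal C sketch follows; the stride_entry fields, the confidence threshold, and the min/max stride bounds are all assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative per-stream prefetch state; field names are assumptions. */
    struct stride_entry {
        uint64_t last_addr;   /* address of the previous access            */
        int64_t  last_stride; /* stride observed between the last two hits */
        int      confidence;  /* raised when the same stride repeats       */
    };

    /* Returns a predicted prefetch address, or 0 if prefetching is suppressed. */
    uint64_t on_access(struct stride_entry *e, uint64_t addr,
                       int64_t min_stride, int64_t max_stride)
    {
        int64_t stride  = (int64_t)(addr - e->last_addr);
        bool    in_range = stride >= min_stride && stride <= max_stride;

        if (in_range && stride == e->last_stride)
            e->confidence++;      /* same in-range stride again: grow confidence */
        else
            e->confidence = 0;    /* stride changed or fell out of the range     */

        e->last_stride = stride;
        e->last_addr   = addr;

        /* Selectively prefetch only when confidence is high enough. */
        return (in_range && e->confidence >= 2) ? addr + stride : 0;
    }

The returned address would be issued as a cache-line prefetch only when an in-range stride has repeated, which is one simple way the prefetcher could be dynamically throttled.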
Abstract:
Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically.
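A rough sketch of how an optimized TSC region might be laid out follows; tsc_begin_region, tsc_commit_loads, and tsc_commit_stores are hypothetical placeholders for the two commit stages, not a real instruction set or API.

    /* Hypothetical shape of a two-stage commit (TSC) region after binary
     * optimization; the tsc_* calls stand in for the two commit stages. */
    extern void tsc_begin_region(void);
    extern void tsc_commit_loads(void);   /* stage 1: loads commit atomically  */
    extern void tsc_commit_stores(void);  /* stage 2: stores commit atomically */

    void hot_region(int *a, int *b, int n)
    {
        tsc_begin_region();

        int sum = 0;
        /* Loads may be hoisted and reordered by the optimizer within the region. */
        for (int i = 0; i < n; i++)
            sum += a[i];

        tsc_commit_loads();     /* stage 1 */

        /* Stores are released only after the loads have committed. */
        for (int i = 0; i < n; i++)
            b[i] = sum;

        tsc_commit_stores();    /* stage 2 */
    }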
Abstract:
An apparatus and method is described herein for conditionally committing and/or speculatively checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. The conditional commit enables efficient execution of the dynamically optimized code while attempting to prevent transactions from running out of hardware resources, and the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as by including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both, and is further adapted to perform operations that support conditional commit or speculative checkpointing in response to decoding such instructions.
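The sketch below shows, schematically, where a dynamic optimizer might place a conditional commit inside a hot loop; tx_begin, tx_end, tx_conditional_commit, and tx_speculative_checkpoint are hypothetical stand-ins for the instructions the decoders are described as recognizing.

    /* Hypothetical transactional primitives; not a real ISA or library API. */
    extern void tx_begin(void);
    extern void tx_end(void);
    extern int  tx_conditional_commit(void);     /* commits early if resources are low */
    extern void tx_speculative_checkpoint(void); /* cheap rollback point for aborts    */

    void optimized_loop(int *data, int n)
    {
        tx_begin();
        for (int i = 0; i < n; i++) {
            data[i] *= 2;                 /* aggressively reordered loop body */

            /* If hardware buffers are nearly full, commit what has executed so
             * far and checkpoint, rather than aborting the whole region later. */
            if (tx_conditional_commit())
                tx_speculative_checkpoint();
        }
        tx_end();
    }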
Abstract:
Methods and apparatuses enable on-demand instruction emulation via user-level exception handling. A non-supported instruction triggers an exception during runtime of a program. In response to the exception, a user-level or application-level exception handler is launched instead of a kernel-level handler, so the handler executes at the application layer rather than in the kernel. The handler identifies the instruction and, where it supports emulation of that instruction, emulates it, enabling the program to continue execution. The cost of repeated instruction emulation is amortized via dynamic binary translation of hot code.
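For a concrete illustration (not from the disclosure itself), the C sketch below installs a user-level SIGILL handler with the standard POSIX sigaction API on Linux/x86-64; the fixed 4-byte instruction length and the skip-only "emulation" are simplifying assumptions.

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <ucontext.h>

    /* User-level (application-level) handler: invoked when the CPU raises an
     * illegal-instruction fault, without handing emulation to the kernel. */
    static void sigill_handler(int sig, siginfo_t *info, void *ctx)
    {
        ucontext_t    *uc = (ucontext_t *)ctx;
        unsigned char *pc = (unsigned char *)info->si_addr;  /* faulting instruction */

        /* A real handler would decode *pc and emulate the instruction here,
         * then advance the program counter past it.  This sketch (Linux
         * x86-64, an assumed 4-byte instruction) only skips the instruction. */
        uc->uc_mcontext.gregs[REG_RIP] += 4;
        (void)sig;
        fprintf(stderr, "emulated unsupported instruction at %p\n", (void *)pc);
    }

    int main(void)
    {
        struct sigaction sa = {0};
        sa.sa_sigaction = sigill_handler;
        sa.sa_flags     = SA_SIGINFO;
        sigaction(SIGILL, &sa, NULL);   /* install the application-level handler */

        /* Program continues running; any SIGILL is now handled in user space. */
        return 0;
    }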
Abstract:
Various embodiments of the invention concern methods and apparatuses for power- and time-efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, each load may be directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, cache accesses are reduced by loading directly from load value buffers and store buffers, so the loads are processed efficiently.
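A minimal sketch of the classification-and-steering idea follows; the enum names and the mapping from load kind to supplying structure are illustrative assumptions, not the claimed mechanism.

    /* Illustrative compiler-side classification of loads; the categories mirror
     * the abstract above, but the names and steering policy are assumptions. */
    enum load_kind {
        PRODUCER_LOAD,        /* fetches a value later loads will reuse      */
        CONSUMER_REUSE_LOAD,  /* re-reads a value a producer load fetched    */
        CONSUMER_FWD_LOAD,    /* receives its value from an in-flight store  */
        HYBRID_LOAD           /* both produces and consumes                  */
    };

    enum load_source { FROM_DATA_CACHE, FROM_LOAD_VALUE_BUFFER, FROM_STORE_BUFFER };

    /* Direct each load to the cheapest structure that can supply the value,
     * so that many loads never touch the data cache at all. */
    enum load_source steer_load(enum load_kind kind)
    {
        switch (kind) {
        case CONSUMER_REUSE_LOAD: return FROM_LOAD_VALUE_BUFFER;
        case CONSUMER_FWD_LOAD:   return FROM_STORE_BUFFER;
        case PRODUCER_LOAD:
        case HYBRID_LOAD:
        default:                  return FROM_DATA_CACHE;
        }
    }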
Abstract:
A method and apparatus for efficient register checkpointing is herein described. A transaction is detected in program code. A recovery block is inserted in the program code to perform recovery operations in response to an abort of the transaction. A roll-back edge is potentially inserted from an abort point to the recovery block. A control flow edge is inserted from the recovery block to an entry point of the transaction. Checkpoint code is inserted before the entry point to back up live-in registers in backup storage elements, and recovery code is inserted in the recovery block to restore the live-in registers from the backup storage elements in response to an abort of the transaction.
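The following C sketch shows one possible shape of the transformed code, assuming a hypothetical tx_begin that returns nonzero when control arrives via the roll-back edge after an abort.

    /* Hypothetical transactional primitives; not a real ISA or library API. */
    extern int  tx_begin(void);  /* nonzero when re-entered after an abort */
    extern void tx_end(void);

    void transformed(int x, int y)
    {
        /* Checkpoint code inserted before the entry point: back up live-ins
         * into backup storage elements (here, ordinary stack slots). */
        int backup_x = x, backup_y = y;

    entry:
        if (tx_begin()) {
            /* Recovery block: an abort rolls back to here; restore the
             * live-in registers from the backups, then re-enter the region. */
            x = backup_x;
            y = backup_y;
            goto entry;          /* control-flow edge back to the entry point */
        }

        x += y;                  /* transaction body may clobber the live-ins */
        tx_end();
        (void)x;
    }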
Abstract:
In a method for reducing code size, a replaceable subset of instructions at a first location within a set of instructions and a matching target subset of instructions at a second location within the set of instructions are identified. A base offset and a relative offset are determined; together, they indicate an absolute offset from the first location to the second location. An instruction to cause a base offset storage element to be loaded with the base offset is inserted prior to the first location. The replaceable subset of instructions is then replaced with a second instruction that causes a program counter to be modified based on the relative offset and the value in the base offset storage element, so that the modified program counter indicates the second location.
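As a worked example of the offset arithmetic only (the addresses and offsets below are made up), the modified program counter must equal the first location plus the base offset plus the relative offset:

    #include <stdint.h>
    #include <assert.h>

    /* The base offset loaded ahead of the first location plus the relative
     * offset encoded in the replacing instruction must sum to the absolute
     * offset from the first location to the second location. */
    uint32_t modified_pc(uint32_t pc_at_first_location,
                         uint32_t base_offset,
                         int32_t  relative_offset)
    {
        return pc_at_first_location + base_offset + relative_offset;
    }

    int main(void)
    {
        uint32_t first  = 0x1000;   /* replaceable subset lives here (assumed) */
        uint32_t second = 0x4FF0;   /* matching target subset lives here       */

        uint32_t base = 0x3000;                                     /* loaded before 'first' */
        int32_t  rel  = (int32_t)(second - first) - (int32_t)base;  /* 0xFF0                 */

        assert(modified_pc(first, base, rel) == second);
        return 0;
    }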
Abstract:
In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.
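A small self-contained sketch of the "topological sort, then cut into strands" step follows; the dependency matrix, the use of Kahn's algorithm, and the MAX_STRAND size limit are illustrative, and the rule and scheduling constraints are omitted.

    #include <stdio.h>

    #define N 6            /* instructions (nodes) in the superblock, assumed */
    #define MAX_STRAND 3   /* illustrative hardware constraint on strand size */

    /* dep[i][j] != 0 means instruction j depends on instruction i (a DAG edge). */
    static const int dep[N][N] = {
        /* 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 4, 3 -> 5 */
        [0][1] = 1, [0][2] = 1, [1][3] = 1, [2][3] = 1, [3][4] = 1, [3][5] = 1,
    };

    int main(void)
    {
        int indeg[N] = {0}, done[N] = {0}, order[N], n = 0;

        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (dep[i][j]) indeg[j]++;

        /* Kahn's algorithm: repeatedly emit a node with no unsatisfied deps. */
        while (n < N)
            for (int i = 0; i < N; i++)
                if (!done[i] && indeg[i] == 0) {
                    done[i] = 1;
                    order[n++] = i;
                    for (int j = 0; j < N; j++)
                        if (dep[i][j]) indeg[j]--;
                }

        /* Greedy strand formation over the topological order, cutting a strand
         * whenever the illustrative size constraint is reached. */
        for (int i = 0; i < n; i++)
            printf("%d%s", order[i],
                   ((i + 1) % MAX_STRAND == 0 || i == n - 1) ? "\n" : " -> ");
        return 0;
    }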
Abstract:
Disclosed is a method for running, at runtime, first code generated by a Software-based Redundant Multi-Threading (SRMT) compiler alongside second code generated by a normal compiler, the first code including a first function and a second function, and the second code including a third function. The method comprises running the first function in a leading thread and a trailing thread (104); running the third function in a single thread (106), wherein the leading thread calls the third function; and running the second function in the leading thread and the trailing thread (108), wherein the third function calls the second function. The present disclosure provides a mechanism for handling function calls in which SRMT functions and binary functions can call each other regardless of whether the callee is an SRMT function or a binary function, thereby dynamically adjusting the reliability/performance tradeoff based on run-time information and user-selectable policies.
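A minimal sketch of how an SRMT-compiled call site might hand off to a normally compiled (binary) callee is shown below; the srmt_* helpers are hypothetical, standing in for whatever runtime queue the leading and trailing threads share.

    /* Hypothetical runtime helpers for crossing between SRMT (dual-thread)
     * code and normally compiled (single-thread) code; not a real API. */
    extern int  srmt_is_leading_thread(void);
    extern void srmt_send_to_trailing(long value);   /* leading -> trailing queue   */
    extern long srmt_recv_from_leading(void);

    extern long binary_function(long arg);           /* built by the normal compiler */

    /* SRMT-compiled wrapper: both the leading and the trailing thread execute
     * this, but only the leading thread actually runs the binary callee; the
     * trailing thread consumes the forwarded result so the two stay in step. */
    long call_binary_from_srmt(long arg)
    {
        if (srmt_is_leading_thread()) {
            long result = binary_function(arg);   /* runs in a single thread */
            srmt_send_to_trailing(result);
            return result;
        }
        return srmt_recv_from_leading();
    }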