Abstract:
In an embodiment, a system includes a processor including one or more cores and a plurality of alias registers to store memory range information associated with a plurality of operations of a loop. The memory range information references one or more memory locations within a memory. The system also includes register assignment means for assigning each of the alias registers to a corresponding operation of the loop, where the assignments are made according to a rotation schedule, and one of the alias registers is assigned to a first operation in a first iteration of the loop and to a second operation in a subsequent iteration of the loop. The system also includes the memory coupled to the processor. Other embodiments are described and claimed.
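As a rough illustration of the rotation schedule described above, the following C sketch models a small alias register file and assigns registers to the memory operations of a loop modulo the register count, so that the register used by one operation in the first iteration serves a different operation in a later iteration. All names (alias_reg, assign_alias, the register and operation counts) are hypothetical and only illustrate the rotation idea, not the claimed hardware.

    #include <stdio.h>

    #define NUM_ALIAS_REGS 4   /* hypothetical size of the alias register file */
    #define OPS_PER_ITER   3   /* memory operations per loop iteration */

    /* Each alias register records the address range touched by one operation. */
    struct alias_reg {
        unsigned long start;
        unsigned long end;
    };

    static struct alias_reg alias_file[NUM_ALIAS_REGS];

    /* Rotating assignment: operation `op` of iteration `iter` gets register
     * (iter * OPS_PER_ITER + op) % NUM_ALIAS_REGS, so the same physical
     * register serves different operations in different iterations. */
    static int assign_alias(int iter, int op)
    {
        return (iter * OPS_PER_ITER + op) % NUM_ALIAS_REGS;
    }

    int main(void)
    {
        for (int iter = 0; iter < 3; iter++)
            for (int op = 0; op < OPS_PER_ITER; op++)
                printf("iteration %d, operation %d -> alias register %d\n",
                       iter, op, assign_alias(iter, op));
        return 0;
    }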
Abstract:
A computer-implemented method for managing loop code in a compiler includes using a conflict detection procedure, which detects across-iteration dependencies for arrays of single memory addresses, to determine whether a potential across-iteration dependency exists for arrays of memory addresses representing ranges of memory accessed by the loop code.
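A minimal sketch of the range-based conflict test the abstract describes, assuming each iteration can be summarized by a single half-open byte range: the single-address dependency check is generalized to report a potential across-iteration dependency whenever the range accessed by one iteration overlaps the range accessed by another. The structure and names are illustrative assumptions, not the claimed method.

    #include <stdbool.h>
    #include <stdio.h>

    /* Half-open byte range [start, end) accessed by one loop iteration. */
    struct mem_range {
        unsigned long start;
        unsigned long end;
    };

    /* Two ranges conflict when they overlap. */
    static bool ranges_overlap(struct mem_range a, struct mem_range b)
    {
        return a.start < b.end && b.start < a.end;
    }

    /* Report a potential across-iteration dependency if the range accessed by
     * any iteration overlaps the range accessed by any other iteration. */
    static bool has_cross_iteration_dependency(const struct mem_range *ranges, int n)
    {
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (ranges_overlap(ranges[i], ranges[j]))
                    return true;
        return false;
    }

    int main(void)
    {
        struct mem_range ranges[] = { {0, 16}, {16, 32}, {24, 40} };
        printf("dependency: %s\n",
               has_cross_iteration_dependency(ranges, 3) ? "yes" : "no");
        return 0;
    }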
Abstract:
A processor of an aspect includes a decode unit to decode a persistent store fence instruction. The processor also includes a memory subsystem module coupled with the decode unit. The memory subsystem module, in response to the persistent store fence instruction, is to ensure that given data corresponding to the persistent store fence instruction is stored persistently in a persistent storage before data of all subsequent store instructions is stored persistently in the persistent storage. The subsequent store instructions occur after the persistent store fence instruction in original program order. Other processors, methods, systems, and articles of manufacture are also disclosed.
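The ordering guarantee can be pictured with the following sketch, which stores an undo-log record, issues the persistent store fence, and only then performs the store that the record protects, so the record becomes persistent before the data of any subsequent store does. The pstore_fence() intrinsic is invented here purely as a stand-in for the claimed instruction and is not a real compiler builtin; the persistent-memory mappings are likewise assumed.

    /* Hypothetical intrinsic standing in for the persistent store fence
     * instruction described above; it is not a real compiler builtin. */
    extern void pstore_fence(void);

    struct log_record { unsigned long addr; unsigned long old_value; };

    static struct log_record *log_area;   /* assumed to map persistent memory */
    static unsigned long     *data_area;  /* assumed to map persistent memory */

    void logged_update(int slot, unsigned long new_value)
    {
        /* 1. Store the undo-log record (the "given data" of the abstract). */
        log_area[slot].addr      = (unsigned long)&data_area[slot];
        log_area[slot].old_value = data_area[slot];

        /* 2. Persistent store fence: the log record must become persistent
         *    before the data of any subsequent store becomes persistent. */
        pstore_fence();

        /* 3. Only now update the data itself. */
        data_area[slot] = new_value;
    }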
Abstract:
Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core.
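A minimal sketch of this switching policy, with every platform interaction (core power control, metric collection, migration) represented by a hypothetical placeholder function:

    extern void   power_up_core(int core);
    extern void   power_down_core(int core);
    extern double collect_performance_metric(int core);   /* higher is better */
    extern void   continue_on_core(int core);
    extern void   switch_execution(int from_core, int to_core);

    void maybe_switch_cores(int first_core, int second_core, double prev_metric)
    {
        power_up_core(second_core);

        /* Measure the first core while the program is still executing on it. */
        double metric = collect_performance_metric(first_core);

        if (metric > prev_metric) {
            /* The first core is doing well: stay put and save power. */
            power_down_core(second_core);
            continue_on_core(first_core);
        } else {
            /* Otherwise migrate execution to the newly powered core. */
            switch_execution(first_core, second_core);
        }
    }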
Abstract:
Methods and apparatus relating to conjugate code generation for efficient dynamic optimizations are described. In an embodiment, a binary code and an intermediate representation (IR) code are generated based at least partially on a source program. The binary code and the IR code are transmitted to virtual machine logic. The binary code and the IR code each include a plurality of regions that are in one-to-one correspondence. Other embodiments are also claimed and described.
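One way to picture the conjugate layout is a table of region pairs, each binding a directly executable binary region to the IR region it was generated from. The sketch below uses illustrative names and is only an assumption about how such a correspondence might be represented, not the claimed format.

    #include <stddef.h>

    /* Sketch of a "conjugate" code image: every binary region is paired with
     * exactly one IR region so a dynamic optimizer can re-optimize from IR
     * and patch the matching binary region. Field names are illustrative. */

    struct binary_region { const unsigned char *code; size_t size; };
    struct ir_region     { const void *ir;            size_t size; };

    struct conjugate_region {
        struct binary_region bin;   /* directly executable form */
        struct ir_region     ir;    /* IR form of the same region */
    };

    struct conjugate_code {
        struct conjugate_region *regions;   /* one entry per region pair */
        size_t                   num_regions;
    };

    /* The one-to-one correspondence lets the VM find the IR for the region
     * at `index` and regenerate that region's binary in place. */
    static const void *ir_for_region(const struct conjugate_code *c, size_t index)
    {
        return index < c->num_regions ? c->regions[index].ir.ir : NULL;
    }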
Abstract:
Technologies for performing flexible code acceleration on a computing device include initializing an accelerator virtual device on the computing device. The computing device allocates memory-mapped input and output (I/O) for the accelerator virtual device and also allocates an accelerator virtual device context for the code to be accelerated. The computing device accesses a bytecode of the code to be accelerated and determines whether the bytecode is an operating system-dependent bytecode. If not, the computing device performs hardware acceleration of the bytecode via the memory-mapped I/O using an internal binary translation module. However, if the bytecode is operating system-dependent, the computing device performs software acceleration of the bytecode.
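A sketch of that dispatch decision, assuming hypothetical helpers for the OS-dependence test and for the two acceleration paths:

    #include <stdbool.h>

    struct avd_context;          /* accelerator virtual device context (opaque) */

    extern bool is_os_dependent(unsigned bytecode);
    extern void accelerate_in_hardware(struct avd_context *ctx,
                                       volatile void *mmio, unsigned bytecode);
    extern void accelerate_in_software(struct avd_context *ctx, unsigned bytecode);

    void accelerate(struct avd_context *ctx, volatile void *mmio, unsigned bytecode)
    {
        if (is_os_dependent(bytecode)) {
            /* Needs operating system services: take the software path. */
            accelerate_in_software(ctx, bytecode);
        } else {
            /* Pure computation: hand it to the internal binary translation
             * module through the memory-mapped I/O window. */
            accelerate_in_hardware(ctx, mmio, bytecode);
        }
    }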
Abstract:
Embodiments described herein utilize restricted transactional memory (RTM) instructions to implement speculative compile-time optimizations that will be automatically rolled back by hardware in the event of misspeculation. In one embodiment, a lightweight version of RTM for speculative compiler optimization is described that provides lower operational overhead than conventional RTM implementations used for speculative lock elision (SLE).
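RTM is exposed in C through the <immintrin.h> intrinsics _xbegin() and _xend(), so a speculatively optimized region can be wrapped in a hardware transaction that is rolled back automatically if the speculation fails (compile with RTM support, e.g. -mrtm). The optimized and fallback region bodies below are placeholders; only the transaction pattern is meant to be illustrative.

    #include <immintrin.h>

    extern void optimized_region(void);    /* speculatively optimized code */
    extern void unoptimized_region(void);  /* safe fallback version */

    void run_region(void)
    {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Inside the hardware transaction: if the speculation turns out
             * to be wrong (e.g. a conflicting access), the hardware aborts
             * and rolls back every store made here automatically. */
            optimized_region();
            _xend();
        } else {
            /* Transaction aborted or could not start: run the conservative,
             * non-speculative version of the code. */
            unoptimized_region();
        }
    }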
Abstract:
An apparatus and method are described herein for conditionally committing and/or speculatively checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to optimize code more aggressively. The conditional commit enables efficient execution of the dynamically optimized code while attempting to prevent transactions from running out of hardware resources, and the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as by including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both, and is further adapted to perform operations that support conditional commit or speculative checkpointing in response to decoding such instructions.
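A sketch of how a conditional commit and a speculative checkpoint might be placed inside a dynamically optimized loop; begin_transaction, hardware_resources_low, take_speculative_checkpoint and the other names are hypothetical stand-ins for the instructions described, not real intrinsics.

    extern void begin_transaction(void);
    extern void commit_transaction(void);
    extern int  hardware_resources_low(void);   /* conditional commit test */
    extern void take_speculative_checkpoint(void);

    extern void optimized_loop_body(int i);

    void optimized_loop(int n)
    {
        begin_transaction();
        for (int i = 0; i < n; i++) {
            /* Conditional commit: if hardware resources are about to run
             * out, commit what has executed so far and start a new
             * transaction instead of aborting and losing all the work. */
            if (hardware_resources_low()) {
                commit_transaction();
                begin_transaction();
                /* Speculative checkpoint: a later abort rolls back only to
                 * this point rather than to the start of the region. */
                take_speculative_checkpoint();
            }
            optimized_loop_body(i);
        }
        commit_transaction();
    }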
Abstract:
Example methods and apparatus to facilitate dynamic core selection are disclosed. An example apparatus includes a first processor core of a first type; a second processor core of a second type different from the first type; and software to: access a user-supplied hint indicative of a user preference to execute program code on the first processor core, the user-supplied hint including a user-defined attribute of the program code; monitor performance of the program code on the first processor core; determine, based on the user-defined attribute of the program code, that a predicted performance of the program code on the second processor core is better than the performance of the program code on the first processor core; and ignore the user preference by migrating the program code from the first processor core for execution on the second processor core.
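A sketch of that hint-overriding policy, with hypothetical placeholders for performance measurement, prediction from the user-defined attribute, and migration:

    struct user_hint {
        int preferred_core;            /* core the user asked for */
        int attribute;                 /* user-defined attribute of the code */
    };

    extern double measure_performance(int core);
    extern double predict_performance(int core, int attribute);
    extern void   migrate(int from_core, int to_core);

    void select_core(struct user_hint hint, int other_core)
    {
        double observed  = measure_performance(hint.preferred_core);
        double predicted = predict_performance(other_core, hint.attribute);

        /* Ignore the user preference when the prediction for the other core,
         * derived from the user-defined attribute, beats what is observed. */
        if (predicted > observed)
            migrate(hint.preferred_core, other_core);
    }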