摘要:
Methods and apparatuses relating to selectively executing a commit instruction. In one embodiment, a data storage device stores code that when executed by a hardware processor causes the hardware processor to perform the following: translating an instruction into a translated instruction to be executed by the hardware processor, marking a commit instruction one of for execution and for optional execution by the hardware processor, and including a hint for a commit instruction marked for optional execution; and a hardware commit unit to determine if the commit instruction marked for optional execution is to be executed based on the hint.
摘要:
A processor includes a front end and a scheduler. The front end includes logic to determine whether to apply an acyclical or cyclical thread assignment scheme to code received at the processor, and to, based upon a determined thread assignment scheme, assign code to a static logical thread and to a rotating logical thread. The scheduler includes logic to assign the static logical thread to the same physical thread upon a subsequent control flow execution of the static logical thread, and to assign the rotating logical thread to different physical threads upon different executions of instructions in the rotating logical thread.
摘要:
Methods, apparatus, systems and articles of manufacture to perform region formation for usage by a dynamic binary translation are disclosed. An example apparatus includes an initial region former to form an initial region starting at a first block of hot code of a control flow graph. The initial region former also adds blocks of hot code lying on a first hottest path of the control flow graph. A region extender extends the initial region to form an extended region including the initial region. The extended region begins at a hottest exit of the initial region and includes blocks of hot code lying on a second hottest path until one of a threshold path length has been satisfied or a back edge of the control flow graph is added to the extended region. A region pruner prunes the remove all loop nests except a selected loop nest which forms a final region.
摘要:
In one example a processor includes a region formation engine to identify a region of code for translation from a guest instruction set architecture to a native instruction set architecture. The processor also includes a binary translator to translate the region of code. The region formation engine is to perform aggressive region formation, which includes forming a region across a boundary of a return instruction. The translated region of code is to prevent a side entry into the translated region of code at a translated return target instruction included in the translated region of code. In more specific examples, performing aggressive region formation includes a region formation grow phase and a region formation cleanup phase. In the grow phase priority may be given to growing complete paths from a call target to a corresponding return. The region formation cleanup phase may comprise eliminating call targets that are not reachable.
摘要:
Methods, apparatus, systems and articles of manufacture are disclosed herein. An example apparatus includes an instruction profiler to identify a predicated block within instructions to be executed by a hardware processor. The example apparatus includes a performance monitor to access a mis-prediction statistic based on an instruction address associated with the predicated block. The example apparatus includes a region former to, in response to determining that the mis-prediction statistic is above a mis-prediction threshold, include the predicated block in a predicated region for optimization.
摘要:
Methods, apparatus, systems and articles of manufacture are disclosed to validate translated guest code in a dynamic binary translator. An example apparatus disclosed herein includes a translator to generate a first translation of code to execute on a host machine, the first translation of the guest code to facilitate creating a first translated guest code, and the translator to generate a second translation of the translated guest code to execute on the host machine. The example apparatus also includes a translation versions manager to identify a first host machine state based on executing a portion of the first translation, and the translation versions manager to identify a second host machine state based on executing a portion of the second translation. The example system also includes a validator to determine a state divergence status of the second translation based on a comparison between the first host machine state and the second host machine state.
摘要:
In one example a processor includes a region formation engine to identify a region of code for translation from a guest instruction set architecture to a native instruction set architecture. The processor also includes a binary translator to translate the region of code. The region formation engine is to perform aggressive region formation, which includes forming a region across a boundary of a return instruction. The translated region of code is to prevent a side entry into the translated region of code at a translated return target instruction included in the translated region of code. In more specific examples, performing aggressive region formation includes a region formation grow phase and a region formation cleanup phase. In the grow phase priority may be given to growing complete paths from a call target to a corresponding return. The region formation cleanup phase may comprise eliminating call targets that are not reachable.
摘要:
A method and apparatus are described for efficient register reclamation. For example, one embodiment of an apparatus comprises: single usage detection and tagging logic to examine a sequence of instructions to detect logical registers used by the sequence of instructions that have a single use and to tag an instruction as a single usage instruction if the instruction is a consumer of a logical register that has a single use; an allocator to allocate processor resources to execute the sequence of instructions, the processor resources including physical registers mapped to logical registers to execute the sequence of instructions; and register reclamation logic to free up a logical to physical mapping of a single use register in response to detecting the tag provided by the instruction tagging logic.
摘要:
Methods, apparatus, systems and articles of manufacture are disclosed herein. An example apparatus includes an instruction profiler to identify a predicated block within instructions to be executed by a hardware processor. The example apparatus includes a performance monitor to access a mis-prediction statistic based on an instruction address associated with the predicated block. The example apparatus includes a region former to, in response to determining that the mis-prediction statistic is above a mis-prediction threshold, include the predicated block in a predicated region for optimization.
摘要:
A processor includes a front end and a scheduler. The front end includes logic to determine whether to apply an acyclical or cyclical thread assignment scheme to code received at the processor, and to, based upon a determined thread assignment scheme, assign code to a static logical thread and to a rotating logical thread. The scheduler includes logic to assign the static logical thread to the same physical thread upon a subsequent control flow execution of the static logical thread, and to assign the rotating logical thread to different physical threads upon different executions of instructions in the rotating logical thread.