Abstract:
Embodiments of apparatuses, methods, and systems for inter-cluster communication of live-in register values are described. In an embodiment, a processor includes a plurality of execution clusters. The processor also includes a cache memory in which to store a value to be produced by a first execution cluster of the plurality of execution clusters and consumed by a second execution cluster of the plurality of execution clusters. The cache memory is separate from a system memory hierarchy and a register set of the processor.
Abstract:
Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
Abstract:
A processor includes a front end and a scheduler. The front end includes logic to determine whether to apply an acyclical or cyclical thread assignment scheme to code received at the processor, and to, based upon a determined thread assignment scheme, assign code to a static logical thread and to a rotating logical thread. The scheduler includes logic to assign the static logical thread to the same physical thread upon a subsequent control flow execution of the static logical thread, and to assign the rotating logical thread to different physical threads upon different executions of instructions in the rotating logical thread.
Abstract:
In one embodiment, the present invention includes a memory management unit (MMU) having entries to store virtual address to physical address translations, where each entry includes a location indicator to indicate whether a memory location for the corresponding entry is present in a local or remote memory. In this way, a common virtual memory space can be shared between the two memories, which may be separated by one or more non-coherent links. Other embodiments are described and claimed.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to validate translated guest code in a dynamic binary translator. An example apparatus disclosed herein includes a translator to generate a first translation of code to execute on a host machine, the first translation of the guest code to facilitate creating a first translated guest code, and the translator to generate a second translation of the translated guest code to execute on the host machine. The example apparatus also includes a translation versions manager to identify a first host machine state based on executing a portion of the first translation, and the translation versions manager to identify a second host machine state based on executing a portion of the second translation. The example system also includes a validator to determine a state divergence status of the second translation based on a comparison between the first host machine state and the second host machine state.
Abstract:
Methods and apparatuses relating to assigning a logical thread to a physical thread. In one embodiment, an apparatus includes a data storage device that stores code that when executed by a hardware processor causes the hardware processor to perform the following: translating an instruction into a translated instruction, assigning a logical thread for the translated instruction, and providing a thread map hint for the translated instruction; and a hardware scheduler to assign a physical thread of the hardware processor to execute the logical thread based on the thread map hint.
Abstract:
Methods and apparatuses for reducing power consumption of processor switch operations are disclosed. One or more embodiments may comprise specifying a subset of registers or state storage elements to be involved in a register or state storage operation, performing the register or state storage operation, and performing a switch operation. The embodiments may minimize the number of registers or state storage elements involved with the standby operation by specifying only the subset of registers or state storage elements, which may involve considerably fewer than the total number of registers or state storage or elements of the processor. The switch operation may be switch from one mode to another, such as a transition to or from a sleep mode, a context switch, or the execution of various types of instructions.
Abstract:
Various different embodiments of the invention are described including: (1) a method and apparatus for intelligently allocating threads within a binary translation system; (2) data cache way prediction guided by binary translation code morphing software; (3) fast interpreter hardware support on the data-side; (4) out-of-order retirement; (5) decoupled load retirement in an atomic OOO processor; (6) handling transactional and atomic memory in an out-of-order binary translation based processor; and (7) speculative memory management in a binary translation based out of order processor.
Abstract:
Apparatus and method for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to validate translated guest code in a dynamic binary translator. An example apparatus disclosed herein includes a translator to generate a first translation of code to execute on a host machine, the first translation of the guest code to facilitate creating a first translated guest code, and the translator to generate a second translation of the translated guest code to execute on the host machine. The example apparatus also includes a translation versions manager to identify a first host machine state based on executing a portion of the first translation, and the translation versions manager to identify a second host machine state based on executing a portion of the second translation. The example system also includes a validator to determine a state divergence status of the second translation based on a comparison between the first host machine state and the second host machine state.