Abstract:
Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
Abstract:
A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
Abstract:
A method and apparatus are described for efficient register reclamation. For example, one embodiment of an apparatus comprises: single usage detection and tagging logic to examine a sequence of instructions to detect logical registers used by the sequence of instructions that have a single use and to tag an instruction as a single usage instruction if the instruction is a consumer of a logical register that has a single use; an allocator to allocate processor resources to execute the sequence of instructions, the processor resources including physical registers mapped to logical registers to execute the sequence of instructions; and register reclamation logic to free up a logical to physical mapping of a single use register in response to detecting the tag provided by the instruction tagging logic.
Abstract:
In one embodiment, a processor includes logic, responsive to a first instruction, to perform an operation on a first source operand and a second source operand associated with the first instruction and write a result of the operation to a destination location comprising a third source operand. The write may be a partial write of the destination location to maintain an unmodified portion of the third source operand. Other embodiments are described and claimed.
Abstract:
A processor includes a front end and a scheduler. The front end includes logic to determine whether to apply an acyclical or cyclical thread assignment scheme to code received at the processor, and to, based upon a determined thread assignment scheme, assign code to a static logical thread and to a rotating logical thread. The scheduler includes logic to assign the static logical thread to the same physical thread upon a subsequent control flow execution of the static logical thread, and to assign the rotating logical thread to different physical threads upon different executions of instructions in the rotating logical thread.
Abstract:
Systems, methods, and devices for original code emulation for performance monitoring is provided. A system may memory to store instructions. A processor may implement an instruction converter in hardware or software to convert the instructions to translated code. Specifically, the instruction converter receives the instructions and translates the stored instructions into the translated code that includes one or more indexed instructions. The one or more indexed instructions include a field indicating a number of branches in the stored instructions that are taken in the translated code.
Abstract:
Systems, apparatuses and methods may provide for developer stage technology that embeds binary code into an application binary file, wherein the binary code corresponds to vector functions and non-vector functions in statically typed source code, and generates intermediate representation (IR) data, wherein the intermediate representation data corresponds to the vector functions in the statically typed source code. Additionally, the developer stage technology embeds the IR data in the application binary file. Moreover, deployment stage technology may generate a first compilation output based on the application binary file and detect a capability change in an execution environment associated with the first compilation output. The deployment stage technology may also generate, in response to the detected capability change, a second compilation output based on the first compilation output.
Abstract:
A processor comprising an instruction execution circuit to execute a translated code generated based on a received code and a translation table (TT) controller circuit coupled to a translation table comprising a plurality of address mappings, wherein the TT controller circuit is to identify a trigger event associated with a physical memory page, determine, based on an identifier of the physical memory page, an entry in a manifest table, the entry comprising an address mapping between a first memory address within an address range comprising the physical memory page and a second memory address, and store the address mapping to the translation table.
Abstract:
Apparatus and method for detecting and recovering from incorrect memory dependence speculation in an out-of-order processor are described herein. For example, one embodiment of a method comprises: executing a first load instruction; detecting when the first load instruction experiences a bad store-to-load forwarding event during execution; tracking the occurrences of bad store-to-load forwarding event experienced by the first load instruction during execution; controlling enablement of an S-bit in the first load instruction based on the tracked occurrences; generating a plurality of load operations responsive to an enabled S-bit in first load instruction, wherein execution of the plurality of load operations produces a result equivalent to that from the execution of the first load instruction.
Abstract:
A processing device includes a store instruction identification unit to identify a pair of store instructions among a plurality of instructions in an instruction queue. The pair of store instructions include a first store instruction and a second store instruction. The first data of the first store instruction corresponds to a first memory region adjacent to a second memory region, and a second data of the second store instruction corresponds to the second memory region. The processing device to include a store instruction fusion unit to fuse the first store instruction with the second store instruction resulting in a fused store instruction.