Abstract:
A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
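Below is a minimal behavioral sketch in Python, not the claimed hardware logic, of the flow described above: execute the base function, compare the result according to a predication setting carried by the instruction, and set the resulting predicate value in a register. The names PRED_REGISTER, COMPARISONS, and execute_predicating are illustrative assumptions.

import operator

PRED_REGISTER = [0] * 8  # hypothetical predicate register file

# Hypothetical predication settings an instruction might encode.
COMPARISONS = {
    "eq_zero": lambda r: r == 0,
    "lt_zero": lambda r: r < 0,
    "gt_zero": lambda r: r > 0,
}

def execute_predicating(base_fn, operands, predication_setting, pred_reg_index):
    result = base_fn(*operands)                            # execute the base function
    predicate = COMPARISONS[predication_setting](result)   # compare the result
    PRED_REGISTER[pred_reg_index] = int(predicate)         # set the predicate value in a register
    return result

# Example: an ADD whose predicate records whether the sum is zero.
execute_predicating(operator.add, (3, -3), "eq_zero", pred_reg_index=0)
assert PRED_REGISTER[0] == 1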
Abstract:
Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
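A small software sketch of the fusion step, under an assumed toy instruction format: adjacent instructions in the translated stream that match a fusion pattern are replaced by a single fused instruction. The CMP-followed-by-JZ pattern and the Instr tuple are illustrative assumptions, not the rules of the claimed fusion manager.

from collections import namedtuple

Instr = namedtuple("Instr", "opcode operands")

def fuse(translated_stream):
    fused = []
    i = 0
    while i < len(translated_stream):
        cur = translated_stream[i]
        nxt = translated_stream[i + 1] if i + 1 < len(translated_stream) else None
        if nxt is not None and cur.opcode == "CMP" and nxt.opcode == "JZ":
            # Fuse compare + conditional jump into one macro-op.
            fused.append(Instr("CMP_JZ", cur.operands + nxt.operands))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused

stream = [Instr("CMP", ("r1", "r2")), Instr("JZ", ("loop_exit",)), Instr("ADD", ("r3", "r4"))]
print(fuse(stream))  # CMP and JZ become a single CMP_JZ; ADD passes through unchanged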
Abstract:
An apparatus includes a binary translator to hoist a load instruction in a branch of a conditional statement above the conditional statement and insert a speculation control of load (SCL) instruction in a complementary branch of the conditional statement, where the SCL instruction provides an indication of the real program order (RPO) of the load instruction before the load instruction was hoisted. The apparatus further includes an execution circuit to execute the load instruction to perform a load and cause an entry for the load instruction to be inserted in an ordering buffer, and to execute the SCL instruction to locate the entry for the load instruction in the ordering buffer using the RPO of the load instruction provided by the SCL instruction and discard the entry for the load instruction from the ordering buffer.
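The following behavioral sketch, using assumed data structures rather than the claimed circuits, illustrates the interplay of the hoisted load, the ordering buffer, and the SCL instruction: the hoisted load inserts an entry keyed by its original RPO, and the SCL instruction on the complementary branch locates and discards that entry using the same RPO.

class OrderingBuffer:
    def __init__(self):
        self.entries = {}              # RPO -> entry for an executed load

    def insert(self, rpo, entry):
        self.entries[rpo] = entry

    def discard(self, rpo):
        self.entries.pop(rpo, None)

def execute_hoisted_load(buf, rpo, address, memory):
    value = memory.get(address, 0)     # perform the load speculatively, above the branch
    buf.insert(rpo, {"addr": address, "value": value})
    return value

def execute_scl(buf, rpo):
    # The complementary branch executed, so the hoisted load's entry is stale:
    # locate it in the ordering buffer by the RPO carried by the SCL instruction and discard it.
    buf.discard(rpo)

buf = OrderingBuffer()
execute_hoisted_load(buf, rpo=42, address=0x100, memory={0x100: 7})
execute_scl(buf, rpo=42)
assert 42 not in buf.entries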
Abstract:
An apparatus includes a first circuit to determine a real program order (RPO) of an eldest undispatched instruction from among a plurality of strands, a second circuit to determine an RPO limit based on a delta value and the RPO of the eldest undispatched instruction, an ordering buffer to store entries for instructions that are waiting to be retired, and a third circuit to execute an orderable instruction from a strand from the plurality of strands to cause an entry for the orderable instruction to be inserted into the ordering buffer in response to a determination that an RPO of the orderable instruction is less than or equal to the RPO limit.
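A minimal sketch of the dispatch check, with hypothetical strand and buffer structures: the first function finds the eldest undispatched RPO across the strands, the second derives the RPO limit from that value and a delta, and the third lets an orderable instruction insert an ordering-buffer entry only if its RPO does not exceed the limit.

def eldest_undispatched_rpo(strands):
    # First circuit: minimum RPO over the head (undispatched) instruction of each strand.
    return min(strand[0]["rpo"] for strand in strands if strand)

def rpo_limit(strands, delta):
    # Second circuit: limit = RPO of the eldest undispatched instruction + delta.
    return eldest_undispatched_rpo(strands) + delta

def try_execute_orderable(instr, strands, delta, ordering_buffer):
    # Third circuit: execute and insert an ordering-buffer entry only when allowed.
    if instr["rpo"] <= rpo_limit(strands, delta):
        ordering_buffer.append(instr)
        return True
    return False

strands = [[{"rpo": 10}], [{"rpo": 14}], [{"rpo": 30}]]
buf = []
print(try_execute_orderable({"rpo": 18}, strands, delta=8, ordering_buffer=buf))  # True: 18 <= 10 + 8
print(try_execute_orderable({"rpo": 19}, strands, delta=8, ordering_buffer=buf))  # False: 19 > 18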
Abstract:
Methods and apparatuses to control access to a multiple bank data cache are described. In one embodiment, a processor includes conflict resolution logic to detect multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle and to grant access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache. In another embodiment, a method includes detecting multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle, and granting access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache.
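The conflict-resolution rule can be sketched in a few lines, assuming a simple per-cycle schedule that maps each instruction to the set of banks it accesses; the dict-based representation is an illustrative assumption. When two or more instructions target the same bank, the bank is granted to the instruction accessing the highest total of banks.

def resolve_bank_conflicts(schedule):
    """schedule: instruction id -> set of bank numbers it accesses this cycle."""
    grants = {}
    for instr, banks in schedule.items():
        for bank in banks:
            current = grants.get(bank)
            # Grant priority to the instruction that accesses the most banks in total.
            if current is None or len(banks) > len(schedule[current]):
                grants[bank] = instr
    return grants

schedule = {"ld_a": {0, 1, 2, 3}, "ld_b": {3}, "st_c": {5}}
print(resolve_bank_conflicts(schedule))  # bank 3 is granted to 'ld_a' (4 banks vs. 1 for 'ld_b')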
Abstract:
Embodiments described herein relate to apparatus and methods for decomposing loops to improve performance and power efficiency. In one embodiment, a processor includes a loop accelerator including a plurality of strand execution circuits and a binary translator to receive a plurality of instructions from an instruction storage, determine whether the plurality of instructions includes loop instructions, and, in response to determining that they do, divide the loop instructions into two or more jobs using at least one job creation rule, assign the two or more jobs to two or more strands using at least one strand creation rule, and cause the loop accelerator to execute at least two of the two or more strands in parallel using the plurality of strand execution circuits.
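The decomposition flow lends itself to a small software analogue, shown below under assumed rules: iterations of a detected loop are split into jobs (a hypothetical chunk-by-iteration-range job creation rule), the jobs are assigned round-robin to strands (a hypothetical strand creation rule), and the strands run in parallel, here on a thread pool standing in for the loop accelerator's strand execution circuits.

from concurrent.futures import ThreadPoolExecutor

def create_jobs(loop_body, trip_count, chunk):
    # Job creation rule (assumed): one job per contiguous range of iterations.
    return [(loop_body, range(i, min(i + chunk, trip_count)))
            for i in range(0, trip_count, chunk)]

def assign_to_strands(jobs, num_strands):
    # Strand creation rule (assumed): round-robin assignment of jobs to strands.
    strands = [[] for _ in range(num_strands)]
    for i, job in enumerate(jobs):
        strands[i % num_strands].append(job)
    return strands

def run_strand(strand):
    for body, iterations in strand:
        for i in iterations:
            body(i)

results = []
jobs = create_jobs(lambda i: results.append(i * i), trip_count=16, chunk=4)
strands = assign_to_strands(jobs, num_strands=2)
with ThreadPoolExecutor(max_workers=len(strands)) as pool:  # strands execute in parallel
    list(pool.map(run_strand, strands))
print(sorted(results))  # squares of 0..15, computed across two parallel strands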
Abstract:
A processing device comprises select logic to schedule a plurality of instructions for execution. The select logic calculates a reconstructed program order (RPO) value for each of a plurality of instructions that are ready to be scheduled for execution. The select logic creates an ordered list of instructions based on the delayed RPO values, the delayed RPO values comprising the calculated RPO values from a previous execution cycle, and dispatches instructions for scheduling based on the ordered list.
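A compact sketch of the select step, with hypothetical instruction records: the ready instructions are ordered by the RPO values calculated in the previous cycle (the delayed RPO values) and dispatched in that order, while fresh RPO values are calculated to serve as the delayed values for the next cycle.

def select_cycle(ready, delayed_rpo, compute_rpo):
    # Order this cycle's dispatch by the previous cycle's (delayed) RPO values.
    ordered = sorted(ready, key=lambda instr: delayed_rpo.get(instr, float("inf")))
    # Calculate fresh RPO values; these become the delayed values for the next cycle.
    new_rpo = {instr: compute_rpo(instr) for instr in ready}
    return ordered, new_rpo

ready = ["i3", "i1", "i2"]
delayed = {"i1": 5, "i2": 9, "i3": 7}
dispatch_order, delayed = select_cycle(ready, delayed, compute_rpo=lambda name: int(name[1:]))
print(dispatch_order)  # ['i1', 'i3', 'i2'], ordered by the previous cycle's RPO values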
Abstract:
In one embodiment, a processor includes logic, responsive to a first instruction, to perform an operation on a first source operand and a second source operand associated with the first instruction and write a result of the operation to a destination location comprising a third source operand. The write may be a partial write of the destination location to maintain an unmodified portion of the third source operand. Other embodiments are described and claimed.
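A behavioral sketch of the partial write, assuming 32-bit registers and an operation confined to the low 16 bits (both assumptions for illustration): the result overwrites only part of the destination, while the remaining bits of the third source operand, i.e. the prior destination value, are left unmodified.

def partial_write_add16(src1, src2, dest):
    result = (src1 + src2) & 0xFFFF         # operation on the first and second source operands
    # Partial write: the low 16 bits take the result, the high 16 bits keep the
    # unmodified portion of the third source operand (the old destination value).
    return (dest & 0xFFFF0000) | result

dest = 0xDEAD0001
print(hex(partial_write_add16(0x0002, 0x0003, dest)))  # 0xdead0005: upper half unmodified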