Abstract:
A processing device implementing minimizing bandwidth to track return targets by an instruction tracing system is disclosed. A processing device of the disclosure an instruction fetch unit comprising a return stack buffer (RSB) to predict a target address of a return (RET) instruction corresponding to a call (CALL) instruction. The processing device further includes a retirement unit comprising an instruction tracing module to initiate instruction tracing for instructions executed by the processing device, determine whether the target address of the RET instruction was mispredicted, determine a value of call depth counter (CDC) maintained by the instruction tracing module, and when the target address of the RET instruction was not mispredicted and when the value of the CDC is greater than zero, generate an indication that the RET instruction branches to a next linear instruction after the corresponding CALL instruction.
Abstract:
A processor includes one or more execution units to execute instructions of a plurality of threads and thread control logic coupled to the execution units to predict whether a first of the plurality of threads is ready for selection in a current cycle based on readiness of instructions of the first thread in one or more previous cycles, to predict whether a second of the plurality of threads is ready for selection in the current cycle based on readiness of instructions of the second thread in the one or more previous cycles, and to select one of the first and second threads in the current cycle based on the predictions.
Abstract:
A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.
Abstract:
A method, apparatus, and system are provided for performing compare and exchange operations using a sleep-wakeup mechanism. According to one embodiment, an instruction at a processor is executed to help acquire a lock on behalf of the processor. If the lock is unavailable to be acquired by the processor, the instruction is put to sleep until an event has occurred.
Abstract:
Apparatuses and methods for dead instruction identification are disclosed. In one embodiment, an apparatus includes an instruction buffer and a dead instruction identifier. The instruction buffer is to store an instruction stream having a single entry point and a single exit point. The dead instruction identifier is to identify dead instructions based on a forward pass through the instruction stream.
Abstract:
A processor includes a logic to execute a first instruction and a second instruction. The first instruction is ordered before the second instruction. Each instruction references a respective logical register assigned to a respective physical register. The processor also includes logic to reassign a physical register of the second instruction to another logical register before retirement of the first instruction.
Abstract:
In one embodiment, a processor includes a performance monitor including a last branch record (LBR) stack to store a call stack to an event of interest, where the call stack is collected responsive to a trigger for the event. The processor further includes logic to control the LBR stack to operate in a call stack mode such that an entry to a call instruction for a leaf function is cleared on return from the leaf function. Other embodiments are described and claimed.
Abstract:
A processor includes an execution unit to execute instructions, where each operand of each executed instruction has one or more elements of an element size and at least one operand of the instruction corresponds to a register of a register size. The processor further includes a counter configured to count a number of instructions that have been executed by the execution unit associated with a particular combination of register size and element size.
Abstract:
A processor, system, and method are described for continued retirement of operations during a commit of a speculative region of program code. For example, one embodiment of a method comprises the operations of identifying a plurality of transactional memory regions in program code, including a first transactional memory region; and retiring one or more of a plurality of operations which follow the first transactional memory region even when a commit operation associated with the first transactional memory region is waiting to complete.
Abstract:
Techniques are described for enabling and/or disabling a secondary jump execution unit (JEU) in a micro-processor. The secondary JEU is incorporated in the micro-processor to operate concurrently with a primary JEU, and to enable the handling of simultaneous branch mispredicts on multiple branches. Activation and deactivation of the secondary JEU may be controlled by a pressure counter or a confidence counter. A pressure counter mechanism increments a count for each branch operation executed within the processor and decrements the count by a decay value during each cycle. A confidence counter mechanism increments a count for each correctly predicted branch, and decrements the count for each mispredict. Each counter signals an activation component, such as a port binding hardware component, to begin binding micro-operations to the secondary JEU when the counter exceeds an activation threshold. The counter mechanism may be thread-agnostic or thread-specific.