摘要:
A secondary jump execution unit (JEU) is incorporated in a micro-processor to operate concurrently with a primary JEU, enabling the execution of simultaneous branch operations with possible detection of multiple branch mispredicts. When branch operations are executed on both JEUs in a same instruction cycle, mispredict processing for the secondary JEU is skidded into the primary JEU's dispatch pipeline such that the branch processing for the secondary JEU occurs after processing of the branch for the primary JEU and while the primary JEU is not processing a branch. Moreover, in cases when a nuke command is also received from a reorder buffer of the processor, the branch processing for the secondary JEU is further delayed to accommodate processing of the nuke on the primary JEU. Further embodiments support the promotion of the secondary JEU to have access to the mispredict mechanisms of the primary JEU in certain circumstances.
摘要:
A processor includes one or more execution units to execute instructions of a plurality of threads and thread control logic coupled to the execution units to predict whether a first of the plurality of threads is ready for selection in a current cycle based on readiness of instructions of the first thread in one or more previous cycles, to predict whether a second of the plurality of threads is ready for selection in the current cycle based on readiness of instructions of the second thread in the one or more previous cycles, and to select one of the first and second threads in the current cycle based on the predictions.
摘要:
Systems and methods for flag tracking in data manipulation operations involving move elimination. An example processing system comprises a first data structure including a plurality of physical register values; a second data structure including a plurality of pointers referencing elements of the first data structure; a third data structure including a plurality of move elimination sets, each move elimination set comprising two or more bits representing two or more logical data registers, the third data structure further comprising at least one bit associated with each move elimination set, the at least one bit representing one or more logical flag registers; a fourth data structure including an identifier of a data register sharing an element of the first data structure with a flag register; and a move elimination logic configured to perform a move elimination operation.
摘要:
A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.
摘要:
A scheduler implementing a dependency matrix having restricted entries is disclosed. A processing device of the disclosure includes a decode unit to decode an instruction and a scheduler communicably coupled to the decode unit. In one embodiment, the scheduler is configured to receive the decoded instruction, determine that the decoded instruction qualifies for allocation as a restricted reservation station (RS) entry type in a dependency matrix maintained by the scheduler, identify RS entries in the dependency matrix that are free for allocation, allocate one of the identified free RS entries with information of the decoded instruction in the dependency matrix, and update a row of the dependency matrix corresponding to the claimed RS entry with source dependency information of the decoded instruction.
摘要:
A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
摘要:
Techniques are described for enabling and/or disabling a secondary jump execution unit (JEU) in a micro-processor. The secondary JEU is incorporated in the micro-processor to operate concurrently with a primary JEU, and to enable the handling of simultaneous branch mispredicts on multiple branches. Activation and deactivation of the secondary JEU may be controlled by a pressure counter or a confidence counter. A pressure counter mechanism increments a count for each branch operation executed within the processor and decrements the count by a decay value during each cycle. A confidence counter mechanism increments a count for each correctly predicted branch, and decrements the count for each mispredict. Each counter signals an activation component, such as a port binding hardware component, to begin binding micro-operations to the secondary JEU when the counter exceeds an activation threshold. The counter mechanism may be thread-agnostic or thread-specific.
摘要:
A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register.
摘要:
An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.
摘要:
An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.