摘要:
Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
摘要:
A processor includes an execution unit to execute instructions and a scheduler unit to store a queue of instructions for execution by the execution unit. The scheduler unit includes a wake array including a plurality of source slots to store source identifiers for sources associated with the instructions, a picker to schedule a particular instruction for execution in the execution unit, broadcast a destination identifier associated with the particular instruction to a first subset of the source slots, and a delay element to receive the destination identifier broadcast by the picker and communicate a delayed version of the destination identifier to a second subset of the source slots different from the first subset.
摘要:
An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
摘要:
A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.
摘要:
An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.
摘要:
Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
摘要:
The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.
摘要:
Described herein are methods and processors for flag renaming in groups to eliminate dependencies of instructions. Decoder and execution units in the processor may be configured to rename flags into groups that allow each group to be treated separately as appropriate. This flag renaming eliminates flag dependencies with respect to instructions. This allows an instruction to write exactly the flags that the instruction wants without having to create merge dependencies. Methods and processors are provided for handling immediate values embedded in instructions. A 16 bit immediate bus and a 4 bit encoding/control bus are added at the interface between decode and execution units. For an 8 or 12 bit immediate, the upper 4 bits of the immediate bus contain the encoding bits. For a 16 bit immediate, the encoding/control bus contains the encoding bits. The encoding/control bus indicates when to look at the top four bits of the immediate bus.
摘要:
Methods, devices, and systems for accessing packed registers are presented. A state of the packed registers may be tracked and it may be determined whether the register is directly accessible based on the state. If the register is not directly accessible, an action may be performed which allows the register to be accessed directly. The action may include injecting at least one uop for reorganizing the physical storage of the register such that it is directly accessible. The action may include aligning the data with the least significant bit of a physical register or otherwise aligning the data with the datapath. The action may also include changing the state of the packed registers.
摘要:
An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located.