摘要:
A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.
摘要:
In one embodiment, an integrated circuit comprises a first processor configured to output program counter (PC) trace records, wherein PC trace records provide data indicating the PCs of instructions retired by the first processor. The integrated circuit further comprises a second source of trace records, and a trace unit coupled to receive the PC trace records from the first processor and the trace records from the second source. The trace unit comprises a trace memory into which the trace unit is configured to store the PC trace records and trace records from the second source. The trace unit is configured to interleave the PC trace records and the trace records from the second source in the trace memory according to the order of receipt of the records.
摘要:
A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
摘要:
A system and method for dithering a clock signal during idle times is disclosed. An integrated circuit (IC) includes a number of functional units and a clock tree. The clock tree includes a root level clock-gating circuit, a number of regional clock-gating circuits, and a number of leaf level clock-gating circuits. The root level clock-gating circuit is coupled to distribute an operating clock signal to the regional clock-gating circuits, while the regional clock-gating circuits are each configured to distribute the operating clock signal to correspondingly coupled ones of the leaf level clock-gating circuits. The IC may further include a control unit configured to monitor activity levels and indications from each of the functional units. The control unit may cause the root clock-gating circuit to dither the clock signal if the IC is idle, wherein dithering includes reducing the duty cycle and the effective frequency of the operating clock signal.
摘要:
In one embodiment, an integrated circuit comprises a first processor configured to output program counter (PC) trace records, wherein PC trace records provide data indicating the PCs of instructions retired by the first processor. The integrated circuit further comprises a second source of trace records, and a trace unit coupled to receive the PC trace records from the first processor and the trace records from the second source. The trace unit comprises a trace memory into which the trace unit is configured to store the PC trace records and trace records from the second source. The trace unit is configured to interleave the PC trace records and the trace records from the second source in the trace memory according to the order of receipt of the records.
摘要:
A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
摘要:
A system and method for reducing latency and power of register renaming. A free list in processor includes multiple banks for indicating availability of register identifiers used for register renaming. A register rename unit receives one or more destination architectural registers to rename with physical register identifiers. Responsive to determining the multiple banks within the free list are unbalanced with available physical register identifiers, one or more returning physical register identifiers are assigned to the destination architectural registers before assigning any physical register identifiers from any bank of the multiple banks with a lowest number of available physical register identifiers. A returning physical register identifier is a physical register identifier that is available again for assignment to a destination architectural register but not yet indicated in the free list as available. Each of the banks includes a single bit width decoded vector for indicating availability of given physical register identifiers.
摘要:
A multi-threaded processor provides for efficient flow-control from a pool of un-executed stores in an instruction queue to a store queue. The processor also includes similar capabilities with respect to load instructions. The processor includes logic organized into a plurality of thread processing units (“TPUs”) and allocation logic that monitors each TPUs demand for entries in the store queue. Demand is determined by subtracting an adjustable threshold value from the most recently assigned store identifier value. If the difference between the most recently assigned instruction identifier for a TPU and the TPU's threshold is non-zero, then it is determined that the TPU has demand for at least one entry in the store queue. The allocation logic includes arbitration logic that determines which one of a plurality of TPUs with store queue demand should be allocated a free entry in the store queue.
摘要:
A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.
摘要:
A system and method for efficiently reducing the latency of initializing registers. A register rename unit within a processor determines whether prior to an execution pipeline stage it is known a decoded given instruction writes a particular numerical value in a destination operand. An example is a move immediate instruction that writes a value of 0 in its destination operand. Other examples may also qualify. If the determination is made, a given physical register identifier is assigned to the destination operand, wherein the given physical register identifier is associated with the particular numerical value, but it is not associated with an actual physical register in a physical register file. The given instruction is marked to prevent it from proceeding to an execution pipeline stage. When the given physical register identifier is used to read the physical register file, no actual physical register is accessed.