摘要:
Methods and processors for managing load-store dependencies in an out-of-order instruction pipeline. A load store dependency predictor includes a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes hashed values to identify load and store operations. When a load or store operation is detected, the PC and an architectural register number are used to create a hashed value that can be used to uniquely identify the operation. Then, the load store dependency predictor table is searched for any matching entries with the same hashed value.
摘要:
A system and method for efficiently reducing the power consumption of register file accesses. A processor is operable to execute instructions with two or more data types, each with an associated size and alignment. Data operands for a first data type use operand sizes equal to an entire width of a physical register within a physical register file. Data operands for a second data type use operand sizes less than an entire width of a physical register. Accesses of the physical register file for operands associated with a non-full-width data type do not access a full width of the physical registers. A given numerical value may be bypassed for the portion of the physical register that is not accessed.
摘要:
A system and method for reducing the latency of data move operations. A register rename unit within a processor determines whether a decoded move instruction is eligible for a zero cycle move operation. If so, control logic assigns a physical register identifier associated with a source operand of the move instruction to the destination operand of the move instruction. Additionally, the register rename unit marks the given move instruction to prevent it from proceeding in the processor pipeline. Further maintenance of the particular physical register identifier may be done by the register rename unit during commit of the given move instruction.
摘要:
In one embodiment, an integrated circuit comprises a first processor configured to output program counter (PC) trace records, wherein PC trace records provide data indicating the PCs of instructions retired by the first processor. The integrated circuit further comprises a second source of trace records, and a trace unit coupled to receive the PC trace records from the first processor and the trace records from the second source. The trace unit comprises a trace memory into which the trace unit is configured to store the PC trace records and trace records from the second source. The trace unit is configured to interleave the PC trace records and the trace records from the second source in the trace memory according to the order of receipt of the records.
摘要:
A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
摘要:
A system and method for dithering a clock signal during idle times is disclosed. An integrated circuit (IC) includes a number of functional units and a clock tree. The clock tree includes a root level clock-gating circuit, a number of regional clock-gating circuits, and a number of leaf level clock-gating circuits. The root level clock-gating circuit is coupled to distribute an operating clock signal to the regional clock-gating circuits, while the regional clock-gating circuits are each configured to distribute the operating clock signal to correspondingly coupled ones of the leaf level clock-gating circuits. The IC may further include a control unit configured to monitor activity levels and indications from each of the functional units. The control unit may cause the root clock-gating circuit to dither the clock signal if the IC is idle, wherein dithering includes reducing the duty cycle and the effective frequency of the operating clock signal.
摘要:
In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array.
摘要:
In one embodiment, an integrated circuit comprises a first processor configured to output program counter (PC) trace records, wherein PC trace records provide data indicating the PCs of instructions retired by the first processor. The integrated circuit further comprises a second source of trace records, and a trace unit coupled to receive the PC trace records from the first processor and the trace records from the second source. The trace unit comprises a trace memory into which the trace unit is configured to store the PC trace records and trace records from the second source. The trace unit is configured to interleave the PC trace records and the trace records from the second source in the trace memory according to the order of receipt of the records.
摘要:
A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
摘要:
A system and method for reducing latency and power of register renaming. A free list in processor includes multiple banks for indicating availability of register identifiers used for register renaming. A register rename unit receives one or more destination architectural registers to rename with physical register identifiers. Responsive to determining the multiple banks within the free list are unbalanced with available physical register identifiers, one or more returning physical register identifiers are assigned to the destination architectural registers before assigning any physical register identifiers from any bank of the multiple banks with a lowest number of available physical register identifiers. A returning physical register identifier is a physical register identifier that is available again for assignment to a destination architectural register but not yet indicated in the free list as available. Each of the banks includes a single bit width decoded vector for indicating availability of given physical register identifiers.