Abstract:
A system and method for efficient branch prediction. A processor includes a next fetch predictor to generate a fast branch prediction for branch instructions at an early pipeline stage. The processor also includes a main return address stack (RAS) at a later pipeline stage for predicting the target of return instructions. When a return instruction is encountered, the prediction from the next fetch predictor is replaced by the top of the main RAS. If there are any recent call or return instructions in flight toward the main RAS, then a separate prediction is generated by a mini-RAS.
Abstract:
An apparatus includes an execute circuit configured to execute a plurality of operations received from a queue, as well as a power estimator circuit, and a power sensing circuit. The power estimator circuit is configured to predict power consumption due to execution of a particular operation of the plurality of operations, and to withdraw, based on the predicted power consumption, a first amount of power credits from a power credit pool. The power sensing circuit is configured to monitor one or more characteristics of a power supply node coupled to the execute circuit to generate a power value, and to deposit a second amount of power credits into the power credit pool. The second amount of power credits may be based on the power value indicating that power consumed during the execution of the particular operation is less than the predicted power consumption.
Abstract:
An IC in which a power state of a circuit in one power domain is managed based at least in part on a power state of a circuit in another power domain is disclosed. In one embodiment, an IC includes first and second functional circuit blocks in first and second power domains, respectively. A third functional block shared by the first and second is also implemented in the first power domain. A power management unit may control power states of each of the first, second, and third functional circuit blocks. The power management circuit may, when the first functional circuit block is in a sleep state, set a power state of the third functional block in accordance with that of the second functional circuit block.
Abstract:
An IC in which a power state of a circuit in one power domain is managed based at least in part on a power state of a circuit in another power domain is disclosed. In one embodiment, an IC includes first and second functional circuit blocks in first and second power domains, respectively. A third functional block shared by the first and second is also implemented in the first power domain. A power management unit may control power states of each of the first, second, and third functional circuit blocks. The power management circuit may, when the first functional circuit block is in a sleep state, set a power state of the third functional block in accordance with that of the second functional circuit block.
Abstract:
A method and apparatus for reducing capacitor noise in electronic systems is disclosed. A system includes at least one functional circuit block coupled to receive a variable supply voltage. The value of the supply voltage is controlled by a power management circuit. Changing a performance state of the functional circuit block includes increasing the supply voltage for higher performance, and reducing the supply voltage for reduced performance demands. The power management circuit, in changing to a higher performance state, increases the supply voltage at a first rate. A rate control circuit causes the power management circuit to reduce the supply voltage, when changing to a lower performance state, at a second rate that is less than the first rate.
Abstract:
A system and method for efficient branch prediction. A processor includes a next fetch predictor to generate a fast branch prediction for branch instructions at an early pipeline stage. The processor also includes a main return address stack (RAS) at a later pipeline stage for predicting the target of return instructions. When a return instruction is encountered, the prediction from the next fetch predictor is replaced by the top of the main RAS. If there are any recent call or return instructions in flight toward the main RAS, then a separate prediction is generated by a mini-RAS.
Abstract:
Various techniques for processing and pre-decoding branches within an IT instruction block. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor and the predictor generates a branch direction prediction for the unconditional branch.
Abstract:
In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as “collapsed.” A collapsed write buffer entry, and the collapsed write operations therein, may include at least one write operation that has overwritten data that was written by a previous write operation in the buffer entry. In another implementation, the combining write buffer may maintain the threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.
Abstract:
A scalable cache coherency protocol for system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by a directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.
Abstract:
A scalable cache coherency protocol for system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by a directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.