Abstract:
Systems, apparatuses, and methods for implementing an optimized adaptive thermal control mechanism for an integrated circuit (IC) are described. A control unit receives a digital input value which is representative of a temperature of an IC. The control unit compares the input value to at least two set points. A result of a first comparison determines whether an accumulator is incremented or decremented by a programmable gain value. A result of a second comparison determines whether the accumulator is primed with a preset ramp-up value. The preset ramp-up value is used since the accumulator can take several sensing cycles to reach the optimal control value while thermal gradients can become critical in only a few cycles. The output of the accumulator is provided to an actuator which adjusts parameter(s) to modulate the IC's temperature. The granularity and range of the accumulator match the granularity and range of the actuator.
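As an illustration only, the following is a minimal C sketch of one sensing cycle of the accumulator-based control loop described above. The structure, the function name thermal_ctrl_step, and the field names (set_point_a, set_point_b, gain, ramp_up_value, accum_max) are assumptions introduced here for clarity; they are not taken from the patent.

    #include <stdint.h>

    /* Illustrative thermal-control state; field names are assumptions. */
    typedef struct {
        int32_t accumulator;    /* value handed to the actuator            */
        int32_t set_point_a;    /* first comparison: increment/decrement   */
        int32_t set_point_b;    /* second comparison: prime with ramp-up   */
        int32_t gain;           /* programmable gain value                 */
        int32_t ramp_up_value;  /* preset value loaded when priming        */
        int32_t accum_max;      /* accumulator range matches the actuator  */
    } thermal_ctrl_t;

    /* One sensing cycle: compare the digital temperature reading against
     * the two set points and update the accumulator accordingly.         */
    int32_t thermal_ctrl_step(thermal_ctrl_t *tc, int32_t temperature)
    {
        /* Second comparison: if the temperature crosses the priming set
         * point while the accumulator is still low, jump-start it so the
         * actuator reacts within a few cycles instead of many.           */
        if (temperature >= tc->set_point_b && tc->accumulator < tc->ramp_up_value)
            tc->accumulator = tc->ramp_up_value;

        /* First comparison: move the accumulator by the programmable gain. */
        if (temperature >= tc->set_point_a)
            tc->accumulator += tc->gain;
        else
            tc->accumulator -= tc->gain;

        /* Clamp to the actuator's range. */
        if (tc->accumulator < 0)             tc->accumulator = 0;
        if (tc->accumulator > tc->accum_max) tc->accumulator = tc->accum_max;

        return tc->accumulator;  /* output provided to the actuator */
    }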
Abstract:
A scalable cache coherency protocol for a system including a plurality of coherent agents coupled to one or more memory controllers is described. The memory controller may implement a precise directory for cache blocks from the memory to which the memory controller is coupled. Multiple requests to a cache block may be outstanding, and snoops and completions for requests may include an expected cache state at the receiving agent, as indicated by a directory in the memory controller when the request was processed, to allow the receiving agent to detect race conditions. In an embodiment, the cache states may include a primary shared and a secondary shared state. The primary shared state may apply to a coherent agent that bears responsibility for transmitting a copy of the cache block to a requesting agent. In an embodiment, at least two types of snoops may be supported: snoop forward and snoop back.
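As an illustration of how an expected cache state carried in a snoop lets the receiving agent detect a race, here is a minimal C sketch. The enum values, the snoop_msg_t layout, and the function snoop_races are assumptions made for this sketch, not definitions from the patent.

    #include <stdbool.h>
    #include <stdint.h>

    /* Cache states suggested by the abstract; names are assumptions.    */
    typedef enum {
        STATE_INVALID,
        STATE_SHARED_SECONDARY,  /* shared copy, no forwarding duty       */
        STATE_SHARED_PRIMARY,    /* shared copy responsible for forwarding
                                    the block to a requesting agent       */
        STATE_EXCLUSIVE,
        STATE_MODIFIED
    } cache_state_t;

    typedef enum { SNOOP_FORWARD, SNOOP_BACK } snoop_type_t;

    typedef struct {
        snoop_type_t  type;
        cache_state_t expected_state;  /* state the directory recorded at
                                          the time the request was processed */
        uint64_t      block_addr;
    } snoop_msg_t;

    /* The receiving agent compares the expected state carried by the snoop
     * against its actual state for the block; a mismatch indicates another
     * outstanding request to the same block (a race) that must be resolved
     * before the snoop is serviced.                                       */
    bool snoop_races(const snoop_msg_t *snoop, cache_state_t actual_state)
    {
        return snoop->expected_state != actual_state;
    }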
Abstract:
Various techniques for processing and pre-decoding branches within an IT instruction block are described. Instructions are fetched and cached in an instruction cache, and pre-decode bits are generated to indicate the presence of an IT instruction and the likely boundaries of the IT instruction block. If an unconditional branch is detected within the likely boundaries of an IT instruction block, the unconditional branch is treated as if it were a conditional branch. The unconditional branch is sent to the branch direction predictor, which generates a branch direction prediction for it.
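The decision rule described above can be summarized in a short C sketch. The predecode_t structure and the function needs_direction_prediction are illustrative assumptions; the patent does not specify these names or this exact representation of the pre-decode bits.

    #include <stdbool.h>

    /* Illustrative pre-decode information for a fetched instruction. */
    typedef struct {
        bool is_it_insn;        /* instruction is an IT instruction        */
        bool in_it_block;       /* within the likely IT block boundaries   */
        bool is_uncond_branch;  /* decodes as an unconditional branch      */
    } predecode_t;

    /* Conditional branches are always sent to the branch direction
     * predictor.  An unconditional branch is also sent when pre-decode
     * places it inside a likely IT block, since the IT instruction may
     * make it effectively conditional at execution time.               */
    bool needs_direction_prediction(const predecode_t *pd, bool is_cond_branch)
    {
        if (is_cond_branch)
            return true;
        return pd->is_uncond_branch && pd->in_it_block;
    }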
Abstract:
In an embodiment, a combining write buffer is configured to maintain one or more flush metrics to determine when to transmit write operations from buffer entries. The combining write buffer may be configured to dynamically modify the flush metrics in response to activity in the write buffer, thereby modifying the conditions under which write operations are transmitted from the write buffer to the next lower level of memory. For example, in one implementation, the flush metrics may include categorizing write buffer entries as "collapsed." A collapsed write buffer entry includes at least one write operation that has overwritten data written by a previous write operation in the same buffer entry. In another implementation, the combining write buffer may maintain a threshold of buffer fullness as a flush metric and may adjust it over time based on the actual buffer fullness.
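The following C sketch illustrates the two flush metrics named in the abstract: a "collapsed" flag per entry and a dynamically adjusted fullness threshold. The structure layout, the function should_flush, and the specific policy of treating a collapsed entry as flush-eligible are assumptions made for illustration; the patent abstract does not state the exact policy.

    #include <stdbool.h>
    #include <stdint.h>

    #define WB_ENTRIES 16   /* illustrative buffer size */

    typedef struct {
        bool     valid;
        bool     collapsed;   /* a later write overwrote earlier data here */
        uint64_t addr;
    } wb_entry_t;

    typedef struct {
        wb_entry_t entries[WB_ENTRIES];
        unsigned   occupancy;
        unsigned   flush_threshold;  /* adjusted over time based on
                                        observed buffer fullness          */
    } write_buffer_t;

    /* One plausible flush decision (an assumption, not the patent's rule):
     * drain when the buffer is fuller than the dynamic threshold, or when
     * the given entry has collapsed.                                      */
    bool should_flush(const write_buffer_t *wb, unsigned entry_idx)
    {
        if (wb->occupancy >= wb->flush_threshold)
            return true;
        return wb->entries[entry_idx].valid && wb->entries[entry_idx].collapsed;
    }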
Abstract:
A system including a plurality of processor cores, a plurality of graphics processing units, a plurality of peripheral circuits, and a plurality of memory controllers is configured to support scaling of the system using a unified memory architecture.
Abstract:
An apparatus includes an execute circuit configured to execute a plurality of operations received from a queue, as well as a power estimator circuit, and a power sensing circuit. The power estimator circuit is configured to predict power consumption due to execution of a particular operation of the plurality of operations, and to withdraw, based on the predicted power consumption, a first amount of power credits from a power credit pool. The power sensing circuit is configured to monitor one or more characteristics of a power supply node coupled to the execute circuit to generate a power value, and to deposit a second amount of power credits into the power credit pool. The second amount of power credits may be based on the power value indicating that power consumed during the execution of the particular operation is less than the predicted power consumption.
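A minimal C sketch of the credit accounting described above follows. The names credit_pool_t, estimator_withdraw, and sensor_deposit, and the use of integer credit units, are assumptions introduced for illustration.

    #include <stdint.h>

    /* Illustrative shared power credit pool. */
    typedef struct {
        int64_t credits;
    } credit_pool_t;

    /* Before an operation executes, the power estimator circuit withdraws
     * credits proportional to the predicted power consumption.            */
    void estimator_withdraw(credit_pool_t *pool, int64_t predicted_credits)
    {
        pool->credits -= predicted_credits;
    }

    /* After execution, the power sensing circuit compares the measured
     * power on the supply node with the prediction and deposits the
     * unused portion of the withdrawal back into the pool.               */
    void sensor_deposit(credit_pool_t *pool,
                        int64_t predicted_credits, int64_t measured_credits)
    {
        if (measured_credits < predicted_credits)
            pool->credits += predicted_credits - measured_credits;
    }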
Abstract:
A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the next fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
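As an illustration, the gating decision can be expressed in a few lines of C. The nfp_entry_t structure and the function branch_predictor_array_enable are assumptions made for this sketch.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative next-fetch-prediction entry. */
    typedef struct {
        uint64_t next_fetch_pc;    /* address of the next fetch group      */
        bool     has_cond_branch;  /* fetch group contains a conditional
                                      branch instruction                   */
    } nfp_entry_t;

    /* If the upcoming fetch group contains no conditional branch, the
     * branch prediction memory array can remain disabled for the next
     * cycle (e.g. by gating its enable), saving dynamic power.          */
    bool branch_predictor_array_enable(const nfp_entry_t *entry)
    {
        return entry->has_cond_branch;
    }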
Abstract:
A method and apparatus for reducing capacitor noise in electronic systems are disclosed. A system includes at least one functional circuit block coupled to receive a variable supply voltage. The value of the supply voltage is controlled by a power management circuit. Changing a performance state of the functional circuit block includes increasing the supply voltage for higher performance and reducing the supply voltage when performance demands are lower. The power management circuit, in changing to a higher performance state, increases the supply voltage at a first rate. A rate control circuit causes the power management circuit to reduce the supply voltage, when changing to a lower performance state, at a second rate that is less than the first rate.
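The asymmetric slewing described above is illustrated by the C sketch below. The structure vreg_t, the function vreg_tick, and the millivolt-per-tick representation are assumptions; the patent does not specify this model.

    #include <stdint.h>

    /* Illustrative asymmetric voltage slewing state. */
    typedef struct {
        int32_t voltage_mv;    /* current supply voltage                    */
        int32_t up_step_mv;    /* first (faster) ramp rate per tick         */
        int32_t down_step_mv;  /* second (slower) ramp rate per tick,
                                  chosen smaller than up_step_mv            */
    } vreg_t;

    /* One regulation tick: step toward the target voltage, ramping up at
     * the faster first rate and down at the slower second rate to limit
     * the noise produced by rapidly discharging the supply capacitance.  */
    void vreg_tick(vreg_t *vr, int32_t target_mv)
    {
        if (vr->voltage_mv < target_mv) {
            vr->voltage_mv += vr->up_step_mv;
            if (vr->voltage_mv > target_mv) vr->voltage_mv = target_mv;
        } else if (vr->voltage_mv > target_mv) {
            vr->voltage_mv -= vr->down_step_mv;
            if (vr->voltage_mv < target_mv) vr->voltage_mv = target_mv;
        }
    }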
Abstract:
Systems, apparatuses, and methods for reaching power targets across different clock domains are described. In various embodiments, a first processor complex and a second processor complex are powered by the same, single power plane but operate in separate clock domains. When a request is detected to change the operating mode of a particular core in one of the processor complexes to a new operating mode that does not present the worst-case power supply load on the single power plane, an amount of voltage margin that can be recovered from the operational voltage is determined before the request is granted. The determination is based on the requested operating mode of that core and on the current operating modes of every other core in the complexes. The operational voltage, less the determined voltage margin to recover, is then assigned to the processor complexes, while different clock frequencies are assigned to each complex.
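A minimal C sketch of the margin calculation follows. The operating modes, the per-mode margin table, the core count, and the function margin_to_recover_mv are all illustrative assumptions, not values or names from the patent.

    #include <stdint.h>

    #define NUM_CORES 8   /* illustrative core count across both complexes */

    typedef enum { MODE_IDLE, MODE_NOMINAL, MODE_BOOST } op_mode_t;

    /* Worst-case supply droop (in mV) each mode can impose on the shared
     * power plane; illustrative numbers only.                             */
    static const int32_t mode_margin_mv[] = { 5, 20, 50 };

    /* Before granting a mode-change request, compute how much voltage
     * margin can be recovered: the required margin is set by the most
     * demanding mode across every core on the shared plane, with the
     * requesting core evaluated at its requested (new) mode and all other
     * cores at their current modes.                                       */
    int32_t margin_to_recover_mv(const op_mode_t current[NUM_CORES],
                                 unsigned requester, op_mode_t requested_mode,
                                 int32_t worst_case_margin_mv)
    {
        int32_t needed = 0;
        for (unsigned i = 0; i < NUM_CORES; i++) {
            op_mode_t m = (i == requester) ? requested_mode : current[i];
            if (mode_margin_mv[m] > needed)
                needed = mode_margin_mv[m];
        }
        /* The operational voltage can be reduced by the difference between
         * the worst-case margin and what the new mode mix actually needs. */
        return worst_case_margin_mv - needed;
    }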