Dynamic voltage and frequency scaling based on memory channel slack

    公开(公告)号:US11119665B2

    公开(公告)日:2021-09-14

    申请号:US16212388

    申请日:2018-12-06

    Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access request for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.

    Fine-grained speed binning in an accelerated processing device

    公开(公告)号:US11061429B2

    公开(公告)日:2021-07-13

    申请号:US15795214

    申请日:2017-10-26

    Abstract: A technique for fine-granularity speed binning for a processing device is provided. The processing device includes a plurality of clock domains, each of which may be clocked with independent clock signals. The clock frequency at which a particular clock domain may operate is determined based on the longest propagation delay between clocked elements in that particular clock domain. The processing device includes measurement circuits for each clock domain that measure such propagation delay. The measurement circuits are replica propagation delay paths of actual circuit elements within each particular clock domain. A speed bin for each clock domain is determined based on the propagation delay measured for the measurement circuits for a particular clock domain. Specifically, a speed bin is chosen that is associated with the fastest clock speed whose clock period is longer than the slowest propagation delay measured for the measurement circuit for the clock domain.

    Routing flits in a network-on-chip based on operating states of routers

    公开(公告)号:US10944693B2

    公开(公告)日:2021-03-09

    申请号:US16188900

    申请日:2018-11-13

    Abstract: A system is described that includes an integrated circuit chip having a network-on-chip. The network-on-chip includes multiple routers arranged in a topology and a separate communication link coupled between each router and each of one or more neighboring routers of that router among the multiple routers in the topology. The integrated circuit chip also includes multiple nodes, each node coupled to a router of the multiple routers. When operating, a given router of the multiple routers keeps a record of operating states of some or all of the multiple routers and corresponding communication links. The given router then routes flits to destination nodes via one or more other routers of the multiple routers based at least in part on the operating states of the some or all of the multiple routers and the corresponding communication links.

    Reliable voltage scaled links for compressed data

    公开(公告)号:US10558606B1

    公开(公告)日:2020-02-11

    申请号:US16118172

    申请日:2018-08-30

    Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.

    SYSTEM AND METHOD FOR ENERGY REDUCTION BASED ON HISTORY OF RELIABILITY OF A SYSTEM

    公开(公告)号:US20180121312A1

    公开(公告)日:2018-05-03

    申请号:US15338172

    申请日:2016-10-28

    CPC classification number: G06F11/008 G06F11/3058 G06F11/3452 Y02D10/34

    Abstract: A system and method for managing operating parameters within a system for optimal power and reliability are described. A device includes a functional unit and a corresponding reliability evaluator. The functional unit provides reliability information to one or more reliability monitors, which translate the information to reliability values. The reliability evaluator determines an overall reliability level for the system based on the reliability values. The reliability monitor compares the actual usage values and the expected usage values. When system has maintained a relatively high level of reliability for a given time interval, the reliability evaluator sends an indication to update operating parameters to reduce reliability of the system, which also reduces power consumption for the system.

    Entropy agnostic data encoding and decoding

    公开(公告)号:US11362673B2

    公开(公告)日:2022-06-14

    申请号:US17089360

    申请日:2020-11-04

    Abstract: Entropy agnostic data encoding includes: receiving, by an encoder, input data including a bit string; generating a plurality of candidate codewords, including encoding the input data bit string with a plurality of binary vectors, wherein the plurality of binary vectors includes a set of deterministic biased binary vectors and a set of random binary vectors; selecting, in dependence upon a predefined criteria, one of the plurality of candidate codewords; and transmitting the selected candidate codeword to a decoder.

    Apparatus and method for providing workload distribution of threads among multiple compute units

    公开(公告)号:US11194634B2

    公开(公告)日:2021-12-07

    申请号:US16220827

    申请日:2018-12-14

    Abstract: In some examples, thermal aware optimization logic determines a characteristic (e.g., a workload or type) of a wavefront (e.g., multiple threads). For example, the characteristic indicates whether the wavefront is compute intensive, memory intensive, mixed, and/or another type of wavefront. The thermal aware optimization logic determines temperature information for one or more compute units (CUs) in one or more processing cores. The temperature information includes predictive thermal information indicating expected temperatures corresponding to the one or more CUs and historical thermal information indicating current or past thermal temperatures of at least a portion of a graphics processing unit (GPU). The logic selects the one or more compute units to process the plurality of threads based on the determined characteristic and the temperature information. The logic provides instructions to the selected subset of the plurality of CUs to execute the wavefront.

    Data compression system using base values and methods thereof

    公开(公告)号:US11144208B2

    公开(公告)日:2021-10-12

    申请号:US16724609

    申请日:2019-12-23

    Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.

Patent Agency Ranking