RESOURCE-AWARE COMPRESSION
    31.
    发明申请

    公开(公告)号:US20210191869A1

    公开(公告)日:2021-06-24

    申请号:US16725971

    申请日:2019-12-23

    Abstract: Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.

    CONTROLLING THE OPERATING SPEED OF STAGES OF AN ASYNCHRONOUS PIPELINE

    公开(公告)号:US20210089324A1

    公开(公告)日:2021-03-25

    申请号:US16913146

    申请日:2020-06-26

    Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.

    BYTE SELECT CACHE COMPRESSION
    33.
    发明申请

    公开(公告)号:US20200133866A1

    公开(公告)日:2020-04-30

    申请号:US16176828

    申请日:2018-10-31

    Abstract: The disclosure herein provides techniques for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.

    RELIABLE VOLTAGE SCALED LINKS FOR COMPRESSED DATA

    公开(公告)号:US20200073845A1

    公开(公告)日:2020-03-05

    申请号:US16118172

    申请日:2018-08-30

    Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.

    FINE-GRAINED SPEED BINNING IN AN ACCELERATED PROCESSING DEVICE

    公开(公告)号:US20190129463A1

    公开(公告)日:2019-05-02

    申请号:US15795214

    申请日:2017-10-26

    Abstract: A technique for fine-granularity speed binning for a processing device is provided. The processing device includes a plurality of clock domains, each of which may be clocked with independent clock signals. The clock frequency at which a particular clock domain may operate is determined based on the longest propagation delay between clocked elements in that particular clock domain. The processing device includes measurement circuits for each clock domain that measure such propagation delay. The measurement circuits are replica propagation delay paths of actual circuit elements within each particular clock domain. A speed bin for each clock domain is determined based on the propagation delay measured for the measurement circuits for a particular clock domain. Specifically, a speed bin is chosen that is associated with the fastest clock speed whose clock period is longer than the slowest propagation delay measured for the measurement circuit for the clock domain.

Patent Agency Ranking