Compressed cache memory with decompress on fault

    公开(公告)号:US12130738B2

    公开(公告)日:2024-10-29

    申请号:US17130632

    申请日:2020-12-22

    CPC classification number: G06F12/0802 H03M7/60 G06F2212/401 G06F2212/60

    Abstract: An embodiment of an integrated circuit may comprise, coupled to a core, a hardware decompression accelerator, a compressed cache, a processor and communicatively coupled to the hardware decompression accelerator and the compressed cache, and memory and communicatively coupled to the processor, wherein the memory stores microcode instructions which when executed by the processor causes the processor to store a first address to a decompression work descriptor, retrieve a second address where a compressed page is stored in the compressed cache from the decompression work descriptor at the first address in response to an indication of a page fault, and send instructions to the hardware decompression accelerator to decompress the compressed page at the second address. Other embodiments are disclosed and claimed.

    APPLICATION PROGRAMMING INTERFACE FOR FINE GRAINED LOW LATENCY DECOMPRESSION WITHIN PROCESSOR CORE

    公开(公告)号:US20220197659A1

    公开(公告)日:2022-06-23

    申请号:US17133622

    申请日:2020-12-23

    Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompress the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which decompressed data by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.

    Technology for dynamically tuning processor features

    公开(公告)号:US11256599B2

    公开(公告)日:2022-02-22

    申请号:US17128291

    申请日:2020-12-21

    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.

    Memory-efficient last level cache architecture

    公开(公告)号:US10776270B2

    公开(公告)日:2020-09-15

    申请号:US16222788

    申请日:2018-12-17

    Abstract: A memory-efficient last level cache (LLC) architecture is described. A processor implementing a LLC architecture may include a processor core, a last level cache (LLC) operatively coupled to the processor core, and a cache controller operatively coupled to the LLC. The cache controller is to monitor a bandwidth demand of a channel between the processor core and a dynamic random-access memory (DRAM) device associated with the LLC. The cache controller is further to perform a first defined number of consecutive reads from the DRAM device when the bandwidth demand exceeds a first threshold value and perform a first defined number of consecutive writes of modified lines from the LLC to the DRAM device when the bandwidth demand exceeds the first threshold value.

Patent Agency Ranking