Abstract:
A processor is provided that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block, and to store the compressed read-only data block in multiple adjacent blocks in the memory. During execution of an application that operates on the read-only data, one of the multiple adjacent blocks storing the compressed read-only data block is read from memory, stored in a prefetch buffer, and decompressed in the memory controller. In response to a subsequent request, during execution of the application, for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
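As an illustration of that read path, here is a minimal Python sketch; zlib stands in for the abstract's compression instructions, a dictionary stands in for the prefetch buffer, and the MemoryController class and 4-block region size are hypothetical names chosen for this example:

```python
import zlib

BLOCK = 64                 # bytes per block (illustrative)
REGION = 4 * BLOCK         # four adjacent blocks compressed together

class MemoryController:
    """On the first read of a compressed region, decompress it into the
    prefetch buffer; serve subsequent adjacent-block reads from there."""
    def __init__(self, compressed_mem):
        self.mem = compressed_mem        # region base -> compressed blob
        self.prefetch = {}               # block address -> raw 64B block

    def read(self, addr):
        block_base = addr - addr % BLOCK
        if block_base not in self.prefetch:       # miss: expand the region
            region_base = addr - addr % REGION
            data = zlib.decompress(self.mem[region_base])
            for off in range(0, REGION, BLOCK):
                self.prefetch[region_base + off] = data[off:off + BLOCK]
        return self.prefetch[block_base]          # hit: no memory access

# usage: four adjacent blocks compressed into one region at address 0
rodata = bytes(range(256))
ctrl = MemoryController({0: zlib.compress(rodata)})
assert ctrl.read(64) == rodata[64:128]    # first read decompresses region 0
assert ctrl.read(128) == rodata[128:192]  # served from the prefetch buffer
```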
Abstract:
Embodiments of apparatus, methods, systems, and devices are described herein for selective error correction in memory with multiple operation modes. In various embodiments, an error correction block (e.g., of a memory controller) may be configured to perform error correction on data read from a first portion of a memory based on a corresponding error correction code read from a second portion of the memory, and to calculate and store the error correction code. A control block coupled to the error correction block may be configured to selectively enable/disable the error correction block to perform the error correction, and to calculate and store the error correction code, based at least in part on a current operation mode of the memory.
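A minimal sketch of the selective enable/disable behavior, using a toy 8-bit XOR checksum as a stand-in for a real error correction code; the EccBlock and ControlBlock names and the "protected" mode string are illustrative, not from the abstract:

```python
class EccBlock:
    """Toy stand-in for an ECC engine: an 8-bit XOR checksum that only
    detects corruption (a real SECDED code would also correct it)."""
    @staticmethod
    def code(word: int) -> int:
        return (word & 0xFF) ^ ((word >> 8) & 0xFF)

class ControlBlock:
    """Selectively enables or disables ECC based on the operation mode."""
    def __init__(self, data_mem, ecc_mem):
        self.data_mem = data_mem    # first portion of memory: data words
        self.ecc_mem = ecc_mem      # second portion: stored codes
        self.ecc_enabled = True

    def set_mode(self, mode: str):
        # e.g. skip the ECC work entirely in an unprotected low-power mode
        self.ecc_enabled = (mode == "protected")

    def write(self, addr, word):
        self.data_mem[addr] = word
        if self.ecc_enabled:                 # calculate and store the code
            self.ecc_mem[addr] = EccBlock.code(word)

    def read(self, addr):
        word = self.data_mem[addr]
        if self.ecc_enabled and EccBlock.code(word) != self.ecc_mem.get(addr):
            raise IOError(f"error detected at address {addr:#x}")
        return word
```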
Abstract:
A two-level main memory is provided in which both volatile memory and persistent memory are exposed to the operating system in a flat manner, and data movement and management are performed at cache line granularity. The operating system can allocate pages in the two-level main memory randomly across the first level main memory and the second level main memory in a memory-type-agnostic manner, or more intelligently by allocating predicted hot pages in first level main memory and predicted cold pages in second level main memory. The cache line granularity movement is performed in a “swap” manner; that is, a hot cache line in the second level main memory is swapped with a cold cache line in first level main memory, because data is stored in either first level main memory or second level main memory, not in both.
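The swap behavior can be sketched as follows; the TwoLevelMemory class and its remap-table bookkeeping are assumptions for illustration, not the patented mechanism:

```python
class TwoLevelMemory:
    """Flat two-level main memory; each cache line lives in exactly one
    level, so promoting a hot line means swapping it with a cold one."""
    def __init__(self, first, second):
        self.first, self.second = first, second  # line address -> 64B line
        self.remap = {}                          # second-level addr -> first-level addr

    def swap(self, hot_second_addr, cold_first_addr):
        # hot and cold lines trade places at cache line granularity,
        # so the data is never held in both levels at once
        self.first[cold_first_addr], self.second[hot_second_addr] = (
            self.second[hot_second_addr], self.first[cold_first_addr])
        self.remap[hot_second_addr] = cold_first_addr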
Abstract:
A method is described that includes creating a first data access pattern record for a region of system memory in response to a cache miss at a host side cache for a first memory access request. The first memory access request specifies an address within the region of system memory. The method includes fetching a previously existing data access pattern record for the region from the system memory in response to the cache miss. The previously existing data access pattern record identifies blocks of data within the region that have been previously accessed. The method includes pre-fetching those blocks from the system memory and storing them in the cache.
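A rough sketch of the record-driven prefetch flow, modeling each region's data access pattern record as a bitmap; the PatternPrefetcher class, the 32-block region size, and keeping records in a dictionary rather than in system memory are all simplifications:

```python
BLOCK = 64           # bytes per data block (illustrative)
REGION_BLOCKS = 32   # blocks tracked per region record (illustrative)

class PatternPrefetcher:
    """On a host-side cache miss, fetch the region's data access pattern
    record and prefetch every block it marks as previously accessed."""
    def __init__(self, sysmem, cache):
        self.sysmem = sysmem    # system memory: block address -> data
        self.cache = cache      # host-side cache: block address -> data
        self.records = {}       # region base -> access bitmap (stands in
                                # for records kept in system memory)

    def access(self, addr):
        if addr in self.cache:                        # cache hit
            return self.cache[addr]
        region = addr - addr % (REGION_BLOCKS * BLOCK)
        bitmap = self.records.get(region, 0)          # existing record
        for i in range(REGION_BLOCKS):                # prefetch marked blocks
            if bitmap >> i & 1:
                blk = region + i * BLOCK
                self.cache[blk] = self.sysmem[blk]
        # create or update the record to note this block was accessed
        self.records[region] = bitmap | 1 << (addr - region) // BLOCK
        self.cache[addr] = self.sysmem[addr]
        return self.cache[addr]
```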
Abstract:
One embodiment provides an apparatus. The apparatus includes last level cache circuitry and cache management circuitry. The last level cache circuitry stores cache blocks that at least partially include a subset of cache blocks stored by near memory circuitry. The near memory circuitry is configured in an n-way set associative format that references the cache blocks stored by the near memory circuitry using set identifiers and way identifiers. The cache management circuitry stores way identifiers for the cache blocks of the near memory circuitry within the cache blocks in the last level cache circuitry. Storing way identifiers in the cache blocks of the last level cache enables the cache management circuitry or memory controller circuitry to write back a cache block without reading tags in one or more ways of the near memory circuitry.
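A compact sketch of the tag-free write-back this enables, assuming hypothetical LlcBlock and NearMemory types; real near memory would also hold a tag and metadata per way:

```python
from dataclasses import dataclass

@dataclass
class LlcBlock:
    tag: int
    data: bytes
    nm_set: int     # set identifier in near memory
    nm_way: int     # way identifier carried with the LLC copy

class NearMemory:
    """n-way set-associative near memory acting as a memory-side cache."""
    def __init__(self, sets: int, ways: int):
        self.ways = [[None] * ways for _ in range(sets)]

    def write_back(self, blk: LlcBlock):
        # the way identifier stored in the LLC block says exactly where
        # the block lives, so no near-memory tags need to be read first
        self.ways[blk.nm_set][blk.nm_way] = (blk.tag, blk.data)
```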
Abstract:
A device is configured to be in communication with one or more host cores via a first communication path. A first set of processing-in-memory (PIM) cores and a second set of PIM cores are configured to be in communication with a memory included in the device over a second communication path, wherein the first set of PIM cores has greater processing power than the second set of PIM cores, and wherein the second communication path has a greater bandwidth for data transfer than the first communication path. Code offloaded by the one or more host cores is executed in the first set of PIM cores and the second set of PIM cores.
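One way to picture the offload interface, with thread pools standing in for the two PIM core pools; the PimDevice class and the compute_heavy selection flag are assumptions, since the abstract does not say how work is divided between the pools:

```python
from concurrent.futures import ThreadPoolExecutor

class PimDevice:
    """Device memory plus two PIM core pools on the wide internal path:
    a few powerful cores and many simpler ones."""
    def __init__(self, memory, n_big=2, n_little=8):
        self.memory = memory
        self.big = ThreadPoolExecutor(n_big)        # higher per-core power
        self.little = ThreadPoolExecutor(n_little)  # more, simpler cores

    def offload(self, kernel, *args, compute_heavy=False):
        # the host ships the kernel over the narrow external link once;
        # the kernel then reaches memory over the wider internal path
        pool = self.big if compute_heavy else self.little
        return pool.submit(kernel, self.memory, *args)

# usage: a host core offloads a reduction that runs next to the memory
device = PimDevice(memory=list(range(1000)))
total = device.offload(lambda mem: sum(mem)).result()
```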
Abstract:
Systems, apparatuses and methods may provide for technology to maintain a prediction table that tracks missed page addresses with respect to a first memory. If an access request does not correspond to any valid page addresses in the prediction table, the access request may be sent to the first memory. If the access request corresponds to a valid page address in the prediction table, the access request may be sent to the first memory and a second memory in parallel, wherein the first memory is associated with a shorter access time than the second memory.
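A sketch of that lookup policy, assuming a hypothetical MissPredictor class whose backing memories return None on a miss; real hardware would issue the two reads concurrently rather than back to back as written here:

```python
class MissPredictor:
    """Tracks page addresses that recently missed in the first memory;
    a table hit triggers reads of both memories in parallel."""
    def __init__(self, first_mem, second_mem, capacity=1024):
        self.first, self.second = first_mem, second_mem
        self.table = set()                  # valid missed-page addresses
        self.capacity = capacity

    def read(self, addr, page_size=4096):
        page = addr - addr % page_size
        if page in self.table:
            # predicted miss: hardware would issue both requests in
            # parallel; shown sequentially here for brevity
            hit = self.first.read(addr)
            return hit if hit is not None else self.second.read(addr)
        data = self.first.read(addr)        # first memory only
        if data is None:                    # miss: remember the page
            if len(self.table) < self.capacity:
                self.table.add(page)
            data = self.second.read(addr)
        return data
```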
Abstract:
An apparatus and method for implementing a heterogeneous memory subsystem is described. For example, one embodiment of a processor comprises: memory mapping logic to subdivide a system memory space into a plurality of memory chunks and to map the memory chunks across a first memory and a second memory, the first memory having a first set of memory access characteristics and the second memory having a second set of memory access characteristics different from the first set of memory access characteristics; and dynamic remapping logic to swap memory chunks between the first and second memories based, at least in part, on a detected frequency with which the memory chunks are accessed.
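A simplified sketch of frequency-driven chunk swapping; the ChunkRemapper class, the Counter-based frequency detection, and the rebalance policy are illustrative assumptions, not the patented logic:

```python
from collections import Counter

class ChunkRemapper:
    """Maps memory chunks across a fast and a slow memory and swaps
    them based on detected access frequency."""
    def __init__(self, fast, slow):
        self.fast, self.slow = fast, slow   # chunk id -> data, per tier
        self.hits = Counter()               # detected access frequency

    def touch(self, chunk_id):
        self.hits[chunk_id] += 1            # count each access

    def rebalance(self):
        # swap the most-accessed slow chunk with the least-accessed fast
        # chunk so hot data migrates to the lower-latency memory
        if not self.fast or not self.slow:
            return
        hot = max(self.slow, key=lambda c: self.hits[c])
        cold = min(self.fast, key=lambda c: self.hits[c])
        if self.hits[hot] > self.hits[cold]:
            self.fast[hot], self.slow[cold] = (
                self.slow.pop(hot), self.fast.pop(cold))
```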
Abstract:
A multi-level memory management circuit can remap data between near and far memory. In one embodiment, a register array stores near memory addresses and far memory addresses mapped to the near memory addresses. The number of entries in the register array is less than the number of pages in near memory. Remapping logic determines that a far memory address of the requested data is absent from the register array and selects an available near memory address from the register array. Remapping logic also initiates writing of the requested data at the far memory address to the selected near memory address. Remapping logic further writes the far memory address to an entry of the register array corresponding to the selected near memory address.
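The remapping flow might look like the following sketch, where RemapArray and copy_page are hypothetical names and eviction from a full array is elided:

```python
class RemapArray:
    """Small register array pairing each tracked near-memory page with
    the far-memory address currently remapped into it; it has far fewer
    entries than near memory has pages."""
    def __init__(self, near_pages):
        self.entries = [{"near": p, "far": None} for p in near_pages]

    def lookup(self, far_addr):
        for e in self.entries:
            if e["far"] == far_addr:
                return e["near"]            # already remapped
        return None

    def remap(self, far_addr, copy_page):
        near = self.lookup(far_addr)
        if near is not None:
            return near                     # hit: reuse existing mapping
        free = next((e for e in self.entries if e["far"] is None), None)
        if free is None:
            raise RuntimeError("array full; eviction policy elided")
        copy_page(far_addr, free["near"])   # move the data far -> near
        free["far"] = far_addr              # record the new mapping
        return free["near"]
```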
Abstract:
Systems and methods for efficiently utilizing reconfigurable processor cores. An example processing system includes a control register comprising a plurality of inhibit bits, each inhibit bit indicating whether a corresponding processor core is allowed to merge with other processor cores; and dynamic core reallocation logic to temporarily merge a first processor core and a second processor core to speed execution of a first thread executed on the first processor core, responsive to determining that a second thread executed on the second processor core has completed execution before reaching the quantum associated with the second thread, and to determining that the inhibit bits indicate that the first and second cores may be merged.
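A behavioral sketch of the merge decision; the CoreScheduler class is hypothetical, and the actual resource merging performed by the dynamic core reallocation logic is reduced here to bookkeeping:

```python
class CoreScheduler:
    """Decides when an idle core may be merged with a busy one."""
    def __init__(self, n_cores):
        self.inhibit = [False] * n_cores    # control-register inhibit bits
        self.merged_into = {}               # donor core -> recipient core

    def on_thread_exit(self, idle_core, busy_core, ran_full_quantum):
        # merge only if the second thread finished before its quantum and
        # neither core's inhibit bit forbids merging; the idle core then
        # lends its resources to speed the busy core's thread
        if (not ran_full_quantum
                and not self.inhibit[idle_core]
                and not self.inhibit[busy_core]):
            self.merged_into[idle_core] = busy_core
            return True
        return False
```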