Abstract:
Embodiments of the invention describe an apparatus, system and method for utilizing a utility and lifetime based cache replacement policy. For processors having one or more processor cores and a cache memory accessible via the processor core(s), embodiments of the invention describe a cache controller to determine, for a plurality of cache blocks in the cache memory, an estimated utility and lifetime of the contents of each cache block, the utility of a cache block to indicate a likelihood of use of its contents, the lifetime of a cache block to indicate a duration of use of its contents. Upon receiving a cache access request resulting in a cache miss, said cache controller may select one of the cache blocks to be replaced based, at least in part, on one of the estimated utility or estimated lifetime of the cache block.
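The victim selection described above can be illustrated with a minimal sketch. The abstract does not say how utility and lifetime are estimated or combined, so the `CacheBlock` fields, the `select_victim` name, and the rule "lowest utility first, shortest lifetime as tie-breaker" are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CacheBlock:
    tag: int
    utility: float  # assumed: estimated likelihood (0..1) that the contents are used again
    lifetime: int   # assumed: estimated remaining duration of use, in arbitrary ticks


def select_victim(blocks):
    """Return the index of the block to replace on a cache miss.

    Hypothetical rule: evict the block with the lowest estimated utility,
    breaking ties in favor of the shortest estimated lifetime.
    """
    return min(range(len(blocks)),
               key=lambda i: (blocks[i].utility, blocks[i].lifetime))


blocks = [
    CacheBlock(tag=0xA, utility=0.9, lifetime=5),
    CacheBlock(tag=0xB, utility=0.2, lifetime=50),
    CacheBlock(tag=0xC, utility=0.2, lifetime=3),
]
victim = select_victim(blocks)  # index of the block with tag 0xC
```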
Abstract:
Embodiments of the invention describe an apparatus, system and method for workload adaptive address mapping. Embodiments of the invention may receive a request to initialize a system memory including a plurality of memory banks. Using a plurality of memory address mapping schemes for memory settings for the system memory, a system characterization workload is executed during the initialization of the system memory, the system characterization workload including a plurality of transactions directed towards the system memory. Embodiments of the invention may monitor target addresses of the plurality of transactions directed towards the system memory. One of the plurality of memory address mapping schemes is selected based, at least in part, on the target addresses of the plurality of transactions.
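As a rough sketch of the selection step, the example below replays the monitored target addresses against each candidate mapping scheme and keeps the scheme that loads the busiest bank least. The abstract states only that the choice depends on the target addresses, so the bank-balance criterion, the two mapping functions, and `NUM_BANKS` are hypothetical.

```python
from collections import Counter

NUM_BANKS = 4  # assumed bank count for the example


def low_bits(addr):
    """Hypothetical scheme: bank chosen from the low address bits."""
    return addr % NUM_BANKS


def high_bits(addr):
    """Hypothetical scheme: bank chosen from bits above bit 8."""
    return (addr >> 8) % NUM_BANKS


SCHEMES = {"low_bits": low_bits, "high_bits": high_bits}


def select_scheme(schemes, addresses):
    """Pick the scheme whose most heavily loaded bank sees the fewest transactions."""
    def worst_bank_load(name):
        counts = Counter(schemes[name](a) for a in addresses)
        return max(counts.values())
    return min(schemes, key=worst_bank_load)


# Target addresses observed while the characterization workload runs.
strided_trace = [i * 256 for i in range(64)]
sequential_trace = list(range(64))
```

With the strided trace every address lands in bank 0 under `low_bits`, so `high_bits` is selected; the sequential trace gives the opposite result.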
Abstract:
Embodiments of the invention describe an apparatus, system and method for utilizing a page miss handler having wear leveling logic/modules for memory devices. Embodiments of the invention may track an amount of writes directed towards cells of a memory device, and determine whether a linear address specified by a system write transaction is included in a translation-lookaside buffer (TLB). In response to determining the linear address is not included in the TLB, resulting in a TLB miss, embodiments of the invention may perform a page table walk to obtain a corresponding physical address, and convert the physical address to a device address for accessing the memory device based on the tracked amount of writes. Thus, embodiments of the invention are more efficient than prior art solutions, because only memory operations that miss in the TLB, rather than all memory operations, incur the additional wear leveling address translation overhead.
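The flow above can be sketched as follows. This is a simplified model, not the claimed implementation: addresses are reduced to page numbers, the TLB is a dictionary, and the remapping rule (assign the least-written unused device page) is an assumption. The class and attribute names are hypothetical.

```python
class PageMissHandler:
    """Toy model: wear leveling translation happens only on a TLB miss."""

    def __init__(self, page_table, num_device_pages):
        self.page_table = page_table           # linear page -> physical page
        self.tlb = {}                          # linear page -> device page
        self.remap = {}                        # physical page -> device page
        self.writes = [0] * num_device_pages   # tracked writes per device page
        self.wear_translations = 0             # how often the extra step ran

    def _wear_level(self, phys_page):
        # Assumed rule: a newly seen physical page gets the least-written free
        # device page. Assumes at least as many device pages as mapped
        # physical pages, otherwise no free page exists.
        self.wear_translations += 1
        if phys_page not in self.remap:
            used = set(self.remap.values())
            free = [d for d in range(len(self.writes)) if d not in used]
            self.remap[phys_page] = min(free, key=lambda d: self.writes[d])
        return self.remap[phys_page]

    def write(self, linear_page):
        if linear_page not in self.tlb:            # TLB miss
            phys = self.page_table[linear_page]    # page table walk
            self.tlb[linear_page] = self._wear_level(phys)
        dev = self.tlb[linear_page]                # TLB hit path: no extra work
        self.writes[dev] += 1
        return dev


pmh = PageMissHandler(page_table={0: 7, 1: 3}, num_device_pages=4)
pmh.writes = [10, 0, 5, 1]   # pretend some device pages are already worn
first = pmh.write(0)         # miss: walk, then wear leveling translation
second = pmh.write(0)        # hit: served from the TLB
```

The second write reuses the cached device address, so `wear_translations` stays at 1, which is the efficiency argument the abstract makes.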
Abstract:
A method to request memory from a far memory cache and implement, at an operating system (OS) level, a fully associative cache on the requested memory. The method includes pinning the working set of a program into the requested memory (pin buffer) so that it is not evicted due to cache conflicts and is served from the fast cache and not the slower next level memory. The requested memory extends the physical address space and is visible to and managed by the OS. The OS has the ability to make the requested memory visible to the user programs. The OS has the ability to manage the requested memory from the far memory cache as both a fully associative cache and a set associative cache.
Abstract:
Techniques and mechanisms for adaptively changing between replacement policies for selecting lines of a cache for eviction. In an embodiment, evaluation logic determines a value of a performance metric which is for writes to a non-volatile memory. Based on the determined value of the performance metric, a policy unit determines a parameter value of a replacement policy. In another embodiment, cache replacement logic performs a selection of a line of cache for data eviction, where the selection is in response to the policy unit providing an indication of the determined parameter value.
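One way to picture the adaptation is sketched below. The abstract does not name the performance metric or the policy parameter, so this example assumes the metric is an observed NVM write latency and the parameter is a flag that biases eviction toward clean lines (which need no write-back). All names and the threshold are hypothetical.

```python
def choose_clean_bias(write_latency_us, threshold_us=100.0):
    """Policy unit (assumed): derive the parameter from the NVM write metric.

    Returns 1 to prefer evicting clean lines when writes are slow, else 0.
    """
    return 1 if write_latency_us > threshold_us else 0


def select_line(lines, clean_bias):
    """Cache replacement logic (assumed): pick a line index for eviction.

    `lines` is a list of (age, dirty) pairs; a larger age means the line
    was used less recently.
    """
    if clean_bias:
        clean = [i for i, (_, dirty) in enumerate(lines) if not dirty]
        if clean:
            return max(clean, key=lambda i: lines[i][0])
    # Plain least-recently-used choice when no bias applies or no clean line exists.
    return max(range(len(lines)), key=lambda i: lines[i][0])


lines = [(5, True), (3, False), (1, False)]
slow_nvm_choice = select_line(lines, choose_clean_bias(250.0))  # oldest clean line
fast_nvm_choice = select_line(lines, choose_clean_bias(20.0))   # oldest line overall
```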
Abstract:
Techniques and mechanisms to provide a cache of cache tags in determining an access to cached data. In an embodiment, a tag storage stores a first set including tags associated with respective data locations of a cache memory. A cache of cache tags stores a subset of tags stored by the tag storage. Where a tag of the first set is to be stored to the cache of cache tags, all tags of the first set are stored to a first portion of the cache of cache tags. In another embodiment, any storage of tags of the first set to the cache of cache tags includes storage of the tags of the first set to only a first portion of the cache of cache tags. A replacement table is maintained for use in evicting or replacing cached tags based on an indicated level of activity for a set of the cache of cache tags.
Abstract:
An apparatus and method for implementing non-volatile store (nvstore) and non-volatile flush (nvflush) instructions. For example, a method according to one embodiment comprises: executing a set of non-volatile store instructions indicating data to be persisted to a non-volatile memory (NVM) of a multi-level system memory hierarchy; generating an entry in an NVM store queue prior to storing the data to the NVM, each entry indicating that the data associated therewith has not yet been persisted to non-volatile memory; executing a non-volatile flush instruction at a time when the data associated with each entry in the non-volatile store queue should be persisted to non-volatile memory; and removing the entries from the NVM store queue as the data associated with each entry is written to non-volatile memory.
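The queue behavior in this method can be modeled in a few lines. The sketch below is a software analogy only: `nvstore` and `nvflush` are instructions in the abstract, whereas here they are methods on a hypothetical `NVStoreQueue` class, and a dictionary stands in for the non-volatile memory.

```python
from collections import deque


class NVStoreQueue:
    """Toy model of an NVM store queue drained by a flush operation."""

    def __init__(self):
        self.queue = deque()  # entries whose data has not yet been persisted
        self.nvm = {}         # stand-in for non-volatile memory

    def nvstore(self, address, data):
        # Generate a queue entry before the data reaches the NVM.
        self.queue.append((address, data))

    def nvflush(self):
        # Persist every pending entry, removing each one as it is written.
        while self.queue:
            address, data = self.queue.popleft()
            self.nvm[address] = data


q = NVStoreQueue()
q.nvstore(0x10, b"a")
q.nvstore(0x20, b"b")
pending_before_flush = len(q.queue)  # both entries still pending
q.nvflush()
```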
Abstract:
Embodiments of the invention describe an apparatus, system and method for sub-block based wear leveling for memory devices. Embodiments of the invention may receive a write request to a physical memory address including a physical block address and a physical sub-block address. An address remapping table is accessed to translate the physical block address to a memory device block address to locate a plurality of memory device sub-blocks. A plurality of sub-block activity counters are accessed, each sub-block activity counter associated with one of the memory device sub-blocks. One of the plurality of memory device sub-blocks is selected to store write data of the write request based, at least in part, on values of the plurality of sub-block activity counters, and the value of the sub-block activity counter associated with the selected memory device sub-block is updated.
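The write path above can be sketched as follows. The abstract specifies the remapping table and the per-sub-block activity counters but not the selection rule, so this example assumes "pick the least-written sub-block that is free or already holds this physical sub-block". The `owners` bookkeeping and all names are hypothetical additions so that a write never overwrites another sub-block's data.

```python
class SubBlockWearLeveler:
    """Toy model of sub-block selection driven by activity counters."""

    def __init__(self, remap_table, sub_blocks_per_block):
        self.remap_table = remap_table  # physical block -> device block
        blocks = set(remap_table.values())
        self.counters = {b: [0] * sub_blocks_per_block for b in blocks}
        # owners[b][i] is the physical sub-block stored in device sub-block i.
        self.owners = {b: [None] * sub_blocks_per_block for b in blocks}
        self.data = {}

    def write(self, phys_block, phys_sub_block, data):
        dev_block = self.remap_table[phys_block]  # address remapping table lookup
        counters = self.counters[dev_block]
        owners = self.owners[dev_block]
        # Candidates: free sub-blocks, or the one already holding this data.
        candidates = [i for i, o in enumerate(owners)
                      if o is None or o == phys_sub_block]
        chosen = min(candidates, key=lambda i: counters[i])
        # Release the previous location of this physical sub-block, if any.
        for i, o in enumerate(owners):
            if o == phys_sub_block:
                owners[i] = None
        owners[chosen] = phys_sub_block
        counters[chosen] += 1                     # update the activity counter
        self.data[(dev_block, chosen)] = data
        return dev_block, chosen


wl = SubBlockWearLeveler(remap_table={0: 5}, sub_blocks_per_block=4)
first = wl.write(0, 0, b"x")   # all counters equal, lowest index wins
second = wl.write(0, 0, b"y")  # same data moves to a less-written sub-block
third = wl.write(0, 1, b"z")   # a different physical sub-block
```

Repeated writes to the same physical sub-block rotate across device sub-blocks, which keeps the counters close together.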