Abstract:
Hybrid multi-level memory architecture technologies are described. A System on Chip (SOC) includes multiple functional units and a multi-level memory controller (MLMC) coupled to the functional units. The MLMC is coupled to a hybrid multi-level memory architecture including a first-level dynamic random access memory (DRAM) (near memory) that is located on-package of the SOC and a second-level DRAM (far memory) that is located off-package of the SOC. The MLMC presents the first-level DRAM and the second-level DRAM as a contiguous addressable memory space and exposes the first-level DRAM to software as memory capacity in addition to that of the second-level DRAM. The first-level DRAM does not store a copy of the contents of the second-level DRAM.
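A minimal C sketch of the flat address routing this abstract describes. The capacities, the partition point, and the name mlmc_route are illustrative assumptions, not taken from the patent; a real MLMC could interleave or remap addresses between the two levels rather than splitting them at a fixed boundary.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative capacities; the patent does not specify sizes. */
#define NEAR_SIZE (1ULL << 30)   /* 1 GiB on-package (near) DRAM  */
#define FAR_SIZE  (4ULL << 30)   /* 4 GiB off-package (far) DRAM  */

typedef enum { NEAR_MEM, FAR_MEM } mem_level_t;

/* Route a system physical address to near or far memory. Near
 * memory is not a cache of far memory: every address maps to
 * exactly one device, so the two levels together form a single
 * contiguous space of NEAR_SIZE + FAR_SIZE bytes. */
static mem_level_t mlmc_route(uint64_t addr, uint64_t *dev_offset)
{
    if (addr < NEAR_SIZE) {
        *dev_offset = addr;              /* on-package DRAM  */
        return NEAR_MEM;
    }
    *dev_offset = addr - NEAR_SIZE;      /* off-package DRAM */
    return FAR_MEM;
}

int main(void)
{
    uint64_t off;
    mem_level_t lvl = mlmc_route(NEAR_SIZE + 0x1000, &off);
    printf("level=%s offset=0x%llx\n",
           lvl == NEAR_MEM ? "near" : "far", (unsigned long long)off);
    return 0;
}
```

The key design point the sketch captures is the last sentence of the abstract: because near memory holds no copy of far memory, the full sum of both capacities is software-visible.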
Abstract:
An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.
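A minimal C sketch of the call flow this abstract describes: per-core instruction execution logic forwards a call descriptor to a single block of shared data processing logic, which performs the work on behalf of the requesting core. The descriptor layout and the names (sce_call_t, core_sce_call) are hypothetical, not from the patent.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 4

/* Hypothetical call descriptor: what a shared core extension
 * call instruction might hand to the shared logic. */
typedef struct {
    int      core_id;   /* core the work is done on behalf of */
    uint64_t src, dst;  /* operand addresses                   */
    uint32_t op;        /* requested data-processing operation */
} sce_call_t;

/* One block of shared data processing logic serving every core. */
static void sce_execute(const sce_call_t *call)
{
    printf("extension: op %u for core %d (src=0x%llx dst=0x%llx)\n",
           call->op, call->core_id,
           (unsigned long long)call->src,
           (unsigned long long)call->dst);
    /* ... shared data processing would happen here ... */
}

/* Per-core instruction execution logic: the "call" forwards a
 * descriptor to the single shared extension. */
static void core_sce_call(int core_id, uint32_t op,
                          uint64_t src, uint64_t dst)
{
    sce_call_t call = { core_id, src, dst, op };
    sce_execute(&call);
}

int main(void)
{
    for (int c = 0; c < NUM_CORES; c++)
        core_sce_call(c, /*op=*/1, 0x1000 * (c + 1), 0x2000 * (c + 1));
    return 0;
}
```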
Abstract:
Technologies are provided in embodiments to dynamically fill a shared cache. At least some embodiments include determining that data requested in a first request for the data by a first processing device is not stored in a cache shared by the first processing device and a second processing device, where a dynamic fill policy is applicable to the first request. Embodiments further include determining to deallocate, based at least in part on a threshold, an entry in a buffer, the entry containing information corresponding to the first request for the data. Embodiments also include sending a second request for the data to a system memory, and sending the data from the system memory to the first processing device. In more specific embodiments, the data from the system memory is not written to the cache based, at least in part, on the determination to deallocate the entry.
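A minimal C sketch, under assumed specifics, of the dynamic fill decision: on a shared-cache miss the controller may deallocate the buffer entry tracking the request once occupancy reaches a threshold, in which case the data fetched from system memory is forwarded to the requester without being written into the shared cache. The threshold value and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEALLOC_THRESHOLD 6   /* illustrative occupancy threshold */

/* Under the dynamic fill policy, an entry tracking an outstanding
 * miss may be deallocated early when the buffer is under pressure;
 * the data returned from system memory then goes straight to the
 * requesting device and is NOT written into the shared cache. */
static bool keep_entry_and_fill(int buf_occupancy)
{
    return buf_occupancy < DEALLOC_THRESHOLD;
}

static void handle_shared_cache_miss(int requester, uint64_t addr,
                                     int buf_occupancy)
{
    bool fill = keep_entry_and_fill(buf_occupancy);
    /* A second request is sent to system memory either way. */
    printf("dev %d miss @0x%llx (buf=%d): data -> requester, %s\n",
           requester, (unsigned long long)addr, buf_occupancy,
           fill ? "cache filled" : "fill bypassed, entry deallocated");
}

int main(void)
{
    handle_shared_cache_miss(0, 0x1000, 2);  /* light load: normal fill */
    handle_shared_cache_miss(1, 0x2000, 7);  /* heavy load: bypass      */
    return 0;
}
```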
Abstract:
An indication of a first cache entry to be removed from a cache may be received, the first cache entry corresponding to a first memory address in a memory. A second memory address in the memory may be identified based at least in part on the first memory address. A request may be sent to identify a second cache entry in the cache corresponding to the second memory address and to determine whether a removal policy is satisfied. A response to the request may be sent, the response comprising an indication of the second cache entry to be removed from the cache based at least in part on the request. The first cache entry and the second cache entry may be removed from the cache. Data corresponding to the first and second cache entries may be written to the memory with a single page file access.
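A minimal C sketch of the paired-eviction idea, under stated assumptions: here the second memory address is taken to be the next cache line, and the removal policy is taken to be "both lines fall in the same page", so a single page access can write both entries back. Both choices (buddy_addr, removal_policy_ok) are hypothetical stand-ins for whatever address derivation and policy the claims cover.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64ULL
#define PAGE_SIZE 4096ULL

/* Hypothetical derivation of the "second memory address": here,
 * simply the next cache line after the first. */
static uint64_t buddy_addr(uint64_t first)
{
    return first + LINE_SIZE;
}

/* Illustrative removal policy: remove the second entry too only
 * when both lines fall in the same memory page, so that a single
 * page access can write both back. */
static bool removal_policy_ok(uint64_t a, uint64_t b)
{
    return (a / PAGE_SIZE) == (b / PAGE_SIZE);
}

static void evict(uint64_t first)
{
    uint64_t second = buddy_addr(first);
    if (removal_policy_ok(first, second))
        printf("evict 0x%llx + 0x%llx: one write to page %llu\n",
               (unsigned long long)first, (unsigned long long)second,
               (unsigned long long)(first / PAGE_SIZE));
    else
        printf("evict 0x%llx alone: buddy is in another page\n",
               (unsigned long long)first);
}

int main(void)
{
    evict(0x12340);   /* buddy 0x12380: same page, paired write    */
    evict(0x12FC0);   /* buddy 0x13000: page boundary, evict alone */
    return 0;
}
```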
Abstract:
Techniques for managing multi-level memory and coherency using a unified page granular controller can simplify software programming, both of file system handling for persistent memory and of parallel programming across a host and an accelerator, and can enable better software utilization of host processors and accelerators. As part of the management techniques, a line granular controller cooperates with a page granular controller to support both fine-grain and coarse-grain coherency and to maintain the overall system inclusion property. In one example, a controller to manage coherency in a system includes a memory data structure and an on-die tag cache to store state information indicating the locations of pages in a memory hierarchy and an ownership state for the pages, the ownership state indicating whether the pages are owned by a host processor, owned by an accelerator device, or shared by the host processor and the accelerator device. The controller can also include logic to, in response to a memory access request from the host processor or the accelerator to access a cacheline in a page whose state indicates ownership by a device other than the requesting device, cause the page to transition to a state in which the requesting device owns or shares the page.
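A minimal C sketch of the per-page ownership transitions this abstract describes. The three states come from the abstract; the policy of taking exclusive ownership on a write and sharing on a read is an assumption, as are all names.

```c
#include <stdio.h>

/* Ownership states tracked per page by the page granular controller. */
typedef enum { HOST_OWNED, ACCEL_OWNED, SHARED } page_state_t;
typedef enum { HOST, ACCEL } device_t;

/* On a cacheline access to a page the requester does not own, the
 * controller transitions the page so the requester owns or shares
 * it. The write-takes-ownership / read-shares split is an assumed
 * policy, not taken from the abstract. */
static page_state_t on_access(page_state_t st, device_t req, int is_write)
{
    page_state_t own = (req == HOST) ? HOST_OWNED : ACCEL_OWNED;
    if (st == own)
        return st;            /* requester already owns the page */
    if (is_write)
        return own;           /* take exclusive ownership        */
    return SHARED;            /* reads leave the page shared     */
}

int main(void)
{
    page_state_t st = HOST_OWNED;
    st = on_access(st, ACCEL, 0);   /* accel read:  -> SHARED      */
    printf("after accel read:  %d\n", st);
    st = on_access(st, ACCEL, 1);   /* accel write: -> ACCEL_OWNED */
    printf("after accel write: %d\n", st);
    return 0;
}
```

Tracking state at page granularity, as in this sketch, is what lets the controller batch coherency actions coarsely, while the cooperating line granular controller handles individual cachelines within a page.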