Abstract:
A processing system includes at least one central processing unit (CPU) core, at least one graphics processing unit (GPU) core, a main memory, and a coherence directory for maintaining cache coherence. The at least one CPU core receives a CPU cache flush command to flush cache lines stored in cache memory of the at least one CPU core prior to launching a GPU kernel. The coherence directory transfers data associated with a memory access request by the at least one GPU core from the main memory without issuing coherence probes to caches of the at least one CPU core.
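Below is a minimal Python sketch of the flush-then-skip-probes flow described above; the class and method names (CoherenceDirectory, flush_cpu_caches, gpu_read) are illustrative assumptions, not the claimed implementation.

```python
class CoherenceDirectory:
    def __init__(self, main_memory):
        self.main_memory = main_memory          # address -> data
        self.cpu_cached_lines = set()           # addresses cached by CPU cores

    def cpu_cache_fill(self, addr):
        self.cpu_cached_lines.add(addr)

    def flush_cpu_caches(self):
        # CPU cache flush command issued prior to launching a GPU kernel:
        # afterwards, no CPU cache holds a copy of any line.
        self.cpu_cached_lines.clear()

    def gpu_read(self, addr):
        if addr in self.cpu_cached_lines:
            # Would normally require a coherence probe to the CPU caches.
            raise RuntimeError("probe required; CPU caches not flushed")
        # Caches were flushed, so main memory is authoritative: serve the
        # request directly, without issuing coherence probes to CPU caches.
        return self.main_memory[addr]

memory = {0x1000: 42}
directory = CoherenceDirectory(memory)
directory.cpu_cache_fill(0x1000)
directory.flush_cpu_caches()        # flush before the GPU kernel launch
print(directory.gpu_read(0x1000))   # served from main memory, no probes
```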
Abstract:
A data processing system includes a memory that includes a first memory bank and a second memory bank. The data processing system also includes a conflict detector connected to the memory and adapted to receive memory access information. The conflict detector tracks memory access statistics of the first memory bank, and determines if the first memory bank contains frequent row conflicts. The conflict detector also remaps a frequent row conflict in the first memory bank to the second memory bank. An indirection table is connected to the conflict detector and adapted to receive a memory access request, and redirects an address into a dynamically selected physical memory address in response to a remapping of the frequent row conflict to the second memory bank.
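A rough sketch of the conflict detector and indirection table, assuming a simple conflict-count threshold triggers remapping; all names and the threshold policy are hypothetical.

```python
from collections import Counter

class ConflictDetector:
    def __init__(self, threshold=3):
        self.row_conflicts = Counter()   # (bank, row) -> conflict count
        self.threshold = threshold
        self.indirection = {}            # (bank, row) -> (new_bank, new_row)

    def record_access(self, bank, row, open_row):
        # A row conflict occurs when the access targets a different row
        # than the one currently open in the bank.
        if open_row is not None and open_row != row:
            self.row_conflicts[(bank, row)] += 1
            if self.row_conflicts[(bank, row)] >= self.threshold:
                self.remap(bank, row)

    def remap(self, bank, row):
        # Move the frequently conflicting row to the second bank.
        self.indirection[(bank, row)] = (1, row)

    def translate(self, bank, row):
        # Indirection table lookup: redirect to the dynamically selected
        # physical location if the row was remapped.
        return self.indirection.get((bank, row), (bank, row))

det = ConflictDetector(threshold=2)
det.record_access(bank=0, row=7, open_row=3)   # conflict 1
det.record_access(bank=0, row=7, open_row=3)   # conflict 2 -> remap
print(det.translate(0, 7))   # (1, 7): redirected to the second bank
print(det.translate(0, 3))   # (0, 3): unaffected rows pass through
```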
Abstract:
In some embodiments, a method of managing cache memory includes identifying a group of cache lines in a cache memory, based on a correlation between the cache lines. The method also includes tracking evictions of cache lines in the group from the cache memory and, in response to a determination that a criterion regarding eviction of cache lines in the group from the cache memory is satisfied, selecting one or more (e.g., all) remaining cache lines in the group for eviction.
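A minimal sketch of group-based eviction tracking, assuming the eviction criterion is a fraction of the group already evicted; the names and the 0.5 threshold are illustrative.

```python
class CacheGroupTracker:
    def __init__(self, group_lines, threshold=0.5):
        self.group = set(group_lines)   # correlated cache lines
        self.evicted = set()
        self.threshold = threshold

    def on_eviction(self, line):
        if line in self.group:
            self.evicted.add(line)
        # Criterion: once a threshold fraction of the group has been
        # evicted, select all remaining group members for eviction.
        if len(self.evicted) / len(self.group) >= self.threshold:
            return self.group - self.evicted   # candidates to evict now
        return set()

tracker = CacheGroupTracker({0xA0, 0xA4, 0xA8, 0xAC})
print(tracker.on_eviction(0xA0))   # set(): criterion not yet met
print(tracker.on_eviction(0xA4))   # {0xA8, 0xAC}: evict the rest
```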
Abstract:
A processor prunes state information based on information provided by software, thereby reducing the amount of state information to be stored prior to the processor entering a low-power state. The software, such as an operating system or application program executing at the processor, indicates one or more registers of the processor as storing data that is no longer useful. When preparing to enter the low-power state, the processor omits the indicated registers from the state information stored to memory.
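A hedged sketch of the pruning idea: software marks registers whose contents are no longer useful, and the state-save routine omits them. The register names and API are hypothetical.

```python
class PowerStateManager:
    def __init__(self, registers):
        self.registers = registers       # name -> value
        self.dead = set()                # registers software marked as not useful

    def mark_dead(self, *names):
        # Hint from the OS or application: these registers need not be saved.
        self.dead.update(names)

    def save_state(self):
        # Prune: omit the indicated registers from the state information
        # stored to memory before entering the low-power state.
        return {name: value
                for name, value in self.registers.items()
                if name not in self.dead}

mgr = PowerStateManager({"r0": 7, "r1": 13, "r2": 99})
mgr.mark_dead("r2")
print(mgr.save_state())   # {'r0': 7, 'r1': 13} -- r2 pruned
```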
Abstract:
A method of way prediction for a data cache having a plurality of ways is provided. Responsive to an instruction to access a stack data block, the method accesses identifying information associated with a plurality of most recently accessed ways of a data cache to determine whether the stack data block resides in one of the plurality of most recently accessed ways of the data cache, wherein the identifying information is accessed from a subset of an array of identifying information corresponding to the plurality of most recently accessed ways; and when the stack data block resides in one of the plurality of most recently accessed ways of the data cache, the method accesses the stack data block from the data cache.
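An illustrative sketch, assuming the predictor first checks only the tag entries for the most recently accessed ways before falling back to a full lookup; the structure and names are assumed.

```python
class WayPredictor:
    def __init__(self, num_ways=8, mru_depth=2):
        self.tags = [None] * num_ways     # identifying information per way
        self.mru = []                     # way indices, most recent first
        self.mru_depth = mru_depth

    def fill(self, way, tag):
        self.tags[way] = tag
        self._touch(way)

    def access_stack_block(self, tag):
        # First consult only the subset of the tag array that covers the
        # most recently accessed ways.
        for way in self.mru[:self.mru_depth]:
            if self.tags[way] == tag:
                self._touch(way)
                return way                # hit in a most recently used way
        # Otherwise fall back to a full tag comparison across all ways.
        for way, t in enumerate(self.tags):
            if t == tag:
                self._touch(way)
                return way
        return None                       # miss

    def _touch(self, way):
        if way in self.mru:
            self.mru.remove(way)
        self.mru.insert(0, way)

pred = WayPredictor()
pred.fill(way=3, tag=0x40)
print(pred.access_stack_block(0x40))   # 3: found via the MRU subset
print(pred.access_stack_block(0x99))   # None: miss after the full lookup
```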
Abstract:
A plurality of first controllers operate according to a plurality of access protocols to control a plurality of memory modules. A second controller receives access requests that target the plurality of memory modules and selectively provides the access requests and control information to the plurality of first controllers based on physical addresses in the access requests. The second controller generates the control information for the first controllers based on statistical representations of the access requests to the plurality of memory modules.
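A sketch under assumptions: physical address ranges select a module-specific first controller, and simple access counts stand in for the statistical representations used to generate control information.

```python
from collections import Counter

class ModuleController:
    """First-level controller implementing one memory module's access protocol."""
    def __init__(self, name):
        self.name = name

    def access(self, addr, control):
        return f"{self.name} handled {hex(addr)} with {control}"

class SystemController:
    """Second-level controller that routes requests by physical address."""
    def __init__(self, ranges):
        self.ranges = ranges              # (lo, hi) -> ModuleController
        self.stats = Counter()            # per-module access counts

    def access(self, addr):
        for (lo, hi), ctrl in self.ranges.items():
            if lo <= addr < hi:
                self.stats[ctrl.name] += 1
                # Generate control information from the access statistics,
                # here a crude page-policy hint based on traffic volume.
                policy = "open_page" if self.stats[ctrl.name] > 4 else "closed_page"
                return ctrl.access(addr, {"page_policy": policy})
        raise ValueError("address not mapped to any module")

fast = ModuleController("dram")
slow = ModuleController("nvm")
system = SystemController({(0x0000, 0x8000): fast, (0x8000, 0x10000): slow})
print(system.access(0x1234))   # routed to "dram" with generated control info
```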
Abstract:
A cache controller configures a portion of a first memory as cache for a second memory responsive to an indicator of locality of memory access requests to the second memory. The indicator of locality determines a probability that a location of a memory access request to the second memory is predictable based upon at least one previous memory access request. The cache controller may determine a size of the cache based on a value of the indicator of locality or modify the size of the cache in response to changes in the value of the indicator of locality.
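A rough sketch with an assumed locality metric (the fraction of recent accesses landing near the previous access); the metric and the sizing rule are illustrative stand-ins.

```python
class AdaptiveCacheController:
    def __init__(self, max_cache_lines=1024, window=8):
        self.max_cache_lines = max_cache_lines
        self.window = window
        self.prev_addr = None
        self.near = 0
        self.total = 0

    def observe(self, addr):
        if self.prev_addr is not None:
            self.total += 1
            if abs(addr - self.prev_addr) <= self.window:
                self.near += 1
        self.prev_addr = addr

    def locality(self):
        # Indicator of locality: probability that the next access location
        # is predictable from the previous one (here: nearby addresses).
        return self.near / self.total if self.total else 0.0

    def cache_size(self):
        # Size the cache in proportion to the locality indicator; the size
        # shrinks or grows as the indicator's value changes over time.
        return int(self.max_cache_lines * self.locality())

ctrl = AdaptiveCacheController()
for addr in (100, 104, 108, 500, 504):
    ctrl.observe(addr)
print(ctrl.locality(), ctrl.cache_size())   # 0.75 768
```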
Abstract:
A system and method for efficiently predicting and processing memory access dependencies. A computing system includes control logic that marks a detected load instruction as a first type responsive to predicting the load instruction has high locality and is a candidate for store-to-load (STL) data forwarding. The control logic marks the detected load instruction as a second type responsive to predicting the load instruction has low locality and is not a candidate for STL data forwarding. The control logic processes a load instruction marked as the first type as if the load instruction is dependent on an older store operation. The control logic processes a load instruction marked as the second type as if the load instruction is independent of any older store operation.
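A sketch of the two-way load marking, with a saturating counter per load PC standing in for the locality predictor; the counter scheme and all names are assumptions.

```python
from collections import defaultdict

FIRST_TYPE, SECOND_TYPE = "stl_candidate", "independent"

class LoadClassifier:
    def __init__(self, threshold=2):
        self.counters = defaultdict(int)   # load PC -> forwarding history
        self.threshold = threshold

    def train(self, load_pc, forwarded):
        # Strengthen the counter when store-to-load forwarding occurred,
        # weaken it when it did not (2-bit saturating counter).
        c = self.counters[load_pc]
        self.counters[load_pc] = min(c + 1, 3) if forwarded else max(c - 1, 0)

    def classify(self, load_pc):
        # High predicted locality: process as if dependent on an older
        # store (first type); otherwise as independent (second type).
        return FIRST_TYPE if self.counters[load_pc] >= self.threshold else SECOND_TYPE

clf = LoadClassifier()
clf.train(0x400, forwarded=True)
clf.train(0x400, forwarded=True)
print(clf.classify(0x400))   # 'stl_candidate' -> handled as dependent
print(clf.classify(0x500))   # 'independent'   -> handled as independent
```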
Abstract:
A cooling system controller for a set of computing resources of a data center includes a first interface to couple to a first flow controller that controls a rate of thermal energy transfer to a phase change material (PCM) store from the set of computing resources, a second interface to couple to a second flow controller that controls a rate of thermal energy transfer from the PCM store to a cooling system, and a controller to determine a current set of operational parameters for the data center and to manipulate the first and second flow controllers via the first and second interfaces to control a net thermal energy transfer to and from the PCM store based on the current set of parameters.
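A conceptual sketch only: a bang-bang style policy that charges the PCM store when compute load is high and discharges it to the cooling system otherwise; the thresholds and parameter names are assumed, not the claimed control method.

```python
class PCMCoolingController:
    def __init__(self, charge_threshold_kw=50.0):
        self.charge_threshold_kw = charge_threshold_kw
        self.inflow_rate = 0.0    # compute -> PCM (first flow controller)
        self.outflow_rate = 0.0   # PCM -> cooling system (second flow controller)

    def update(self, compute_load_kw, pcm_fill_fraction):
        # Determine the current operational parameters, then drive the two
        # flow controllers to set the net thermal transfer into the store.
        if compute_load_kw > self.charge_threshold_kw and pcm_fill_fraction < 1.0:
            self.inflow_rate, self.outflow_rate = 1.0, 0.0   # absorb heat
        elif pcm_fill_fraction > 0.0:
            self.inflow_rate, self.outflow_rate = 0.0, 1.0   # release heat
        else:
            self.inflow_rate = self.outflow_rate = 0.0
        return self.inflow_rate - self.outflow_rate          # net transfer

ctrl = PCMCoolingController()
print(ctrl.update(compute_load_kw=80.0, pcm_fill_fraction=0.2))   # 1.0: charging
print(ctrl.update(compute_load_kw=20.0, pcm_fill_fraction=0.6))   # -1.0: discharging
```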