摘要:
In an embodiment, a processor includes multiple tiles, each including a core and a tile cache hierarchy. This tile cache hierarchy includes a first level cache, a mid-level cache (MLC) and a last level cache (LLC), and each of these caches is private to the tile. A controller coupled to the tiles includes a cache power control logic to receive utilization information regarding the core and the tile cache hierarchy of a tile and to cause the LLC of the tile to be independently power gated, based at least in part on this information. Other embodiments are described and claimed.
摘要:
In an embodiment, a processor includes multiple tiles, each including a core and a tile cache hierarchy. This tile cache hierarchy includes a first level cache, a mid-level cache (MLC) and a last level cache (LLC), and each of these caches is private to the tile. A controller coupled to the tiles includes a cache power control logic to receive utilization information regarding the core and the tile cache hierarchy of a tile and to cause the LLC of the tile to be independently power gated, based at least in part on this information. Other embodiments are described and claimed.
摘要:
Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
摘要:
Systems and methods for managing shared cache by multi-core processor. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a last level cache (LLC) slice; and a cache control logic coupled to the plurality of processing cores, the cache control logic configured to perform one of: making an LLC slice of an inactive processing core available to an active processing core or power gating the LLC slice, based on estimating cache requirements by active processing cores.
摘要:
A method, device, and system are disclosed. In one embodiment the method includes scheduling a thread to run on first core of a multi-core processor. The determination as to which core the thread is scheduled on uses one or more processes. These processes may include ranking all of the cores specific to a workload of the thread, establishing a current utilization of each core of the multi-core processor, and calculating an inter-core migration cost for the thread.
摘要:
In an embodiment, a processor includes at least one core having one or more execution units, a first cache memory and a first cache control logic. The first cache control logic may be configured to generate a first prefetch request to prefetch first data, where this request is to be aborted if the first data is not present in a second cache memory coupled to the first cache memory. Other embodiments are described and claimed.
摘要:
Provided are a memory system, device, and method for determining to send a refresh command to a memory module according to a refresh rate and incrementing a postponed refresh count while the memory module is in an active mode in response to the determining to send the refresh command. The refresh command is not sent to the memory module when the postponed refresh count is incremented. A determination is made as to whether the postponed refresh count exceeds a count threshold. A refresh command is issued to the memory module to perform refresh in an active mode in response to determining that the postponed refresh count exceeds the count threshold.
摘要:
Embodiments include a method and system of dynamically allocatable memory error mitigation. In one embodiment, a system applies an error mitigation mechanism to one of multiple groups of memory units, wherein the one group is in active use during an error test of a second group of memory units. The system deactivates and tests the second group of memory units for errors. In response to detecting an error in a memory unit of the second group, the system applies, to the memory unit of the second group having the error, the error mitigation mechanism for active use. The system then activates the second group of memory units with the error mitigation mechanism applied to the memory unit of the second group having the error.
摘要:
Embodiments of an apparatus to reduce memory power consumption are presented. In one embodiment, the apparatus comprises a cache memory, a memory, and a control unit. In one embodiment, the memory includes a plurality of memory ranks. The control unit is operable to select one or more memory ranks among the plurality of memory ranks to be idle-prioritized memory ranks such that access frequency to the idle-prioritized memory ranks is reduced.
摘要:
A processor includes a cache, a prefetcher module to select information according to a prefetcher algorithm, and a prefetcher algorithm selection module. The prefetcher algorithm selection module includes logic to select a candidate prefetcher algorithm determine and store memory addresses of predicted memory accesses of the candidate prefetcher algorithm when performed by the prefetcher module, determine cache lines accessed during memory operations, and evaluate whether the determined cache lines match the stored memory addresses. The prefetcher algorithm selection module further includes logic to adjust an accuracy ratio of the candidate prefetcher algorithm, compare the accuracy ratio with a threshold accuracy ratio, and determine whether to apply the first candidate prefetcher algorithm to the prefetcher module.