Abstract:
A method includes assigning a thread performance counter to threads being created in the computing environment, the thread performance counter measuring a number of cache misses for a corresponding thread. The method also includes calculating a self-thread value S as a change in the thread performance counter of a given thread during a predetermined period, calculating an other-thread value O as a sum of changes in all the thread performance counters during the predetermined period minus S, and calculating an estimation adjustment value associated with a first probability that a second set of cache misses for the corresponding thread replace a cache area currently occupied by the corresponding thread. The method also includes estimating a cache occupancy for the thread based on a previous occupancy for the thread, S, O, and the estimation adjustment value, and assigning computing environment resources to the thread based on the estimated cache occupancy.
Abstract:
A method is described for scheduling in an intelligent manner a plurality of threads on a processor having a plurality of cores and a shared last level cache (LLC). In the method, a first and second scenario having a corresponding first and second combination of threads are identified. The cache occupancies of each of the threads for each of the scenarios are predicted. The predicted cache occupancies being a representation of an amount of the LLC that each of the threads would occupy when running with the other threads on the processor according to the particular scenario. One of the scenarios is identified that results in the least objectionable impacts on all threads, the least objectionable impacts taking into account the impact resulting from the predicted cache occupancies. Finally, a scheduling decision is made according to the one of the scenarios that results in the least objectionable impacts.
Abstract:
In a system having non-uniform memory access architecture, with a plurality of nodes, memory access by entities such as virtual CPUs is estimated by invalidating a selected sub-set of memory units, and then detecting and compiling access statistics, for example by counting the page faults that arise when any virtual CPU accesses an invalidated memory unit. The entities, or pairs of entities, may then be migrated or otherwise co-located on the node for which they have greatest memory locality.
Abstract:
In a system having non-uniform memory access architecture, with a plurality of nodes, memory access by entities such as virtual CPUs is estimated by invalidating a selected sub-set of memory units, and then detecting and compiling access statistics, for example by counting the page faults that arise when any virtual CPU accesses an invalidated memory unit. The entities, or pairs of entities, may then be migrated or otherwise co-located on the node for which they have greatest memory locality.
Abstract:
Miss rate curves are constructed in a resource-efficient manner so that they can be constructed and memory management decisions can be made while the workloads are running. The resource-efficient technique includes the steps of selecting a subset of memory pages for the workload, maintaining a least recently used (LRU) data structure for the selected memory pages, detecting accesses to the selected memory pages and updating the LRU data structure in response to the detected accesses, and generating data for constructing a miss-rate curve for the workload using the LRU data structure. After a memory page is accessed, the memory page may be left untraced for a period of time, after which the memory page is retraced.