摘要:
Miss rate curves are constructed in a resource-efficient manner so that they can be constructed and memory management decisions can be made while the workloads are running. The resource-efficient technique includes the steps of selecting a subset of memory pages for the workload, maintaining a least recently used (LRU) data structure for the selected memory pages, detecting accesses to the selected memory pages and updating the LRU data structure in response to the detected accesses, and generating data for constructing a miss-rate curve for the workload using the LRU data structure. After a memory page is accessed, the memory page may be left untraced for a period of time, after which the memory page is retraced.
摘要:
Methods, computer programs, and systems for managing thread performance in a computing environment based on cache occupancy are provided. In one embodiment, a computer implemented method assigns a thread performance counter to threads being created to measure the number of cache misses for the threads. The thread performance counter is deduced in one embodiment based on performance counters associated with each core in a processor. The method further calculates a self-thread value as the change in the thread performance counter of a given thread during a predetermined period, and an other-thread value as the sum of all the changes in the thread performance counters for all threads except for the given thread. Further, the method estimates a cache occupancy for the given thread based on a previous occupancy for the given thread, and the calculated shelf-thread and other-thread values. The estimated cache occupancy is used to assign computing environment resources to the given thread. In another embodiment, cache miss-rate curves are constructed for a thread to help analyze performance tradeoffs when changing cache allocations of the threads in the system.
摘要:
Methods, computer programs, and systems for managing thread performance in a computing environment based on cache occupancy are provided. In one embodiment, a computer implemented method assigns a thread performance counter to threads being created to measure the number of cache misses for the threads. The thread performance counter is deduced in one embodiment based on performance counters associated with each core in a processor. The method further calculates a self-thread value as the change in the thread performance counter of a given thread during a predetermined period, and an other-thread value as the sum of all the changes in the thread performance counters for all threads except for the given thread. Further, the method estimates a cache occupancy for the given thread based on a previous occupancy for the given thread, and the calculated shelf-thread and other-thread values. The estimated cache occupancy is used to assign computing environment resources to the given thread. In another embodiment, cache miss-rate curves are constructed for a thread to help analyze performance tradeoffs when changing cache allocations of the threads in the system.
摘要:
A thread (or other resource consumer) is compensated for contention for system resources in a computer system having at least one processor core, a last level cache (LLC), and a main memory. In one embodiment, at each descheduling event of the thread following an execution interval, an effective CPU time is determined. The execution interval is a period of time during which the thread is being executed on the central processing unit (CPU) between scheduling events. The effective CPU time is a portion of the execution interval that excludes delays caused by contention for microarchitectural resources, such as time spent repopulating lines from the LLC that were evicted by other threads. The thread may be compensated for microarchitectural contention by increasing its scheduling priority based on the effective CPU time.
摘要:
A method is described for scheduling in an intelligent manner a plurality of threads on a processor having a plurality of cores and a shared last level cache (LLC). In the method, a first and second scenario having a corresponding first and second combination of threads are identified. The cache occupancies of each of the threads for each of the scenarios are predicted. The predicted cache occupancies being a representation of an amount of the LLC that each of the threads would occupy when running with the other threads on the processor according to the particular scenario. One of the scenarios is identified that results in the least objectionable impacts on all threads, the least objectionable impacts taking into account the impact resulting from the predicted cache occupancies. Finally, a scheduling decision is made according to the one of the scenarios that results in the least objectionable impacts.
摘要:
A method is described for scheduling in an intelligent manner a plurality of threads on a processor having a plurality of cores and a shared last level cache (LLC). In the method, a first and second scenario having a corresponding first and second combination of threads are identified. The cache occupancies of each of the threads for each of the scenarios are predicted. The predicted cache occupancies being a representation of an amount of the LLC that each of the threads would occupy when running with the other threads on the processor according to the particular scenario. One of the scenarios is identified that results in the least objectionable impacts on all threads, the least objectionable impacts taking into account the impact resulting from the predicted cache occupancies. Finally, a scheduling decision is made according to the one of the scenarios that results in the least objectionable impacts.
摘要:
A thread (or other resource consumer) is compensated for contention for system resources in a computer system having at least one processor core, a last level cache (LLC), and a main memory. In one embodiment, at each descheduling event of the thread following an execution interval, an effective CPU time is determined. The execution interval is a period of time during which the thread is being executed on the central processing unit (CPU) between scheduling events. The effective CPU time is a portion of the execution interval that excludes delays caused by contention for microarchitectural resources, such as time spent repopulating lines from the LLC that were evicted by other threads. The thread may be compensated for microarchitectural contention by increasing its scheduling priority based on the effective CPU time.