摘要:
In a computer system with a multi-core processor having a shared cache memory level, an operating system scheduler adjusts the CPU latency of a thread running on one of the cores to be equal to the fair CPU latency which that thread would experience when the cache memory was equally shared by adjusting the CPU time quantum of the thread. In particular, during a reconnaissance time period, the operating system scheduler gathers information regarding the threads via conventional hardware counters and uses an analytical model to estimate a fair cache miss rate that the thread would experience if the cache memory was equally shared. During a subsequent calibration period, the operating system scheduler computes the fair CPU latency using runtime statistics and the previously computed fair cache miss rate value to determine the fair CPI value.
摘要:
A caching estimator process identifies a thread for determining the fair cache miss rate of the thread. The caching estimator process executes the thread concurrently on the chip multiprocessor with a plurality of peer threads to measure the actual cache miss rates of the respective threads while executing concurrently. Additionally, the caching estimator process computes the fair cache miss rate of the thread based on the relationship between the actual miss rate of the thread and the actual miss rates of the plurality of peer threads. As a result, the caching estimator applies the fair cache miss rate of the thread to a scheduling policy of the chip multiprocessor.
摘要:
A thread scheduler identifies a thread operable to be scheduled by a scheduling policy for execution on the chip multiprocessor. The thread scheduler estimates, for the thread, a performance value that is based on runtime statistics of the thread for a shared resource on the chip multiprocessor. Additionally, the thread scheduler applies the performance value to the scheduling policy in order to reallocate processor time of the thread commensurate with the performance value under fair distribution of the shared resource on the chip multiprocessor. The thread scheduler also applies the performance value to the scheduling policy in order to reallocate processor time of at least one co-executing thread to compensate for the reallocation of processor time to the thread.
摘要:
The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second, second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
摘要:
A modular dynamically re-configurable profiling core may be used to provide both operating systems and applications with detailed information about run time performance bottlenecks and may enable them to address these bottlenecks via scheduling or dynamic compilation. As a result, application software may be able to better leverage the intrinsic nature of the multi-core hardware platform, be it homogeneous or heterogeneous. The profiling functionality may be desirably isolated on a discrete, separate and modular profiling core, which may be referred to as a configurable profiler (CP). The modular configurable profiling core may facilitate inclusion of rich profiling functionality into new processors via modular reuse of the inventive CP. The modular configurable profiling core may improve a customer's experience and productivity when used in conjunction with commercial multi-core processors.
摘要:
A modular dynamically re-configurable profiling core may be used to provide both operating systems and applications with detailed information about run time performance bottlenecks and may enable them to address these bottlenecks via scheduling or dynamic compilation. As a result, application software may be able to better leverage the intrinsic nature of the multi-core hardware platform, be it homogeneous or heterogeneous. The profiling functionality may be desirably isolated on a discrete, separate and modular profiling core, which may be referred to as a configurable profiler (CP). The modular configurable profiling core may facilitate inclusion of rich profiling functionality into new processors via modular reuse of the inventive CP. The modular configurable profiling core may improve a customer's experience and productivity when used in conjunction with commercial multi-core processors.
摘要:
The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization for a second, second thread to predict a performance impact that would occur if the second thread were to simultaneously execute in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
摘要:
A chip multithreading processor schedules and assigns threads to its processing cores dependent on estimated miss rates in a shared cache memory of the threads. A cache miss rate of a thread is estimated by measuring cache miss rates of one or more groups of executing threads, where at least one of the groups includes the thread of interest. Using a determined estimated cache miss rate of the thread, the thread is scheduled with other threads to achieve a relatively low cache miss rate in the shared cache memory.
摘要:
A chip multithreading processor schedules and assigns threads to its processing cores dependent on estimated miss rates in a shared cache memory of the threads. A cache miss rate of a thread is estimated by measuring cache miss rates of one or more groups of executing threads, where at least one of the groups includes the thread of interest. Using a determined estimated cache miss rate of the thread, the thread is scheduled with other threads to achieve a relatively low cache miss rate in the shared cache memory.
摘要:
An estimate of the throughput of a multi-threaded processor based on measured miss rates of a cache memory associated with the processor is adjusted to account for cache miss processing delays due to memory bus access contention. In particular, the throughput calculated from the cache memory miss rates is initially calculated assuming that a memory bus between the cache memory and main memory has infinite bandwidth, this throughput estimate is used to estimate a request cycle time between memory access attempts for a typical thread. The request cycle time, in turn, is used to determine a memory bus access delay that is then used to adjust the initial processor throughput estimate. The adjusted estimate can be used for thread scheduling in a multiprocessor system.