Method and apparatus for achieving fair cache sharing on multi-threaded chip multiprocessors
    1.
    Granted patent (in force)

    Publication number: US08069444B2

    Publication date: 2011-11-29

    Application number: US11511804

    Filing date: 2006-08-29

    IPC classification: G06F9/46 G06F13/00

    Abstract: In a computer system with a multi-core processor having a shared cache memory level, an operating system scheduler adjusts the CPU time quantum of a thread running on one of the cores so that the thread's CPU latency equals the fair CPU latency that the thread would experience if the cache memory were equally shared. In particular, during a reconnaissance time period, the operating system scheduler gathers information regarding the threads via conventional hardware counters and uses an analytical model to estimate the fair cache miss rate that the thread would experience if the cache memory were equally shared. During a subsequent calibration period, the operating system scheduler computes the fair CPU latency by using runtime statistics and the previously computed fair cache miss rate to determine the fair CPI value.
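A minimal sketch of the quantum adjustment described above. The abstract does not give the analytical model, so this assumes a simple linear one (CPI changes by the miss-rate difference times a fixed miss penalty); the function names, parameters, and sample numbers are all hypothetical.

```python
def fair_cpi(actual_cpi, actual_miss_rate, fair_miss_rate, miss_penalty):
    """Estimate the CPI the thread would see under equal cache sharing.

    Assumed model (not from the source): each cache miss per instruction
    adds a fixed penalty in cycles, so CPI is linear in the miss rate.
    """
    return actual_cpi - (actual_miss_rate - fair_miss_rate) * miss_penalty


def adjusted_quantum(quantum, actual_cpi, fair_cpi_value):
    """Scale the time quantum so the thread retires as many instructions
    as it would have retired at the fair CPI in the original quantum."""
    return quantum * actual_cpi / fair_cpi_value


# Hypothetical thread that misses more than its fair share would imply.
cpi_fair = fair_cpi(actual_cpi=2.0, actual_miss_rate=0.02,
                    fair_miss_rate=0.01, miss_penalty=50)
print(cpi_fair)
print(adjusted_quantum(10.0, 2.0, cpi_fair))
```

Under this model a thread whose actual CPI is above its fair CPI receives a longer quantum, and a thread running faster than its fair CPI receives a shorter one.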

    Methods and apparatus for estimating fair cache miss rates on a chip multiprocessor
    2.
    Granted patent (in force)

    Publication number: US07689773B2

    Publication date: 2010-03-30

    Application number: US11606736

    Filing date: 2006-11-30

    IPC classification: G06F12/00

    Abstract: A caching estimator process identifies a thread whose fair cache miss rate is to be determined. The caching estimator process executes the thread concurrently on the chip multiprocessor with a plurality of peer threads and measures the actual cache miss rates of the respective threads during concurrent execution. Additionally, the caching estimator process computes the fair cache miss rate of the thread based on the relationship between the actual miss rate of the thread and the actual miss rates of the plurality of peer threads. The caching estimator then applies the fair cache miss rate of the thread to a scheduling policy of the chip multiprocessor.
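The abstract says only that the fair miss rate is computed from "the relationship" between the thread's miss rate and its peers' miss rates. The sketch below assumes that relationship is linear, own ≈ a × mean(peers) + b, fits it by least squares, and takes the fair miss rate as the point where the thread and its peers miss equally (f = a·f + b). The linear form, the fixed-point definition, and all names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fair_miss_rate(samples):
    """samples: list of (own_miss_rate, [peer_miss_rates]) from co-runs.

    Requires at least two co-runs with different mean peer miss rates
    and a fitted slope different from 1.
    """
    xs = [sum(peers) / len(peers) for _, peers in samples]
    ys = [own for own, _ in samples]
    a, b = fit_line(xs, ys)
    return b / (1.0 - a)  # solve f = a * f + b


# Hypothetical measurements from three co-runs with different peers.
runs = [(0.015, [0.01, 0.01]), (0.020, [0.02, 0.02]), (0.030, [0.03, 0.05])]
print(fair_miss_rate(runs))
```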

    Methods and apparatus for scheduling applications on a chip multiprocessor
    3.
    Patent application (in force)

    Publication number: US20080134185A1

    Publication date: 2008-06-05

    Application number: US11606751

    Filing date: 2006-11-30

    IPC classification: G06F9/46

    Abstract: A thread scheduler identifies a thread operable to be scheduled by a scheduling policy for execution on the chip multiprocessor. The thread scheduler estimates, for the thread, a performance value that is based on runtime statistics of the thread for a shared resource on the chip multiprocessor. Additionally, the thread scheduler applies the performance value to the scheduling policy in order to reallocate processor time of the thread commensurate with the performance value under fair distribution of the shared resource on the chip multiprocessor. The thread scheduler also applies the performance value to the scheduling policy in order to reallocate processor time of at least one co-executing thread, compensating for the reallocation of processor time to the thread.
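One way to picture the two reallocation steps is sketched below. It assumes the performance value is a multiplier on the thread's time slice and that the difference is taken equally from (or returned equally to) the co-executing threads so that total processor time is unchanged. Both assumptions and the `rebalance` helper are hypothetical; the abstract does not specify how the compensation is divided.

```python
def rebalance(quanta, thread, perf_value):
    """Return new time slices after adjusting one thread.

    quanta:     dict mapping thread name -> time slice (ms)
    thread:     the thread whose slice is adjusted
    perf_value: assumed multiplier (>1 grows the slice, <1 shrinks it)
    """
    new = dict(quanta)
    delta = quanta[thread] * (perf_value - 1.0)
    new[thread] = quanta[thread] + delta
    others = [t for t in quanta if t != thread]
    for t in others:
        # Compensate: co-executing threads absorb the change equally.
        new[t] = quanta[t] - delta / len(others)
    return new


slices = {"A": 10.0, "B": 10.0, "C": 10.0}
print(rebalance(slices, "A", 1.2))
```

Keeping the total constant is a design choice of this sketch; it means the adjustment changes only how processor time is divided, not how much is handed out.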

    Cache-aware thread scheduling in multi-threaded systems
    4.
    Granted patent (in force)

    Publication number: US08533719B2

    Publication date: 2013-09-10

    Application number: US12754143

    Filing date: 2010-04-05

    IPC classification: G06F9/46 G06F15/00 G06F13/00

    CPC classification: G06F9/5033 Y02D10/22

    Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict the performance impact that would occur if the second thread were to execute simultaneously in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.
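The decision step can be sketched as follows. The abstract does not name the metrics or the prediction model, so this example assumes each thread is characterized by its solo IPC and a positive cache footprint, and that throughput degrades in proportion to how far the combined footprint exceeds the shared cache. `ThreadProfile`, both functions, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadProfile:
    solo_ipc: float      # instructions per cycle when running alone
    footprint_kb: float  # measured cache working-set size (must be > 0)


def predicted_throughput(a, b, cache_kb):
    """Assumed model: no slowdown while both footprints fit in the cache,
    otherwise both threads slow down by the overcommit ratio."""
    demand = a.footprint_kb + b.footprint_kb
    scale = min(1.0, cache_kb / demand)
    return (a.solo_ipc + b.solo_ipc) * scale


def should_coschedule(first, second, cache_kb):
    """Co-run only if predicted combined throughput beats running the
    first thread alone."""
    return predicted_throughput(first, second, cache_kb) > first.solo_ipc


running = ThreadProfile(solo_ipc=1.0, footprint_kb=3000)
small = ThreadProfile(solo_ipc=0.8, footprint_kb=500)
thrasher = ThreadProfile(solo_ipc=0.2, footprint_kb=20000)
print(should_coschedule(running, small, cache_kb=4096))
print(should_coschedule(running, thrasher, cache_kb=4096))
```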

    MODULAR RE-CONFIGURABLE PROFILING CORE FOR MULTIPROCESSOR SYSTEMS-ON-CHIP
    5.
    Patent application (in force)

    Publication number: US20120022832A1

    Publication date: 2012-01-26

    Application number: US12916413

    Filing date: 2010-10-29

    IPC classification: G06F15/00

    Abstract: A modular dynamically re-configurable profiling core may be used to provide both operating systems and applications with detailed information about runtime performance bottlenecks and may enable them to address these bottlenecks via scheduling or dynamic compilation. As a result, application software may be able to better leverage the intrinsic nature of the multi-core hardware platform, be it homogeneous or heterogeneous. The profiling functionality may be desirably isolated on a discrete, separate and modular profiling core, which may be referred to as a configurable profiler (CP). The modular configurable profiling core may facilitate inclusion of rich profiling functionality into new processors via modular reuse of the inventive CP. The modular configurable profiling core may improve a customer's experience and productivity when used in conjunction with commercial multi-core processors.

    Modular re-configurable profiling core for multiprocessor systems-on-chip
    6.
    Granted patent (in force)

    Publication number: US08818760B2

    Publication date: 2014-08-26

    Application number: US12916413

    Filing date: 2010-10-29

    Abstract: A modular dynamically re-configurable profiling core may be used to provide both operating systems and applications with detailed information about runtime performance bottlenecks and may enable them to address these bottlenecks via scheduling or dynamic compilation. As a result, application software may be able to better leverage the intrinsic nature of the multi-core hardware platform, be it homogeneous or heterogeneous. The profiling functionality may be desirably isolated on a discrete, separate and modular profiling core, which may be referred to as a configurable profiler (CP). The modular configurable profiling core may facilitate inclusion of rich profiling functionality into new processors via modular reuse of the inventive CP. The modular configurable profiling core may improve a customer's experience and productivity when used in conjunction with commercial multi-core processors.

    CACHE-AWARE THREAD SCHEDULING IN MULTI-THREADED SYSTEMS
    7.
    Patent application (in force)

    Publication number: US20110246995A1

    Publication date: 2011-10-06

    Application number: US12754143

    Filing date: 2010-04-05

    IPC classification: G06F9/46 G06F9/30

    CPC classification: G06F9/5033 Y02D10/22

    Abstract: The disclosed embodiments provide a system that facilitates scheduling threads in a multi-threaded processor with multiple processor cores. During operation, the system executes a first thread in a processor core that is associated with a shared cache. During this execution, the system measures one or more metrics to characterize the first thread. Then, the system uses the characterization of the first thread and a characterization of a second thread to predict the performance impact that would occur if the second thread were to execute simultaneously in a second processor core that is also associated with the cache. If the predicted performance impact indicates that executing the second thread on the second processor core will improve performance for the multi-threaded processor, the system executes the second thread on the second processor core.

    Cache-aware scheduling for a chip multithreading processor
    8.
    Granted patent (in force)

    Publication number: US07818747B1

    Publication date: 2010-10-19

    Application number: US11265814

    Filing date: 2005-11-03

    CPC classification: G06F9/4881 G06F12/084

    Abstract: A chip multithreading processor schedules and assigns threads to its processing cores depending on the threads' estimated miss rates in a shared cache memory. The cache miss rate of a thread is estimated by measuring cache miss rates of one or more groups of executing threads, where at least one of the groups includes the thread of interest. Using the estimated cache miss rate so determined, the thread is scheduled with other threads to achieve a relatively low cache miss rate in the shared cache memory.
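A rough sketch of the two stages in this abstract: estimating a per-thread miss rate from group measurements, then grouping threads for low shared-cache miss rates. The abstract does not say how the estimate is derived from the groups or how threads are matched, so the averaging rule and the "pair the lowest with the highest" heuristic below are assumptions, as are the function names and sample data.

```python
def estimate_miss_rates(group_samples):
    """group_samples: list of (thread_ids, measured_group_miss_rate).

    Assumed estimator: a thread's miss rate is the mean miss rate of
    every measured group that contained it.
    """
    totals, counts = {}, {}
    for members, rate in group_samples:
        for t in members:
            totals[t] = totals.get(t, 0.0) + rate
            counts[t] = counts.get(t, 0) + 1
    return {t: totals[t] / counts[t] for t in totals}


def pair_threads(estimates):
    """Assumed heuristic: pair the lowest-miss thread with the highest so
    that no pair combines two cache-hungry threads. With an odd number of
    threads, the median thread is left unpaired."""
    ordered = sorted(estimates, key=estimates.get)
    pairs = []
    while len(ordered) >= 2:
        pairs.append((ordered.pop(0), ordered.pop()))
    return pairs


samples = [(("A", "B"), 0.02), (("A", "C"), 0.04),
           (("B", "D"), 0.01), (("C", "D"), 0.06)]
estimates = estimate_miss_rates(samples)
print(estimates)
print(pair_threads(estimates))
```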

    Cache-aware scheduling for a chip multithreading processor
    9.
    Granted patent (in force)

    Publication number: US07487317B1

    Publication date: 2009-02-03

    Application number: US11265956

    Filing date: 2005-11-03

    IPC classification: G06F12/08 G06F9/46

    Abstract: A chip multithreading processor schedules and assigns threads to its processing cores depending on the threads' estimated miss rates in a shared cache memory. The cache miss rate of a thread is estimated by measuring cache miss rates of one or more groups of executing threads, where at least one of the groups includes the thread of interest. Using the estimated cache miss rate so determined, the thread is scheduled with other threads to achieve a relatively low cache miss rate in the shared cache memory.

    Method and apparatus for estimating the effect of processor cache memory bus delays on multithreaded processor throughput
    10.
    Granted patent (in force)

    Publication number: US07457931B1

    Publication date: 2008-11-25

    Application number: US11141775

    Filing date: 2005-06-01

    IPC classification: G06F13/16

    Abstract: An estimate of the throughput of a multi-threaded processor, based on measured miss rates of a cache memory associated with the processor, is adjusted to account for cache miss processing delays due to memory bus access contention. In particular, the throughput is initially calculated from the cache memory miss rates assuming that a memory bus between the cache memory and main memory has infinite bandwidth. This throughput estimate is used to estimate a request cycle time between memory access attempts for a typical thread. The request cycle time, in turn, is used to determine a memory bus access delay, which is then used to adjust the initial processor throughput estimate. The adjusted estimate can be used for thread scheduling in a multiprocessor system.
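The four steps in this abstract (initial estimate, request cycle time, bus delay, adjusted estimate) can be sketched as below. The formulas are assumptions chosen for illustration: a linear CPI model for the infinite-bandwidth estimate, one bus request per cache miss, and an M/D/1 queue for the bus waiting time. None of them is taken from the patent, and all parameter names and values are hypothetical.

```python
def adjusted_throughput(n_threads, base_cpi, miss_rate, mem_latency, bus_cycles):
    """Return (initial_ipc, adjusted_ipc) for the whole processor.

    miss_rate is misses per instruction and must be positive;
    mem_latency and bus_cycles are in processor cycles.
    """
    # Step 1: initial estimate, assuming infinite memory bus bandwidth.
    cpi_ideal = base_cpi + miss_rate * mem_latency
    ipc_ideal = n_threads / cpi_ideal

    # Step 2: cycles between memory requests for a typical thread.
    request_cycle = cpi_ideal / miss_rate

    # Step 3: bus access delay from an assumed M/D/1 queueing model.
    utilization = n_threads * bus_cycles / request_cycle
    if utilization >= 1.0:
        raise ValueError("bus saturated; this simple model does not apply")
    bus_wait = bus_cycles * utilization / (2.0 * (1.0 - utilization))

    # Step 4: fold the extra delay per miss back into the estimate.
    cpi_adjusted = cpi_ideal + miss_rate * bus_wait
    return ipc_ideal, n_threads / cpi_adjusted


initial, adjusted = adjusted_throughput(
    n_threads=4, base_cpi=1.0, miss_rate=0.01, mem_latency=100, bus_cycles=20)
print(initial, adjusted)
```

The adjusted value is always at or below the initial one in this model, since bus contention can only add delay to each miss.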
