Weighted-region cycle accounting for multi-threaded processor cores
    1.
    Granted patent
    Weighted-region cycle accounting for multi-threaded processor cores (status: expired)

    Publication number: US08161493B2

    Publication date: 2012-04-17

    Application number: US12173771

    Filing date: 2008-07-15

    IPC classification: G06F9/45 G06F9/46

    Abstract: An aspect of the present invention improves the accuracy of measuring processor utilization of multi-threaded cores by providing a calibration facility that derives utilization in the context of the overall dynamic operating state of the core by assigning weights to idle threads and assigning weights to run threads, depending on the status of the core. From previous chip designs it has been established in a Simultaneous Multi Thread (SMT) core that not all idle cycles in a hardware thread can be equally converted into useful work. Competition for core resources reduces the conversion efficiency of one thread's idle cycles when any other thread is running on the same core.
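
As a rough illustration of the weighting idea in this abstract, the Python sketch below discounts a hardware thread's idle cycles according to how many sibling threads were running on the same core. The function name `weighted_utilization`, the `IDLE_WEIGHT` table, and the sample cycle counts are hypothetical and not taken from the patent; weights for run cycles, which the abstract also mentions, are left out for brevity.

```python
# Hypothetical sketch: discount idle cycles by core state.
# The weights below are illustrative assumptions, not values from the patent.

# Weight applied to one thread's idle cycles, keyed by the number of
# sibling threads running on the same core during those cycles.
IDLE_WEIGHT = {0: 1.0, 1: 0.5}


def weighted_utilization(regions):
    """Return core-state-aware utilization for one hardware thread.

    `regions` is a list of (run_cycles, idle_cycles, siblings_running)
    tuples, one per accounting region.
    """
    run = 0.0
    usable_idle = 0.0
    for run_cycles, idle_cycles, siblings_running in regions:
        run += run_cycles
        # Idle cycles are worth less when a sibling competes for the core.
        usable_idle += idle_cycles * IDLE_WEIGHT[siblings_running]
    total = run + usable_idle
    return run / total if total else 0.0


sample = [
    (600, 400, 1),  # sibling busy: idle cycles count at half weight
    (200, 800, 0),  # sibling idle: idle cycles count in full
]
print(round(weighted_utilization(sample), 3))
```

With these assumed weights the thread reports a higher utilization than the unweighted ratio of run cycles to total cycles, because part of its idle time could not have been turned into useful work.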

    DEVICE FOR AND METHOD OF WEIGHTED-REGION CYCLE ACCOUNTING FOR MULTI-THREADED PROCESSOR CORES
    2.
    Patent application
    DEVICE FOR AND METHOD OF WEIGHTED-REGION CYCLE ACCOUNTING FOR MULTI-THREADED PROCESSOR CORES (status: expired)

    Publication number: US20100287561A1

    Publication date: 2010-11-11

    Application number: US12173771

    Filing date: 2008-07-15

    IPC classification: G06F9/46

    Abstract: An aspect of the present invention improves the accuracy of measuring processor utilization of multi-threaded cores by providing a calibration facility that derives utilization in the context of the overall dynamic operating state of the core by assigning weights to idle threads and assigning weights to run threads, depending on the status of the core. From previous chip designs it has been established in a Simultaneous Multi Thread (SMT) core that not all idle cycles in a hardware thread can be equally converted into useful work. Competition for core resources reduces the conversion efficiency of one thread's idle cycles when any other thread is running on the same core.

    Victim prefetching in a cache hierarchy
    3.
    Granted patent
    Victim prefetching in a cache hierarchy (status: expired)

    Publication number: US07716424B2

    Publication date: 2010-05-11

    Application number: US10989997

    Filing date: 2004-11-16

    IPC classification: G06F12/08

    Abstract: We present a “directory extension” (hereinafter “DX”) to aid in prefetching between proximate levels in a cache hierarchy. The DX may maintain (1) a list of pages which contains recently ejected lines from a given level in the cache hierarchy, and (2) for each page in this list, the identity of a set of ejected lines, provided these lines are prefetchable from, for example, the next level of the cache hierarchy. Given a cache fault to a line within a page in this list, other lines from this page may then be prefetched without the substantial overhead to directory lookup which would otherwise be required.

    Efficient region coherence protocol for clustered shared-memory multiprocessor systems
    4.
    Granted patent
    Efficient region coherence protocol for clustered shared-memory multiprocessor systems (status: in force)

    Publication number: US08397030B2

    Publication date: 2013-03-12

    Application number: US12144759

    Filing date: 2008-06-24

    IPC classification: G06F12/08

    CPC classification: G06F12/0833 G06F12/0822

    Abstract: A system and method of a region coherence protocol for use in Region Coherence Arrays (RCAs) deployed in clustered shared-memory multiprocessor systems which optimize cache-to-cache transfers by allowing broadcast memory requests to be provided to only a portion of a clustered shared-memory multiprocessor system. Interconnect hierarchy levels can be devised for logical groups of processors, processors on the same chip, processors on chips aggregated into a multichip module, multichip modules on the same printed circuit board, and for processors on other printed circuit boards or in other cabinets. The present region coherence protocol includes, for example, one bit per level of interconnect hierarchy, such that the one bit has a value of “1” to indicate that there may be processors caching copies of lines from the region at that level of the interconnect hierarchy, and the one bit has a value of “0” to indicate that there are no cached copies of any lines from the region at that respective level of the interconnect hierarchy.
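
The one-bit-per-level encoding in this abstract lends itself to a short sketch. The level names in `LEVELS`, the bit ordering, and the function name `levels_to_snoop` are assumptions chosen for illustration and are not defined by the patent.

```python
# Hypothetical sketch: level names and bit ordering are assumptions.
LEVELS = ("chip", "module", "board", "remote")  # nearest to farthest


def levels_to_snoop(region_bits):
    """Return the interconnect levels whose bit is 1 for a region.

    Bit i of `region_bits` corresponds to LEVELS[i]. A 1 means processors at
    that level may be caching lines from the region; a 0 means none are.
    """
    return [name for i, name in enumerate(LEVELS) if (region_bits >> i) & 1]


print(levels_to_snoop(0b0101))  # chip and board may hold copies
print(levels_to_snoop(0b0000))  # no cached copies at any level
```

A request for a region whose bits are all zero would not need to be broadcast to any level under this reading, which is how the protocol limits requests to a portion of the system.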

    Method, apparatus, and computer program product for a cache coherency protocol state that predicts locations of shared memory blocks
    5.
    Granted patent
    Method, apparatus, and computer program product for a cache coherency protocol state that predicts locations of shared memory blocks (status: in force)

    Publication number: US07395376B2

    Publication date: 2008-07-01

    Application number: US11184315

    Filing date: 2005-07-19

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory read requests to read data in a cache line that is not currently in the shared invalid state are broadcast first to remote nodes. Memory read requests to read data in a cache line that is currently in the shared invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory read requests within the local node, the memory read requests are broadcast to the remote nodes.
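
The routing rule in this abstract can be sketched as a small decision function. The `LineState` enum, the `route_read` name, and the `local_has_copy` flag (standing in for the outcome of the local broadcast) are hypothetical and only model the order of broadcasts, not the hardware.

```python
# Hypothetical sketch: state names and the local_has_copy flag are assumptions.
from enum import Enum


class LineState(Enum):
    INVALID = "I"
    SHARED_INVALID = "SI"  # predicts a valid copy in the local node


def route_read(state, local_has_copy):
    """Return the ordered list of broadcasts issued for a read request."""
    if state is LineState.SHARED_INVALID:
        if local_has_copy:
            return ["local"]  # prediction held: the local node satisfies the read
        return ["local", "remote"]  # prediction missed: fall back to remote nodes
    return ["remote"]  # plain invalid: data is predicted to be remote


print(route_read(LineState.SHARED_INVALID, local_has_copy=True))
print(route_read(LineState.SHARED_INVALID, local_has_copy=False))
print(route_read(LineState.INVALID, local_has_copy=False))
```

The cost of a wrong prediction is one extra local broadcast, while a correct one avoids remote traffic altogether.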

    Data processing system and method for efficient communication utilizing an in coherency state
    6.
    Granted patent
    Data processing system and method for efficient communication utilizing an in coherency state (status: in force)

    Publication number: US07389388B2

    Publication date: 2008-06-17

    Application number: US11055305

    Filing date: 2005-02-10

    IPC classification: G06F13/00

    Abstract: A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain.

    Method, apparatus, and computer program product for a cache coherency protocol state that predicts locations of modified memory blocks
    7.
    Granted patent
    Method, apparatus, and computer program product for a cache coherency protocol state that predicts locations of modified memory blocks (status: expired)

    Publication number: US07360032B2

    Publication date: 2008-04-15

    Application number: US11184314

    Filing date: 2005-07-19

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast remote requests to reduce the latency to access data from local nodes and to reduce global traffic in an SMP computer system. A modified invalid cache coherency protocol state is defined that predicts whether a memory access request to read or write data in a cache line can be satisfied within a local node. When a cache line is in the modified invalid state, the only valid copies of the data are predicted to be located in the local node. When a cache line is in the invalid state and not in the modified invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. Memory access requests to read exclusive or write data in a cache line that is not currently in the modified invalid state are broadcast first to all nodes. Memory access requests to read exclusive or write data in a cache line that is currently in the modified invalid state are broadcast first to a local node, and in response to being unable to satisfy the memory access requests within the local node, the memory access requests are broadcast to the remote nodes.

    Method and apparatus for implementing cache state as history of read/write shared data
    8.
    Granted patent
    Method and apparatus for implementing cache state as history of read/write shared data (status: expired)

    Publication number: US07194586B2

    Publication date: 2007-03-20

    Application number: US10251276

    Filing date: 2002-09-20

    IPC classification: G06F12/00

    Abstract: A method and apparatus are provided for implementing a cache state as history of read/write shared data for a cache in a shared memory multiple processor computer system. An invalid temporary state for a cache line is provided in addition to modified, exclusive, shared, and invalid states. The invalid temporary state is entered when a cache releases a modified cache line to another processor. The invalid temporary state is used to enable effective optimizations within cache coherent symmetric multiprocessor (SMP) systems of an SMP caching hierarchy with distributed caches with different caching coherency traffic profiles for both commercial and technical workloads.

    Cache prefetching
    9.
    Granted patent
    Cache prefetching (status: expired)

    Publication number: US06922753B2

    Publication date: 2005-07-26

    Application number: US10255490

    Filing date: 2002-09-26

    IPC classification: G06F12/08 G06F12/00

    CPC classification: G06F12/0862

    Abstract: Method and apparatus for prefetching cache with requested data are described. A processor initiates a read access to main memory for data which is not in the main memory. After the requested data is brought into the main memory, but before the read access is reinitiated, the requested data is prefetched from main memory into the cache subsystem of the processor which will later reinitiate the read access.

    Non-uniform memory access (NUMA) enhancements for shared logical partitions
    10.
    Granted patent
    Non-uniform memory access (NUMA) enhancements for shared logical partitions (status: expired)

    Publication number: US08490094B2

    Publication date: 2013-07-16

    Application number: US12394669

    Filing date: 2009-02-27

    IPC classification: G06F9/50 G06F13/00

    CPC classification: G06F9/5077 G06F2212/2542

    Abstract: In a NUMA-topology computer system that includes multiple nodes and multiple logical partitions, some of which may be dedicated and others of which are shared, NUMA optimizations are enabled in shared logical partitions. This is done by specifying a home node parameter in each virtual processor assigned to a logical partition. When a task is created by an operating system in a shared logical partition, a home node is assigned to the task, and the operating system attempts to assign the task to a virtual processor that has a home node that matches the home node for the task. The partition manager then attempts to assign virtual processors to their corresponding home nodes. If this can be done, NUMA optimizations may be performed without the risk of reducing the performance of the shared logical partition.
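
The home-node matching step in this abstract can be sketched in a few lines. The function name `pick_virtual_processor`, the dictionary data model, and the fallback to the lowest-numbered virtual processor when nothing matches are assumptions; the abstract only says the operating system "attempts" the match.

```python
# Hypothetical sketch: the data model and the fallback rule are assumptions.
def pick_virtual_processor(task_home_node, virtual_processors):
    """Prefer a virtual processor whose home node matches the task's.

    `virtual_processors` maps a virtual processor id to its home node.
    Falls back to the lowest id when no home node matches.
    """
    matches = [vp for vp, node in virtual_processors.items() if node == task_home_node]
    return min(matches) if matches else min(virtual_processors)


vps = {0: "node_a", 1: "node_b", 2: "node_b"}
print(pick_virtual_processor("node_b", vps))  # a virtual processor homed on node_b
print(pick_virtual_processor("node_c", vps))  # no match: falls back to the lowest id
```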