Notification by Task of Completion of GSM Operations at Target Node
    61.
    Invention Application
    Status: Active

    Publication No.: US20090199182A1

    Publication Date: 2009-08-06

    Application No.: US12024651

    Filing Date: 2008-02-01

    IPC Classification: G06F9/46

    CPC Classification: G06F9/544 G06F9/542

    Abstract: A method for providing global notification of completion of a global shared memory (GSM) operation during processing by a target task executing at a target node of a distributed system. The distributed system has at least one other node on which the initiating task that generated the GSM operation is homed. The target task receives the GSM operation from the initiating task via a host fabric interface (HFI) window assigned to the target task. The task initiates execution of the GSM operation on the target node, detects completion of that execution, and issues a global notification to at least the initiating task. The global notification indicates the completion of the GSM operation to one or more tasks of a single job distributed across multiple processing nodes.
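    As a rough illustration of the flow described in the abstract, the Python sketch below models a target task that executes a GSM operation received through its HFI window and then issues a completion notification back to the initiating task. All names here (HFIWindow, Task, inbox) are invented for illustration; the actual mechanism operates on hardware host fabric interface windows, not Python objects.

```python
from dataclasses import dataclass, field

@dataclass
class HFIWindow:
    """Hypothetical stand-in for a host fabric interface (HFI) window
    assigned to a task; here it simply queues incoming messages."""
    inbox: list = field(default_factory=list)

@dataclass
class Task:
    task_id: int
    window: HFIWindow

    def execute_gsm_op(self, op, job_peers):
        # Execute the GSM operation locally on the target node
        # (placeholder work stands in for the real memory operation) ...
        result = op["payload"].upper()
        # ... then, on detecting completion, issue a global notification
        # to the initiating task and any other tasks of the same job.
        note = {"op_id": op["op_id"], "done_by": self.task_id}
        for peer in job_peers:
            peer.window.inbox.append(note)
        return result

# One initiating task and one target task of the same distributed job.
initiator = Task(0, HFIWindow())
target = Task(1, HFIWindow())
target.execute_gsm_op({"op_id": 7, "payload": "gsm-put"}, [initiator])
print(initiator.window.inbox)
```

    Running the sketch leaves the completion notification in the initiator's window, mirroring how the real notification tells the tasks of the job that the operation has finished.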


    Data replication in multiprocessor NUCA systems to reduce horizontal cache thrashing
    62.
    Invention Application
    Status: Expired

    Publication No.: US20060080506A1

    Publication Date: 2006-04-13

    Application No.: US10960611

    Filing Date: 2004-10-07

    IPC Classification: G06F12/00

    Abstract: A method of managing a distributed cache structure having separate cache banks, by detecting that a given cache line has been repeatedly accessed by two or more processors which share the cache, and replicating that cache line in at least two separate cache banks. The cache line is optimally replicated in the cache bank having the lowest latency with respect to the given accessing processor. A currently accessed line in a different cache bank can be exchanged with a cache line in the lowest-latency bank, and another line in the lowest-latency bank is moved to the different cache bank before the currently accessed line is moved in. Further replication of the cache line can be disabled when two or more processors alternately write to the cache line.
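    The replication policy can be caricatured in a few lines of Python: once two sharing processors have each accessed a line repeatedly, the line is copied into the bank with the lowest latency for each sharer. The two-bank layout, latency table, and access threshold are all invented for illustration, not taken from the patent.

```python
from collections import defaultdict

# Hypothetical per-CPU latency to each cache bank (smaller is closer).
LATENCY = {"bank0": {0: 1, 1: 4}, "bank1": {0: 4, 1: 1}}
REPLICATE_THRESHOLD = 3        # "repeatedly accessed" heuristic

class NucaCache:
    def __init__(self):
        self.banks = {"bank0": set(), "bank1": set()}
        self.hits = defaultdict(int)          # (line, cpu) -> access count

    def closest_bank(self, cpu):
        # Bank with the lowest latency for this processor.
        return min(LATENCY, key=lambda bank: LATENCY[bank][cpu])

    def access(self, line, cpu):
        self.hits[(line, cpu)] += 1
        # Replicate once at least two sharers have each accessed the
        # line repeatedly; sets make repeated replication idempotent.
        sharers = [c for c in (0, 1)
                   if self.hits[(line, c)] >= REPLICATE_THRESHOLD]
        if len(sharers) >= 2:
            for c in sharers:
                self.banks[self.closest_bank(c)].add(line)

cache = NucaCache()
for _ in range(REPLICATE_THRESHOLD):
    cache.access("lineA", 0)   # CPU 0 is closest to bank0
    cache.access("lineA", 1)   # CPU 1 is closest to bank1
print(cache.banks)
```

    After the loop, "lineA" resides in both banks, so each sharer reads its local copy instead of thrashing a single horizontal location.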


    ASSIST THREAD FOR INJECTING CACHE MEMORY IN A MICROPROCESSOR
    63.
    Invention Application
    Status: Active

    Publication No.: US20120198459A1

    Publication Date: 2012-08-02

    Application No.: US13434423

    Filing Date: 2012-03-29

    IPC Classification: G06F9/46 G06F12/08

    Abstract: A data processing system includes a microprocessor having access to multiple levels of cache memories. The microprocessor executes a main thread compiled from a source code object. The system includes a processor for executing an assist thread also derived from the source code object. The assist thread includes the memory reference instructions of the main thread and only those arithmetic instructions required to resolve the memory reference instructions. A scheduler, which schedules the assist thread in conjunction with the corresponding execution thread, runs the assist thread ahead of the execution thread by a determinable threshold, such as a number of main processor cycles or code instructions. The assist thread may execute in the main processor or in a dedicated assist processor that makes direct memory accesses to one of the lower-level cache memory elements.
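    The division of labor the abstract describes can be sketched in Python: an assist thread runs only the main thread's memory references (plus any address arithmetic they need) ahead of the main thread, so the main thread's loads hit in the cache. The small LRU cache model and the address list are invented for the sketch; in the patent the assist thread is derived by the compiler from the same source code object as the main thread.

```python
from collections import OrderedDict

CACHE_SIZE = 4
cache = OrderedDict()                  # small lower-level cache, LRU order
MEMORY = {addr: addr * 10 for addr in range(16)}

def load(addr):
    """Load through the cache, returning (value, hit?)."""
    hit = addr in cache
    if not hit:
        if len(cache) >= CACHE_SIZE:
            cache.popitem(last=False)  # evict least-recently-used line
        cache[addr] = MEMORY[addr]
    cache.move_to_end(addr)
    return cache[addr], hit

ADDRS = [2, 5, 7, 11]                  # addresses the main thread will touch

def assist_thread():
    # Only the memory references survive in the assist thread; its job
    # is to run ahead and inject these lines into the cache.
    for addr in ADDRS:
        load(addr)

def main_thread():
    hits = 0
    for addr in ADDRS:
        value, hit = load(addr)
        hits += hit
        _ = value * value              # non-memory work, absent from assist
    return hits

assist_thread()                        # scheduled ahead of the main thread
result = main_thread()
print(result)                          # all 4 loads hit in the warmed cache
```

    The real scheduler keeps the assist thread a bounded distance ahead; here it simply runs to completion first, which has the same warming effect on this tiny workload.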


    Assist thread for injecting cache memory in a microprocessor
    64.
    Granted Patent
    Status: Active

    Publication No.: US08230422B2

    Publication Date: 2012-07-24

    Application No.: US11034546

    Filing Date: 2005-01-13

    IPC Classification: G06F9/46 G06F9/40 G06F13/28

    Abstract: A data processing system includes a microprocessor having access to multiple levels of cache memories. The microprocessor executes a main thread compiled from a source code object. The system includes a processor for executing an assist thread also derived from the source code object. The assist thread includes the memory reference instructions of the main thread and only those arithmetic instructions required to resolve the memory reference instructions. A scheduler, which schedules the assist thread in conjunction with the corresponding execution thread, runs the assist thread ahead of the execution thread by a determinable threshold, such as a number of main processor cycles or code instructions. The assist thread may execute in the main processor or in a dedicated assist processor that makes direct memory accesses to one of the lower-level cache memory elements.


    Assist thread for injecting cache memory in a microprocessor
    67.
    Granted Patent
    Status: Active

    Publication No.: US08949837B2

    Publication Date: 2015-02-03

    Application No.: US13434423

    Filing Date: 2012-03-29

    Abstract: A data processing system includes a microprocessor having access to multiple levels of cache memories. The microprocessor executes a main thread compiled from a source code object. The system includes a processor for executing an assist thread also derived from the source code object. The assist thread includes the memory reference instructions of the main thread and only those arithmetic instructions required to resolve the memory reference instructions. A scheduler, which schedules the assist thread in conjunction with the corresponding execution thread, runs the assist thread ahead of the execution thread by a determinable threshold, such as a number of main processor cycles or code instructions. The assist thread may execute in the main processor or in a dedicated assist processor that makes direct memory accesses to one of the lower-level cache memory elements.


    Data replication in multiprocessor NUCA systems to reduce horizontal cache thrashing
    68.
    Granted Patent
    Status: Expired

    Publication No.: US07287122B2

    Publication Date: 2007-10-23

    Application No.: US10960611

    Filing Date: 2004-10-07

    IPC Classification: G06F15/163 G06F13/36

    Abstract: A method of managing a distributed cache structure having separate cache banks, by detecting that a given cache line has been repeatedly accessed by two or more processors which share the cache, and replicating that cache line in at least two separate cache banks. The cache line is optimally replicated in the cache bank having the lowest latency with respect to the given accessing processor. A currently accessed line in a different cache bank can be exchanged with a cache line in the lowest-latency bank, and another line in the lowest-latency bank is moved to the different cache bank before the currently accessed line is moved in. Further replication of the cache line can be disabled when two or more processors alternately write to the cache line.


    Method and system for managing cache injection in a multiprocessor system
    70.
    Invention Application
    Status: Active

    Publication No.: US20060064518A1

    Publication Date: 2006-03-23

    Application No.: US10948407

    Filing Date: 2004-09-23

    IPC Classification: G06F13/28

    CPC Classification: G06F13/28

    Abstract: A method and apparatus for managing cache injection in a multiprocessor system reduces the processing time associated with direct memory access (DMA) transfers in a symmetric multiprocessor (SMP) or non-uniform memory access (NUMA) environment. The method and apparatus either detect the target processor for DMA completion or direct processing of the DMA completion to a particular processor, thereby enabling cache injection into a cache coupled with the processor that executes the DMA completion routine processing the injected data. The target processor may be identified by determining which processor handles the interrupt that occurs on completion of the DMA transfer. Alternatively, or in conjunction with target processor identification, an interrupt handler may queue a deferred procedure call to the target processor to process the transferred data. In NUMA multiprocessor systems, the completing processor and target memory are chosen for accessibility of the target memory to that processor and its associated cache.
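    A toy model of the steering idea: route the DMA completion to the processor that will consume the data, and inject the transferred lines into that processor's cache rather than only into memory. The names (Processor, dma_transfer) and the dict-as-cache model are hypothetical, invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Processor:
    cpu_id: int
    cache: dict = field(default_factory=dict)   # per-CPU cache model

def dma_transfer(buffer_addr, data, target_cpu, processors):
    """Steer a DMA completion to a chosen target processor.

    The completion interrupt is routed to target_cpu, so the
    transferred words are injected into that processor's cache,
    where the completion routine will find them already warm.
    """
    handler = processors[target_cpu]            # interrupt routed here
    for offset, word in enumerate(data):
        handler.cache[buffer_addr + offset] = word   # cache injection
    return handler.cpu_id

cpus = {0: Processor(0), 1: Processor(1)}
completed_on = dma_transfer(0x1000, [11, 22, 33],
                            target_cpu=1, processors=cpus)
print(completed_on, sorted(cpus[1].cache))
```

    Only CPU 1's cache receives the injected lines; CPU 0, which did not handle the completion, is untouched, which is the property the patent exploits to avoid a cold-cache completion routine.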
