Notification to Task of Completion of GSM Operations by Initiator Node
    1.
    Patent application
    Legal status: in force

    Publication No.: US20090199191A1

    Publication date: 2009-08-06

    Application No.: US12024427

    Filing date: 2008-02-01

    IPC classification: G06F9/46

    CPC classification: G06F9/542

    Abstract: In a global shared memory (GSM) environment, a method provides local notification of completion of a GSM operation processed by a first task executing at a local node of a distributed system. The system includes multiple nodes on which different tasks of a single job execute and perform GSM operations that are received from a second task via a host fabric interface (HFI) and associated HFI window assigned to the first task. The local task initiates execution of a GSM operation on the local node. The task then monitors for and detects a completion of the execution of the GSM operation on the local node. When the task detects completion of the execution of the GSM operation, the task issues an internal notification to inform the locally-executing tasks of the completion of the GSM operation.
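The local notification flow in this abstract can be pictured with a minimal Python sketch. The names `LocalNode`, `register_local_task`, and `run_gsm_operation` are hypothetical, and completion is modelled simply as the operation's callable returning; the abstract does not say how the task actually monitors for completion.

```python
from typing import Any, Callable, List


class LocalNode:
    """Hypothetical model of one node that hosts several tasks of a job."""

    def __init__(self) -> None:
        # Callbacks of locally executing tasks that want completion notices.
        self._listeners: List[Callable[[str], None]] = []

    def register_local_task(self, on_complete: Callable[[str], None]) -> None:
        self._listeners.append(on_complete)

    def run_gsm_operation(self, op_id: str, operation: Callable[[], Any]) -> Any:
        # Initiate execution of the GSM operation on the local node.
        result = operation()
        # Assumption: the call returning is what "completion detected" means here.
        for notify in self._listeners:
            notify(op_id)  # internal notification; it never leaves the node
        return result


node = LocalNode()
seen: List[str] = []
node.register_local_task(seen.append)
node.register_local_task(seen.append)
value = node.run_gsm_operation("op-1", lambda: 42)
print(value, seen)
```

Each registered callback stands in for one locally executing task; a real system would presumably use an interrupt or a polled status location rather than a Python callback.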

    Notification by Task of Completion of GSM Operations at Target Node
    2.
    Patent application
    Legal status: in force

    Publication No.: US20090199182A1

    Publication date: 2009-08-06

    Application No.: US12024651

    Filing date: 2008-02-01

    IPC classification: G06F9/46

    CPC classification: G06F9/544 G06F9/542

    Abstract: A method for providing global notification of completion of a global shared memory (GSM) operation during processing by a target task executing at a target node of a distributed system. The distributed system has at least one other node on which an initiating task that generated the GSM operation is homed. The target task receives the GSM operation from the initiating task, via a host fabric interface (HFI) window assigned to the target task. The task initiates execution of the GSM operation on the target node. The task detects completion of the execution of the GSM operation on the target node, and issues a global notification to at least the initiating task. The global notification indicates the completion of the execution of the GSM operation to one or more tasks of a single job distributed across multiple processing nodes.
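As a rough illustration of the target-side flow, the sketch below models each task's HFI window as an in-memory inbox. The `Task` class, the message tuples, and the `broadcast` flag are all assumptions made for the example; the abstract only requires that the initiating task, at least, is notified.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Message = Tuple[str, int, str]  # (kind, sender task ID, operation ID)


class Task:
    """Hypothetical task whose inbox stands in for its HFI window."""

    def __init__(self, task_id: int) -> None:
        self.task_id = task_id
        self.inbox: Deque[Message] = deque()


def serve_one(target: Task, job: Dict[int, Task],
              execute: Callable[[str], None], broadcast: bool = True) -> None:
    """Target side: take one GSM operation, run it, then notify the job."""
    kind, initiator_id, op_id = target.inbox.popleft()
    if kind != "gsm_op":
        raise ValueError(f"unexpected message kind: {kind}")
    execute(op_id)  # runs on the target node; returning counts as completion
    recipients = [job[initiator_id]]  # the initiator is always notified
    if broadcast:
        recipients += [t for t in job.values()
                       if t.task_id not in (initiator_id, target.task_id)]
    for task in recipients:
        task.inbox.append(("gsm_done", target.task_id, op_id))


job = {i: Task(i) for i in range(3)}
job[1].inbox.append(("gsm_op", 0, "put-17"))  # task 0 initiates, task 1 is the target
serve_one(job[1], job, execute=lambda op_id: None)
print(list(job[0].inbox), list(job[2].inbox))
```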

    Notification to task of completion of GSM operations by initiator node
    3.
    Granted patent
    Legal status: in force

    Publication No.: US08255913B2

    Publication date: 2012-08-28

    Application No.: US12024427

    Filing date: 2008-02-01

    IPC classification: G06F9/46

    CPC classification: G06F9/542

    Abstract: In a global shared memory (GSM) environment, a method provides local notification of completion of a GSM operation processed by a first task executing at a local node of a distributed system. The system includes multiple nodes on which different tasks of a single job execute and perform GSM operations that are received from a second task via a host fabric interface (HFI) and associated HFI window assigned to the first task. The local task initiates execution of a GSM operation on the local node. The task then monitors for and detects a completion of the execution of the GSM operation on the local node. When the task detects completion of the execution of the GSM operation, the task issues an internal notification to inform the locally-executing tasks of the completion of the GSM operation.

    Notification by task of completion of GSM operations at target node
    4.
    Granted patent
    Legal status: in force

    Publication No.: US08239879B2

    Publication date: 2012-08-07

    Application No.: US12024651

    Filing date: 2008-02-01

    CPC classification: G06F9/544 G06F9/542

    Abstract: A method for providing global notification of completion of a global shared memory (GSM) operation during processing by a target task executing at a target node of a distributed system. The distributed system has at least one other node on which an initiating task that generated the GSM operation is homed. The target task receives the GSM operation from the initiating task, via a host fabric interface (HFI) window assigned to the target task. The task initiates execution of the GSM operation on the target node. The task detects completion of the execution of the GSM operation on the target node, and issues a global notification to at least the initiating task. The global notification indicates the completion of the execution of the GSM operation to one or more tasks of a single job distributed across multiple processing nodes.

    Guaranteeing delivery of multi-packet GSM messages
    5.
    Granted patent
    Legal status: lapsed

    Publication No.: US08146094B2

    Publication date: 2012-03-27

    Application No.: US12024678

    Filing date: 2008-02-01

    CPC classification: H04L1/1642 G06F9/542

    Abstract: A target task ensures complete delivery of a global shared memory (GSM) message from an originating task to the target task. The target task's HFI receives a first of multiple GSM packets generated from a single GSM message sent from the originating task. The HFI logic assigns a sequence number and corresponding tuple to track receipt of the complete GSM message. The sequence number is unique relative to other sequence numbers assigned to GSM messages that have not been completely received from the originating task. The tuple comprises the sequence number and a count value, and the HFI updates the count value for the first GSM packet and for each subsequent GSM packet received for the GSM message. The HFI determines when receipt of the GSM message is complete by comparing the count value with a count total retrieved from the packet header.
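The tuple-based tracking described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented logic: `GsmPacket`, its `message_id` field (used to group packets of one message), and the "smallest free number" allocation policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GsmPacket:
    source_task: int    # originating task
    message_id: int     # hypothetical sender-side message identifier
    total_packets: int  # count total carried in the packet header


class ReceiveTracker:
    """Tracks [sequence number, count] pairs for partially received messages."""

    def __init__(self) -> None:
        self._in_flight: Dict[Tuple[int, int], List[int]] = {}

    def _allocate_sequence(self) -> int:
        # Smallest number not used by any message that is still incomplete.
        used = {entry[0] for entry in self._in_flight.values()}
        seq = 0
        while seq in used:
            seq += 1
        return seq

    def on_packet(self, pkt: GsmPacket) -> bool:
        """Record one packet; return True when its message is complete."""
        key = (pkt.source_task, pkt.message_id)
        if key not in self._in_flight:
            self._in_flight[key] = [self._allocate_sequence(), 0]
        entry = self._in_flight[key]
        entry[1] += 1  # update the count value in the tuple
        if entry[1] == pkt.total_packets:
            del self._in_flight[key]  # frees the sequence number for reuse
            return True
        return False

    def pending(self) -> int:
        return len(self._in_flight)


tracker = ReceiveTracker()
pkt = GsmPacket(source_task=4, message_id=9, total_packets=3)
results = [tracker.on_packet(pkt) for _ in range(3)]
print(results, tracker.pending())
```

Because a sequence number only has to be unique among incomplete messages, releasing it on completion keeps the number space small.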

    Mechanism to Perform Debugging of Global Shared Memory (GSM) Operations
    6.
    Patent application
    Legal status: lapsed

    Publication No.: US20090199046A1

    Publication date: 2009-08-06

    Application No.: US12024585

    Filing date: 2008-02-01

    IPC classification: G06F11/00

    CPC classification: G06F13/385

    Abstract: A host fabric interface (HFI) enables debugging of global shared memory (GSM) operations received at a local node from a network fabric. The local node has a memory management unit (MMU), which provides an effective address to real address (EA-to-RA) translation table that is utilized by the HFI to evaluate when EAs of GSM operations/data from a received GSM packet are memory-mapped to RAs of the local memory. The HFI retrieves the EA associated with a GSM operation/data within a received GSM packet. The HFI forwards the EA to the MMU, which determines when the EA is mapped to RAs within the local memory for the local task. The HFI processing logic enables processing of the GSM packet only when the EA of the GSM operation/data within the GSM packet is an EA that has a local RA translation. Non-matching EAs result in an error condition that requires debugging.
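A small Python sketch may help picture the EA-to-RA check. Everything named here (`Mmu`, `hfi_accept`, `TranslationError`, the 4096-byte page size, and the dictionary used as memory) is an assumption for illustration; the abstract does not describe the table layout or the error path in this detail.

```python
from typing import Dict, Optional

PAGE_SIZE = 4096  # assumed page size for the illustration


class TranslationError(Exception):
    """Raised when a received EA has no local real-address mapping."""


class Mmu:
    def __init__(self, page_table: Dict[int, int]) -> None:
        # Maps effective page number -> real page number.
        self._page_table = page_table

    def translate(self, ea: int) -> Optional[int]:
        page, offset = divmod(ea, PAGE_SIZE)
        real_page = self._page_table.get(page)
        return None if real_page is None else real_page * PAGE_SIZE + offset


def hfi_accept(mmu: Mmu, ea: int, data: bytes, memory: Dict[int, bytes]) -> int:
    """Process a received GSM write only if its EA translates locally."""
    ra = mmu.translate(ea)
    if ra is None:
        # Non-matching EA: surface an error condition for debugging.
        raise TranslationError(f"EA {ea:#x} has no local RA translation")
    memory[ra] = data
    return ra


mmu = Mmu({2: 7})
memory: Dict[int, bytes] = {}
ra = hfi_accept(mmu, 2 * PAGE_SIZE + 16, b"payload", memory)
try:
    hfi_accept(mmu, 5 * PAGE_SIZE, b"stray", memory)
except TranslationError as exc:
    print("needs debugging:", exc)
```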

    Host Fabric Interface (HFI) to Perform Global Shared Memory (GSM) Operations
    7.
    Patent application
    Legal status: lapsed

    Publication No.: US20090198918A1

    Publication date: 2009-08-06

    Application No.: US12024397

    Filing date: 2008-02-01

    IPC classification: G06F12/02

    CPC classification: G06F12/109 G06F9/544

    Abstract: A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping.
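The outgoing tagging step might look like the following Python sketch. The `HfiWindow` class, the `homes` table of EA ranges, and the `TaggedOperation` record are hypothetical; the abstract does not say how the window learns which node an EA is homed on.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TaggedOperation:
    job_id: int
    target_node: int
    target_window: int
    ea: int


class HfiWindow:
    """Hypothetical HFI window bound to one local task of a parallel job."""

    def __init__(self, job_id: int, homes: List[Tuple[range, int, int]]) -> None:
        self.job_id = job_id
        # Each entry: (EA range, home node ID, HFI window ID on that node).
        self._homes = homes

    def tag_outgoing(self, ea: int) -> Optional[TaggedOperation]:
        for ea_range, node, window in self._homes:
            if ea in ea_range:
                return TaggedOperation(self.job_id, node, window, ea)
        return None  # EA is not mapped anywhere in this job


window = HfiWindow(job_id=12, homes=[(range(0x0000, 0x1000), 0, 3),
                                     (range(0x1000, 0x2000), 1, 5)])
print(window.tag_outgoing(0x1800))
```

On the receiving side, the matching window would accept the operation only when the EA has a valid local EA-to-RA mapping.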

    Host fabric interface (HFI) to perform global shared memory (GSM) operations
    8.
    Granted patent
    Legal status: lapsed

    Publication No.: US08484307B2

    Publication date: 2013-07-09

    Application No.: US12024397

    Filing date: 2008-02-01

    CPC classification: G06F12/109 G06F9/544

    Abstract: A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping.

    Mechanisms to Order Global Shared Memory Operations
    9.
    Patent application
    Legal status: in force

    Publication No.: US20090199200A1

    Publication date: 2009-08-06

    Application No.: US12024367

    Filing date: 2008-02-01

    IPC classification: G06F9/50

    Abstract: A method and data processing system for performing fence operations within a global shared memory (GSM) environment having a local task executing on a processor and providing GSM commands for processing by a host fabric interface (HFI) window that is allocated to the task. The HFI window has one or more registers for use during local fence operations. A first register tracks a first count of task-issued GSM commands, and a second register tracks a second count of GSM operations being processed by the HFI. The processing logic detects a locally-issued fence operation, and responds by performing a series of operations, including: automatically stopping the task from issuing additional GSM commands; monitoring for completion of all the task-issued GSM commands at the HFI; and triggering a resumption of issuance of GSM commands by the task when the completion of all previous task-issued GSM commands is registered by the HFI.
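The two-counter fence can be sketched as a tiny state machine in Python. The class and method names are hypothetical, and the second register is modelled as a count of commands the HFI has finished, which is one plausible reading of the abstract rather than a confirmed detail.

```python
class FencedWindow:
    """Hypothetical HFI window with the two counters used for a local fence."""

    def __init__(self) -> None:
        self.issued = 0     # first register: commands issued by the task
        self.completed = 0  # second register (assumed): commands the HFI has finished
        self.fenced = False

    def issue(self) -> bool:
        """Task tries to issue a GSM command; refused while a fence is pending."""
        if self.fenced:
            return False
        self.issued += 1
        return True

    def fence(self) -> None:
        # Stop further issue only if earlier commands are still outstanding.
        self.fenced = self.completed < self.issued

    def complete_one(self) -> None:
        """HFI reports completion of one previously issued command."""
        if self.completed >= self.issued:
            raise RuntimeError("no outstanding command to complete")
        self.completed += 1
        if self.fenced and self.completed == self.issued:
            self.fenced = False  # all prior commands done: the task may resume


w = FencedWindow()
w.issue()
w.issue()
w.fence()
blocked = w.issue()   # refused: two commands are still outstanding
w.complete_one()
w.complete_one()
resumed = w.issue()   # accepted again once the counters match
print(blocked, resumed)
```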

    Mechanism to Provide Reliability Through Packet Drop Detection
    10.
    Patent application
    Legal status: lapsed

    Publication No.: US20090198762A1

    Publication date: 2009-08-06

    Application No.: US12024600

    Filing date: 2008-02-01

    IPC classification: G06F15/16

    CPC classification: G06F9/544

    Abstract: A method and a data processing system for completing checkpoint processing of a distributed job with local tasks communicating with other remote tasks via a host fabric interface (HFI) and assigned HFI window. Each HFI window has a send count and a receive count, which track GSM messages that are sent from and received at the HFI window. When a checkpoint is initiated by a master task, each local task forwards the send count and the receive count to the master task. The master task sums the respective counts and then compares the totals to each other. When the send count total is equal to the receive count total, the tasks are permitted to continue processing. However, when the send count total is not equal to the receive count total, the master task notifies each task of the job to roll back to a previous checkpoint or to kill the job execution.
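The master task's comparison reduces to summing two columns and checking equality. The following Python sketch assumes each task reports a `(send_count, receive_count)` pair; the function name and the string return values are hypothetical, and `"rollback"` stands in for either recovery action named in the abstract.

```python
from typing import Iterable, Tuple


def checkpoint_decision(window_counts: Iterable[Tuple[int, int]]) -> str:
    """Decide the checkpoint outcome from per-window (sent, received) counts."""
    counts = list(window_counts)  # materialize so both sums see every pair
    total_sent = sum(sent for sent, _ in counts)
    total_received = sum(received for _, received in counts)
    # Equal totals mean no packet was dropped anywhere in the job.
    return "continue" if total_sent == total_received else "rollback"


balanced = checkpoint_decision([(5, 3), (2, 4)])   # 7 sent, 7 received
dropped = checkpoint_decision([(5, 3), (2, 3)])    # 7 sent, 6 received
print(balanced, dropped)
```

Only the job-wide totals have to match; an individual window may legitimately send more than it receives.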
