Data gather/scatter machine
    1.
    Granted patent
    Status: Expired

    Publication No.: US06820264B1

    Publication date: 2004-11-16

    Application No.: US09517167

    Filing date: 2000-03-02

    IPC classification: G06F15/163

    CPC classification: G06F9/546

    Abstract: An embodiment of the present invention is directed to a method for compiling, storing, and interpreting, as often as needed, a representation of any MPI datatype, including the steps of compiling a tree representation of an MPI datatype into a compact, linear data gather scatter program (DGSP), wherein the DGSP is of a form general enough to encode an arbitrarily complex datatype; registering the compact linear DGSP with a communications subsystem for later interpretation by the subsystem for at least one of sends, receives, packs and unpacks, creating a registered DGSP; and interpreting the registered DGSP. In one embodiment of the present invention, the form of the DGSP uses a single generalized representation. In another embodiment, the single generalized representation covers any of the arbitrarily complex datatype patterns that can arise in this context. In yet another embodiment, the single generalized representation provides that any datatype that can be constructed using an application programming interface (API) in MPI can be converted into the form.
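
    Illustration (not from the patent): a minimal Python sketch of the compile-then-interpret idea described in this abstract. The tuple-based datatype tree, the COPY/LOOP/END instruction set, and the names compile_tree and gather are all assumptions made for this example; the abstract does not specify the actual DGSP instruction format.

```python
# Hypothetical datatype tree (nested tuples, not MPI's internal form):
#   ("bytes", n)                        n contiguous bytes
#   ("vector", count, stride, child)    child repeated `count` times, `stride` bytes apart
#   ("struct", [(offset, child), ...])  children placed at byte offsets

def compile_tree(node, program=None):
    """Flatten a datatype tree into one linear instruction list."""
    program = [] if program is None else program
    kind = node[0]
    if kind == "bytes":
        program.append(("COPY", node[1]))
    elif kind == "vector":
        _, count, stride, child = node
        program.append(("LOOP", 0, count, stride))
        compile_tree(child, program)
        program.append(("END",))
    elif kind == "struct":
        for offset, child in node[1]:
            # A struct member is a one-trip loop entered at a byte offset.
            program.append(("LOOP", offset, 1, 0))
            compile_tree(child, program)
            program.append(("END",))
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    return program


def gather(program, memory, base=0):
    """Interpret the linear program, packing the described bytes contiguously.

    Assumes every LOOP has count >= 1.
    """
    packed = bytearray()
    # One frame per open LOOP:
    # [loop_pc, trips_left, stride, saved_base, current_base]
    stack = []
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "COPY":
            packed += memory[base:base + op[1]]
        elif op[0] == "LOOP":
            _, offset, count, stride = op
            stack.append([pc, count, stride, base, base + offset])
            base += offset
        else:  # "END"
            frame = stack[-1]
            frame[1] -= 1
            if frame[1] > 0:
                frame[4] += frame[2]   # advance by one stride
                base = frame[4]
                pc = frame[0]          # jump back; pc += 1 below re-enters the body
            else:
                base = frame[3]        # restore the base seen before the loop
                stack.pop()
        pc += 1
    return bytes(packed)


memory = bytes(range(32))
member = ("struct", [(0, ("bytes", 1)), (4, ("bytes", 2))])
datatype = ("vector", 2, 16, member)
program = compile_tree(datatype)   # compile once ...
packed = gather(program, memory)   # ... interpret as often as needed
```

    In this sketch every construct reduces to the same LOOP/COPY/END shape, which is one way to read the abstract's "single generalized representation"; a scatter (unpack) interpreter would walk the same program and copy in the opposite direction.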

    Method and apparatus for striping message payload data over a network
    2.
    Granted patent
    Status: Expired

    Publication No.: US07835359B2

    Publication date: 2010-11-16

    Application No.: US11298322

    Filing date: 2005-12-08

    IPC classification: H04L12/28 H04L12/54

    Abstract: A method, an apparatus and a recording medium are provided for communicating message payload data, especially noncontiguous message data, from a first node of a network to a second node of the network in response to a request to transmit a message. The method includes dividing the length of a data payload to be transmitted into a plurality of submessage payload lengths, i.e., into at least a first submessage payload length and a second submessage payload length. Then, a first ordered submessage is transmitted from the first node for delivery to the second node, the first ordered submessage having the first submessage payload length. A first state of an environment is then determined in the first node as if the step of transmitting the first ordered submessage were already completed. Without having to complete the step of transmitting the first ordered submessage, a second ordered submessage is then transmitted from the first node for delivery to the second node, the second ordered submessage having the second submessage payload length and being transmitted in a way that takes into account the first state of the environment in the first node.
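
    Illustration (not from the patent): a minimal Python sketch of striping a noncontiguous payload into submessages whose starting states are computed ahead of time. The (address, length) segment list, the (segment index, offset) cursor used as the "state of the environment", and the names split_lengths, advance and read are assumptions made for this example.

```python
def split_lengths(total, parts):
    """Divide a payload length into `parts` submessage lengths."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def advance(segments, state, nbytes):
    """Cursor state after `nbytes` more payload, computed without moving data."""
    idx, off = state
    while nbytes > 0:
        step = min(segments[idx][1] - off, nbytes)
        off += step
        nbytes -= step
        if off == segments[idx][1]:   # segment exhausted, move to the next one
            idx, off = idx + 1, 0
    return (idx, off)


def read(memory, segments, state, nbytes):
    """Gather `nbytes` of payload starting from a cursor state."""
    idx, off = state
    out = bytearray()
    while nbytes > 0:
        addr, length = segments[idx]
        step = min(length - off, nbytes)
        out += memory[addr + off:addr + off + step]
        nbytes -= step
        idx, off = idx + 1, 0
    return bytes(out)


memory = bytes(range(64))
segments = [(0, 5), (16, 7), (40, 4)]          # noncontiguous message, 16 bytes
lengths = split_lengths(sum(n for _, n in segments), 2)

# State at the start of each submessage, as if all earlier ones were already sent.
starts = [(0, 0)]
for n in lengths[:-1]:
    starts.append(advance(segments, starts[-1], n))

# The second submessage can be built without waiting for the first.
second = read(memory, segments, starts[1], lengths[1])
first = read(memory, segments, starts[0], lengths[0])
```

    Because each start state depends only on the lengths, the submessages in this sketch could be handed to different network paths in any order.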

    Sharing lock mechanism between protocol layers
    3.
    Granted patent
    Status: Expired

    Publication No.: US07689992B2

    Publication date: 2010-03-30

    Application No.: US10877095

    Filing date: 2004-06-25

    IPC classification: G06F9/46

    CPC classification: G06F9/526

    Abstract: Shared locks are employed for controlling a thread which extends across more than one protocol layer in a data processing system. A counter is used as part of a data structure which makes it possible to implement shared locks across multiple layers. The use of shared locks avoids the processing overhead usually associated with lock acquisition and release. The thread which is controlled may be initiated in either an upper layer protocol or a lower layer protocol.
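
    Illustration (not from the patent): one possible reading of a counter-based lock shared between protocol layers, sketched in Python. The class SharedLayerLock, its fields, and the two layer functions are hypothetical; the abstract does not describe the actual data structure.

```python
import threading


class SharedLayerLock:
    """One lock shared by all protocol layers, with a nesting counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None
        self._count = 0
        self.real_acquires = 0   # how often the underlying lock was really taken

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            # Another layer on the same thread: bump the counter only.
            self._count += 1
            return
        self._lock.acquire()
        self._owner = me
        self._count = 1
        self.real_acquires += 1

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("lock not held by this thread")
        self._count -= 1
        if self._count == 0:
            self._owner = None
            self._lock.release()


lock = SharedLayerLock()
trace = []


def lower_layer_send(packet):
    lock.acquire()               # thread already holds the lock: counter only
    try:
        trace.append(packet)
    finally:
        lock.release()


def upper_layer_send(message):
    lock.acquire()               # the only real acquisition on this path
    try:
        for packet in message.split():
            lower_layer_send(packet)
    finally:
        lock.release()


upper_layer_send("a b c")
```

    In this sketch a call that starts in the lower layer would take the real lock itself, so the same code path works whichever layer initiates the thread.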

    Method, system and program product for communicating among processes in a symmetric multi-processing cluster environment
    4.
    Granted patent
    Status: Expired

    Publication No.: US07958513B2

    Publication date: 2011-06-07

    Application No.: US11282011

    Filing date: 2005-11-17

    IPC classification: G06F9/44

    CPC classification: G06F9/546

    Abstract: A facility is provided for communicating among processes in a symmetric multi-processing (SMP) cluster environment wherein at least some SMP nodes of the SMP cluster include multiple processes. The facility includes transferring messages of a collective communication intra-nodally, among processes at an SMP node, employing a shared memory of the SMP node; and, responsive to the intra-nodal transferring, concurrently transferring multiple messages of the collective communication inter-nodally from n SMP node(s) to m other SMP node(s), wherein at least one of n or m is greater than one. The concurrent transferring is performed by multiple processes of at least one of the n SMP node(s) or the m other SMP node(s). More particularly, the facility includes concurrently transferring the multiple messages inter-nodally from one of: one SMP node to multiple other SMP nodes, multiple SMP nodes to one other SMP node, or multiple SMP nodes to multiple other SMP nodes.
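
    Illustration (not from the patent): a minimal Python sketch of how a broadcast might be scheduled when every process on a node that already holds the data sends to a different node at the same time. The function broadcast_schedule and its round-based model are assumptions; the intra-node shared-memory step is represented only by the fact that all tasks of a holding node can send.

```python
def broadcast_schedule(tasks_per_node, root=0):
    """Plan inter-node transfers for a broadcast.

    Returns a list of rounds; each round is a list of
    (source_node, source_task, destination_node) transfers that run
    concurrently. Assumes every node has at least one task and that data
    reaching a node is shared with all its tasks through shared memory
    before the next round.
    """
    have = [root]
    need = [n for n in range(len(tasks_per_node)) if n != root]
    rounds = []
    while need:
        this_round = []
        for src in list(have):                 # nodes that held data at round start
            for task in range(tasks_per_node[src]):
                if not need:
                    break
                dst = need.pop(0)
                this_round.append((src, task, dst))
                have.append(dst)               # dst can send from the next round on
        rounds.append(this_round)
    return rounds


# Nine nodes with two tasks each: 1 -> 2 nodes, then 3 -> 6 nodes.
rounds = broadcast_schedule([2] * 9)
```

    With a single sending task per node the same nine nodes would need four rounds (1, 2, 4, then 1 transfer), which is the kind of saving the concurrent inter-nodal transfer is aiming at in this sketch.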

    METHOD FOR IMPLEMENTING MPI-2 ONE SIDED COMMUNICATION
    5.
    Patent application
    Status: In force

    Publication No.: US20080127203A1

    Publication date: 2008-05-29

    Application No.: US11467946

    Filing date: 2006-08-29

    IPC classification: G06F3/00

    Abstract: A method for implementing Message Passing Interface (MPI-2) one-sided communication by using Low-level Applications Programming Interface (LAPI) active messaging capabilities, including providing at least three data transfer types, one of which is used to send a message with a message header greater than one packet where Data Gather and Scatter Programs (DGSP) are placed as part of the message header; allowing a multi-packet header by using a LAPI data transfer type; sending the DGSP and data as one message; reading the DGSP with a header handler; registering the DGSP with the LAPI to allow the LAPI to scatter the data to one or more memory locations; defining two sets of counters, one counter set for keeping track of a state of a prospective communication partner, and another counter set for recording activities of local and Remote Memory Access (RMA) operations; comparing local and remote counts of completed RMA operations to complete synchronization mechanisms; and creating an mpci_wait_loop function.
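
    Illustration (not from the patent): a minimal Python sketch of completing a synchronization by comparing local and remote counts of RMA operations. The Window class, the put/announce helpers, and this wait_loop are hypothetical stand-ins; they are not LAPI calls and not the mpci_wait_loop function named in the abstract.

```python
class Window:
    """Per-rank bookkeeping for one-sided (RMA) operations."""

    def __init__(self, rank, nranks):
        self.rank, self.nranks = rank, nranks
        self.issued = [0] * nranks        # ops this rank issued to each target
        self.completed = [0] * nranks     # ops from each origin completed here
        self.expected = [None] * nranks   # issue counts announced by each origin
        self.pending = []                 # ops delivered but not yet applied
        self.memory = {}


def put(origin, target, key, value):
    """Issue a one-sided write; completion at the target happens later."""
    origin.issued[target.rank] += 1
    target.pending.append((origin.rank, key, value))


def announce(origin, target):
    """At synchronization time, tell the target how many ops were issued to it."""
    target.expected[origin.rank] = origin.issued[target.rank]


def sync_complete(win):
    """True once every other rank's announced count matches what completed here."""
    others = [r for r in range(win.nranks) if r != win.rank]
    return all(win.expected[r] is not None and win.completed[r] == win.expected[r]
               for r in others)


def wait_loop(win):
    """Drive progress until the counts agree."""
    while not sync_complete(win):
        if not win.pending:
            raise RuntimeError("waiting on operations that were never delivered")
        src, key, value = win.pending.pop(0)
        win.memory[key] = value
        win.completed[src] += 1


a, b = Window(0, 2), Window(1, 2)
put(a, b, "x", 1)
put(a, b, "y", 2)
announce(a, b)
done_before = sync_complete(b)   # two ops announced, none completed yet
wait_loop(b)
done_after = sync_complete(b)
```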

    EFFICIENT PIPELINING OF RDMA FOR COMMUNICATIONS
    6.
    Patent application
    Status: Under examination (published)

    Publication No.: US20110078410A1

    Publication date: 2011-03-31

    Application No.: US11457921

    Filing date: 2006-07-17

    IPC classification: G06F12/00 G06F15/76 G06F9/02

    CPC classification: G06F15/17375

    Abstract: Disclosed are a method and a system for multiple party communications in a processing system including multiple processing subsystems. Each of the processing subsystems includes a central processing unit and one or more network adapters for connecting said each processing subsystem to the other processing subsystems. A multitude of nodes are established or created, and each of these nodes is associated with one of the processing subsystems. A first aspect of the invention involves pipelined communication using RDMA among three nodes, where the first node breaks up a large communication into multiple parts and sends these parts one after the other to the second node using RDMA, and the second node in turn absorbs and forwards each of these parts to a third node before all parts of the communication arrive from the first node.
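
    Illustration (not from the patent): a minimal Python simulation of the three-node pipeline, counting time steps under the assumption that each hop moves one part per step. The function relay and its step model are made up for this example and involve no real RDMA calls.

```python
def relay(message, chunk_size):
    """Move `message` from node 1 through node 2 to node 3 in chunks.

    Returns (data received at node 3, number of time steps used).
    In each step node 2 forwards one chunk it received in an earlier
    step while node 1 sends the next chunk.
    """
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    at_node2, at_node3 = [], []
    steps = 0
    while len(at_node3) < len(chunks):
        # Node 2 -> node 3: forward the oldest chunk not yet forwarded.
        if len(at_node3) < len(at_node2):
            at_node3.append(at_node2[len(at_node3)])
        # Node 1 -> node 2: send the next chunk in the same step.
        if len(at_node2) < len(chunks):
            at_node2.append(chunks[len(at_node2)])
        steps += 1
    return b"".join(at_node3), steps


received, steps = relay(b"0123456789", 4)   # 3 chunks
```

    Under this model k chunks take k + 1 steps, whereas waiting for the whole message at node 2 before forwarding would take 2k steps.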

    Method for implementing MPI-2 one sided communication
    7.
    Granted patent
    Status: In force

    Publication No.: US07694310B2

    Publication date: 2010-04-06

    Application No.: US11467946

    Filing date: 2006-08-29

    Abstract: A method for implementing Message Passing Interface (MPI-2) one-sided communication by using Low-level Applications Programming Interface (LAPI) active messaging capabilities, including providing at least three data transfer types, one of which is used to send a message with a message header greater than one packet where Data Gather and Scatter Programs (DGSP) are placed as part of the message header; allowing a multi-packet header by using a LAPI data transfer type; sending the DGSP and data as one message; reading the DGSP with a header handler; registering the DGSP with the LAPI to allow the LAPI to scatter the data to one or more memory locations; defining two sets of counters, one counter set for keeping track of a state of a prospective communication partner, and another counter set for recording activities of local and Remote Memory Access (RMA) operations; comparing local and remote counts of completed RMA operations to complete synchronization mechanisms; and creating an mpci_wait_loop function.

    Facilitating intra-node data transfer in collective communications
    8.
    Granted patent
    Status: Expired

    Publication No.: US07539989B2

    Publication date: 2009-05-26

    Application No.: US10962721

    Filing date: 2004-10-12

    IPC classification: G06F9/46

    CPC classification: G06F9/544

    Abstract: Intra-node data transfer in collective communications is facilitated. A memory object of one task of a collective communication is concurrently attached to the address spaces of a plurality of other tasks of the communication. Those tasks that attach the memory object can access the memory object as if it were their own. Data can be directly written into or read from an application data structure of the memory object by the attaching tasks without copying the data to/from shared memory.

    Checkpoint/resume/restart safe methods in a data processing system to establish, to restore and to release shared memory regions
    9.
    Granted patent
    Status: In force

    Publication No.: US07987386B2

    Publication date: 2011-07-26

    Application No.: US12061424

    Filing date: 2008-04-02

    IPC classification: G06F11/00

    Abstract: A method is provided in which checkpointing operations are carried out in data processing systems running multiple processes that employ shared memory, in a manner that preserves data coherence and integrity but places no timing restrictions or constraints that would require coordination of checkpointing operations. Data structures within local process memory and within shared memory provide the checkpoint operation with application level information concerning shared memory resources specific to at least two processes being checkpointed. Methods are provided for establishing, restoring and releasing shared memory regions that are accessed by multiple cooperating processes.

    Data gather scatter—redistribution machine
    10.
    Granted patent
    Status: In force

    Publication No.: US07962451B2

    Publication date: 2011-06-14

    Application No.: US12128303

    Filing date: 2008-05-28

    IPC classification: G06F7/00 G06F17/00

    CPC classification: G06F9/546

    Abstract: A method, system, and computer program product to transfer data between two application data structures by copying a data gather scatter program (DGSP) from an exporting process address space, where a first data structure is located, to a location in shared memory visible to an importing process address space; assembling a first parameter set identifying the first data structure; starting a data gather scatter-redistribution machine (DGS-RM) in an importing process space where a second application data structure is located; passing the first parameter set, the DGSP copy, and a second parameter set identifying the second application data structure and a second DGSP to the DGS-RM; and creating master and worker stack machines. The master stack machine identifies a contiguous chunk of the first data structure. The worker stack machine identifies contiguous chunks of the second data structure representing the same number of bytes as the contiguous chunk of the first data structure, and transfers data to (from) the one or more identified chunks of the second data structure from (to) the single chunk of the first application data structure.
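
    Illustration (not from the patent): a minimal Python sketch of the master/worker pairing described in this abstract. Each layout is given directly as a list of (offset, length) contiguous chunks, standing in for what a DGSP would yield when run by a stack machine; the function redistribute and that simplification are assumptions made for this example.

```python
def redistribute(src_mem, src_chunks, dst_mem, dst_chunks):
    """Copy data between two differently laid-out structures, chunk by chunk.

    src_chunks / dst_chunks: (offset, length) runs in each structure.
    Assumes both layouts describe the same total number of bytes.
    """
    worker = iter(dst_chunks)
    d_off, d_left = 0, 0
    for s_off, s_len in src_chunks:          # master: one contiguous source chunk
        while s_len > 0:                     # worker: cover the same number of bytes
            if d_left == 0:
                d_off, d_left = next(worker)
            n = min(s_len, d_left)
            dst_mem[d_off:d_off + n] = src_mem[s_off:s_off + n]
            s_off, s_len = s_off + n, s_len - n
            d_off, d_left = d_off + n, d_left - n


src = bytes(range(16))
dst = bytearray(12)
redistribute(src, [(0, 4), (8, 4)], dst, [(0, 2), (4, 2), (8, 4)])
```

    Here the first source chunk of four bytes is split across two destination chunks, while the second maps onto a single one; no intermediate packed buffer is needed.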
