Failover mechanisms in RDMA operations
    3.
    Patent application (status: lapsed)

    Publication No.: US20060045005A1

    Publication date: 2006-03-02

    Application No.: US11017574

    Filing date: 2004-12-20

    IPC classification: H04J1/16

    Abstract: In remote direct memory access (RDMA) transfers in a multinode data processing system in which the nodes communicate with one another through communication adapters coupled to a switch or network, failures in the nodes or in the communication adapters can produce the phenomenon known as trickle traffic: data received from the switch or network that is stale but that may have all the signatures of valid packet data. The present invention addresses the trickle traffic problem in two situations: node failure and adapter failure. In the node failure situation, randomly generated keys are used to reestablish connections to the adapter while providing a mechanism for recognizing stale packets. In the adapter failure situation, a round-robin context allocation approach is used, with adapter state contexts carrying state information that helps to identify stale packets. In another approach to the adapter failure situation, counts are assigned that provide the node with an adapter failure number which will not match the corresponding number in a context field in the adapter, thus enabling the identification of stale packets.
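
The node-failure case in the abstract above can be illustrated with a small Python sketch. It is not the patented implementation: the `Adapter` and `Packet` classes, the 64-bit key width, and the per-context key table are all assumptions made for illustration. The point it shows is that a fresh random key issued on reconnection lets the receiver reject packets that were already in flight before the failure.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Packet:
    context_id: int
    key: int          # key the sender believed was current when it sent
    payload: bytes


class Adapter:
    """Toy adapter holding one random key per connection context (hypothetical)."""

    def __init__(self):
        self.keys = {}

    def establish(self, context_id):
        # Issue a fresh random key on every (re)connection; retry on the
        # unlikely event that it equals the previous key for this context.
        key = secrets.randbits(64)
        while key == self.keys.get(context_id):
            key = secrets.randbits(64)
        self.keys[context_id] = key
        return key

    def accept(self, pkt):
        # A packet is valid only if it carries the context's current key.
        return self.keys.get(pkt.context_id) == pkt.key


adapter = Adapter()
old_key = adapter.establish(7)
in_flight = Packet(7, old_key, b"sent before the failure")
new_key = adapter.establish(7)          # node restarts and reconnects
fresh = Packet(7, new_key, b"sent after recovery")
```

Here `adapter.accept(fresh)` is true while `adapter.accept(in_flight)` is false, even though both packets are otherwise well formed.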

    RDMA server (OSI) global TCE tables
    4.
    Patent application (status: in force)

    Publication No.: US20060047771A1

    Publication date: 2006-03-02

    Application No.: US11017456

    Filing date: 2004-12-20

    IPC classification: G06F15/16

    Abstract: In remote direct memory access (RDMA) transfers in a multinode data processing system in which the nodes communicate with one another through communication adapters coupled to a switch or network, the system needs to ensure efficient memory protection mechanisms across jobs. A method is thus desired for addressing virtual memory on local and remote servers that is independent of the process ID on the local and/or remote node. The use of global Translation Control Entry (TCE) tables that are accessed and owned by RDMA jobs and managed by a device driver, in conjunction with a Protocol Virtual Offset (PVO) address format, solves this problem.
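
As a rough illustration of the idea in this abstract, the Python sketch below models a job-owned translation table addressed by an offset rather than by a process-specific virtual address. The abstract does not give the PVO layout, so the encoding used here (table index in the high bits, page offset in the low 12 bits), the 4 KiB page size, and the `GlobalTceTable` class are assumptions made for illustration only.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class GlobalTceTable:
    """Toy driver-managed table: entry index -> (owning job, real page address)."""

    def __init__(self):
        self.entries = []

    def register(self, job_id, real_page):
        # Returns a hypothetical PVO pointing at the start of the page.
        self.entries.append((job_id, real_page))
        return (len(self.entries) - 1) << PAGE_SHIFT

    def translate(self, job_id, pvo):
        index, offset = pvo >> PAGE_SHIFT, pvo & PAGE_MASK
        if index >= len(self.entries):
            raise PermissionError("PVO outside the table")
        owner, real_page = self.entries[index]
        if owner != job_id:
            # Protection is per job, not per process ID.
            raise PermissionError("entry owned by another job")
        return real_page + offset


table = GlobalTceTable()
pvo = table.register("job-a", 0x40000)
real = table.translate("job-a", pvo + 0x10)
```

Any process belonging to `job-a` can present the same PVO and obtain the same real address, while a different job is refused.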

    Method and system for efficiently transferring a self-defined non-contiguous message in a one-sided communication model
    5.
    Patent application (status: lapsed)

    Publication No.: US20060085518A1

    Publication date: 2006-04-20

    Application No.: US10965597

    Filing date: 2004-10-14

    IPC classification: G06F15/16

    Abstract: A method and system for transferring noncontiguous messages. The sending side assembles a set of data into a series of transmission packets, packages a description of the layout of the transmission packets into description packets, places each description packet into a local buffer while maintaining a count of the number of description packets, transfers each description packet into a transmit buffer for transmission to at least one receiving node, identifies the data packets, and forwards each data packet to the transmit buffer for transmission to the at least one receiving node. The receiving node receives the transmission packets, identifies each packet as a description packet or a data packet, places the description packets in a local buffer for storage until the description is complete, places each description packet into a user data buffer, stores data packets in a local queue until the description is complete, and then transfers the data packets to the user buffer.
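
The receive-side behaviour described in this abstract (hold data packets until the layout description is complete, then scatter them into the user buffer) can be sketched in Python as follows. The tuple packet format, the `make_packets` helper, and the `Receiver` class are hypothetical; the abstract does not specify a wire format.

```python
def make_packets(source, layout, per_desc=2):
    """Sender: `layout` lists the (offset, length) pieces of `source` to send."""
    chunks = [layout[i:i + per_desc] for i in range(0, len(layout), per_desc)]
    desc = [("desc", i, len(chunks), chunk) for i, chunk in enumerate(chunks)]
    data = [("data", seq, source[off:off + n])
            for seq, (off, n) in enumerate(layout)]
    return desc + data


class Receiver:
    def __init__(self, size):
        self.user_buffer = bytearray(size)
        self.desc = {}        # description packets held until all have arrived
        self.pending = []     # data packets queued until the layout is known
        self.layout = None

    def receive(self, packet):
        if packet[0] == "desc":
            _, index, total, chunk = packet
            self.desc[index] = chunk
            if len(self.desc) == total:
                # Description complete: rebuild the full layout in order.
                self.layout = [p for i in range(total) for p in self.desc[i]]
        else:
            self.pending.append(packet)
        if self.layout is not None:
            # Scatter every queued data packet to its place in the user buffer.
            for _, seq, payload in self.pending:
                off, n = self.layout[seq]
                self.user_buffer[off:off + n] = payload
            self.pending.clear()


source = b"0123456789"
packets = make_packets(source, [(0, 2), (4, 3), (8, 2)])
rx = Receiver(len(source))
for packet in reversed(packets):      # data packets arrive before descriptions
    rx.receive(packet)
```

Delivering the packets in reverse order exercises the queueing path: nothing is written to `user_buffer` until the last description packet has arrived.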

    Sharing lock mechanism between protocol layers
    7.
    Patent application (status: lapsed)

    Publication No.: US20050289550A1

    Publication date: 2005-12-29

    Application No.: US10877095

    Filing date: 2004-06-25

    IPC classification: G06F9/46

    CPC classification: G06F9/526

    Abstract: Shared locks are employed for controlling a thread which extends across more than one protocol layer in a data processing system. A counter is used as part of a data structure that makes it possible to implement shared locks across multiple layers. The use of shared locks avoids the processing overhead usually associated with lock acquisition and release. The thread which is controlled may be initiated in either an upper layer protocol or in a lower layer.
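
One way to picture a counter-based lock shared between layers is the Python sketch below. It is an assumption-laden illustration, not the patented data structure: the `SharedLayerLock` class, its owner field, and the two layer functions are hypothetical. A thread that already holds the lock in one layer only bumps the counter when it enters the other layer, so the underlying lock is acquired once per call chain.

```python
import threading


class SharedLayerLock:
    """Lock shared by two protocol layers, with a nesting counter (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None
        self.depth = 0
        self.real_acquisitions = 0    # how often the underlying lock was taken

    def enter(self):
        me = threading.get_ident()
        if self._owner == me:
            self.depth += 1           # already held by this thread: count only
            return
        self._lock.acquire()
        self._owner = me
        self.depth = 1
        self.real_acquisitions += 1

    def leave(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("lock not held by this thread")
        self.depth -= 1
        if self.depth == 0:
            self._owner = None
            self._lock.release()


def lower_layer_send(lock, log, data):
    lock.enter()
    log.append(("lower", data))
    lock.leave()


def upper_layer_send(lock, log, data):
    lock.enter()
    log.append(("upper", data))
    lower_layer_send(lock, log, data)   # same thread crosses into the lower layer
    lock.leave()


lock, log = SharedLayerLock(), []
upper_layer_send(lock, log, b"msg")
```

After the call, `lock.real_acquisitions` is 1 although both layers entered the lock. A call that starts in the lower layer works the same way.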

    Early interrupt notification in RDMA and in DMA operations
    8.
    Patent application (status: under examination, published)

    Publication No.: US20060045109A1

    Publication date: 2006-03-02

    Application No.: US11017573

    Filing date: 2004-12-20

    IPC classification: H04L12/28

    CPC classification: H04L67/1097 H04L69/32

    Abstract: In a multinode data processing system in which data is transferred, via direct memory access (DMA) or remote direct memory access (RDMA), from a source node to at least one destination node through communication adapters coupling each node to a network or switch, a method is provided in which interrupt handling is overlapped with data transfer, so that interrupt processing overhead runs in parallel at the destination node with the movement of data and thereby provides performance benefits. The method is also applicable to situations involving multiple interrupt levels corresponding to multithreaded handling capabilities.
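
The overlap described in this abstract can be mimicked with two Python threads. This is only an analogy under stated assumptions: the `transfer` and `interrupt_handler` functions and the two `threading.Event` objects are hypothetical stand-ins for the adapter's data movement and the destination node's interrupt path. The notification is raised before the last chunk lands, so the handler's start-up work overlaps the tail of the transfer, and the handler touches the data only after completion is signalled.

```python
import threading


def transfer(chunks, destination, early_irq, done):
    """Stand-in for the adapter moving data into destination memory."""
    for i, chunk in enumerate(chunks):
        if i == len(chunks) - 1:
            early_irq.set()          # notify before the final chunk has landed
        destination.append(chunk)
    early_irq.set()                  # covers the empty-transfer case
    done.set()


def interrupt_handler(destination, early_irq, done, result):
    early_irq.wait()
    result["handler_started"] = True   # dispatch overhead would run here
    done.wait()                        # use the data only once it is complete
    result["data"] = b"".join(destination)


destination, result = [], {}
early_irq, done = threading.Event(), threading.Event()
handler = threading.Thread(
    target=interrupt_handler, args=(destination, early_irq, done, result))
mover = threading.Thread(
    target=transfer, args=([b"ab", b"cd", b"ef"], destination, early_irq, done))
handler.start()
mover.start()
mover.join()
handler.join()
```

The second event matters: without the wait on `done`, an early handler could read the buffer before the last chunk arrives.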

    Lazy deregistration of user virtual machine to adapter protocol virtual offsets

    Publication No.: US20060059242A1

    Publication date: 2006-03-16

    Application No.: US11017570

    Filing date: 2004-12-20

    IPC classification: G06F15/16

    CPC classification: G06F12/1081

    Abstract: A method is provided for operating a communications adapter employed in a multinode data processing system in a fashion that enhances the performance of remote direct memory access data transfers. The system is provided with pointers and a table which are employed to determine whether or not an address which has been supplied for the transfer has already been mapped to a real address at the source or destination node. The table is also preferably provided with counters which can be incremented or decremented to enable the use of least-recently-used mechanisms at the upper level protocol layers to more efficiently control the setting and resetting of table entries.
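
A minimal Python sketch of the lazy-deregistration idea follows. The `RegistrationCache` class, its capacity limit, and the string handles are assumptions for illustration; the abstract only states that a table with counters records existing mappings and supports least-recently-used replacement. Releasing a mapping decrements its counter but leaves the entry in the table, so a later transfer from the same address skips the expensive mapping step.

```python
from collections import OrderedDict


class RegistrationCache:
    """Toy mapping table: virtual address -> [handle, in-use counter]."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.table = OrderedDict()    # ordered from least to most recently used
        self.registrations = 0        # number of expensive mapping operations

    def acquire(self, vaddr):
        entry = self.table.get(vaddr)
        if entry is None:
            self._evict_if_full()
            self.registrations += 1
            entry = [f"map-{self.registrations}", 0]
            self.table[vaddr] = entry
        entry[1] += 1
        self.table.move_to_end(vaddr)
        return entry[0]

    def release(self, vaddr):
        # Lazy: only the counter drops; the mapping stays for possible reuse.
        self.table[vaddr][1] -= 1

    def _evict_if_full(self):
        if len(self.table) < self.capacity:
            return
        victim = next((v for v, (_, count) in self.table.items() if count == 0),
                      None)
        if victim is None:
            raise MemoryError("all mappings are in use")
        del self.table[victim]        # least recently used idle mapping


cache = RegistrationCache(capacity=2)
first = cache.acquire(0x1000)
cache.release(0x1000)
again = cache.acquire(0x1000)         # reuses the mapping, no new registration
cache.release(0x1000)
```

Only when the table is full does an idle entry get dropped, and the one chosen is the least recently used.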

    Establishing a communicator across multiple processes in a multithreaded computing environment
    10.
    Granted patent (status: lapsed)

    Publication No.: US06782537B1

    Publication date: 2004-08-24

    Application No.: US09404381

    Filing date: 1999-09-23

    IPC classification: G06F300

    Abstract: A deterministic, non-deadlocking technique for achieving distributed consensus in a multithreaded multiprocessing computing environment is provided. A communicator is established across multiple processes in the multithreaded computer environment notwithstanding that multiple groups of threads may be simultaneously trying to establish communicators. The technique includes communicating across the multiple processes to establish a candidate identifier for the communicator for a group of participating threads of the multiple processes; and communicating across the multiple processes to check at each participating thread of the multiple processes whether the candidate identifier can be claimed at its process, and if so, claiming the candidate identifier as the new identifier, thereby establishing the communicator. As one example, the technique can be implemented via a subroutine call within a message passing interface (MPI) library.
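
The two communication rounds in this abstract (agree on a candidate identifier, then check whether every participant can claim it) can be simulated in a single Python process. The abstract does not say how the candidate is chosen, so taking the maximum of each process's lowest free identifier is an assumption, as are the function names. The collective exchanges are replaced by plain loops over per-process sets, and the concurrent-groups aspect of the patent is not modelled.

```python
def lowest_free(used, start=0):
    """Smallest identifier >= start that this process has not claimed."""
    cid = start
    while cid in used:
        cid += 1
    return cid


def establish_communicator(processes):
    """`processes` is a list of sets, one per process, of identifiers in use."""
    start = 0
    while True:
        # Round 1: stands in for a max-reduction over each process's proposal.
        candidate = max(lowest_free(used, start) for used in processes)
        # Round 2: stands in for a logical-and reduction over "free here?".
        if all(candidate not in used for used in processes):
            for used in processes:
                used.add(candidate)   # every participant claims the identifier
            return candidate
        start = candidate + 1         # someone refused: try a larger candidate


procs = [{0, 1}, {0, 2}, {0}]
first_id = establish_communicator(procs)
second_id = establish_communicator(procs)
```

In the first call the initial candidate 2 is refused by the second process, so the retry settles on 3. The loop terminates because each set is finite.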
