Efficient protocol for retransmit logic in reliable zero copy message transport
    1.
    Granted patent
    Status: No longer in force

    Publication No.: US06735620B1

    Publication date: 2004-05-11

    Application No.: US09619054

    Filing date: 2000-07-18

    IPC class: G06F15/167

    CPC class: G06F15/17

    Abstract: In a transmission protocol in which a user running an application in an address space in one data processing system wishes to transmit a data packet to another address space in another data processing system by means of direct memory access directly from a sending buffer to a receiving buffer with no copy, a mechanism is provided for minimizing the need for retransmission and for ensuring proper entry into the target data processing system address space. In particular, when the first system does not receive an acknowledgment from the receiver, a special data packet with a retransmit flag bit set is sent to the second system. When the second system receives the data packet with the retransmit flag bit set, it responds either by sending a new acknowledgment or by sending a request for retransmission. No transmission back to the first system occurs, however, before such a request is made, and in fact the receiving system does not send this retransmission request without ensuring that its receipt would be appropriate. In particular, the second system, before requesting retransmission, checks that the tag association is still valid, so that an adapter at the second system is still capable of matching tags in data packet headers with appropriate real address memory locations within address spaces belonging to the second, receiving data processing system. In this manner needless retransmission of packets does not occur, and retransmission occurs only when receipt of the data packet is appropriate.

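The receiver-side decision described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Receiver`, `Reply`, `tag_table`, `delivered`, and `on_probe` are hypothetical, and the `NONE` outcome (staying silent when the tag is no longer valid) is an assumption about what "not requesting retransmission" looks like in practice.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reply(Enum):
    ACK = "ack"                        # data already arrived; acknowledge again
    REQUEST_RETRANSMIT = "retransmit"  # data missing and the buffer is still mapped
    NONE = "none"                      # assumption: tag invalid, so do not ask for data


@dataclass
class Receiver:
    # tag -> receive buffer; stands in for the adapter's tag-to-real-address table
    tag_table: dict = field(default_factory=dict)
    # ids of messages whose data has already been deposited
    delivered: set = field(default_factory=set)

    def on_probe(self, msg_id: int, tag: int) -> Reply:
        """Handle a packet that arrived with the retransmit flag bit set."""
        if msg_id in self.delivered:
            # The original acknowledgment was lost; send a new one.
            return Reply.ACK
        if tag in self.tag_table:
            # The tag association is still valid, so resent data can be placed.
            return Reply.REQUEST_RETRANSMIT
        # The buffer is no longer registered; resent data would have nowhere to go.
        return Reply.NONE
```

The point of the check is that the bulk data is resent only after the receiver has confirmed it can still place it; the probe itself carries no payload.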

    Mechanisms for efficient message passing with copy avoidance in a distributed system using advanced network devices
    2.
    Granted patent
    Status: In force

    Publication No.: US07089289B1

    Publication date: 2006-08-08

    Application No.: US09619051

    Filing date: 2000-07-18

    CPC class: G06F13/28

    Abstract: An efficient mechanism for sending messages without the use of intermediate copies (i.e., without the staging of data) is provided. In particular, an interface specification for use by users of a transport protocol is defined so as to lend itself to efficient implementations. The interface specification is a complete and robust set of user functions usable within systems desiring reliable and efficient zero copy transport protocols. Two methods are provided to accomplish the implementation of an efficient zero copy protocol. The first method is especially useful in systems where the network device has limited capabilities in terms of hardware, message fragmentation and message reassembly. An additional RDRAM memory allows data to reside in an adapter while handshake operations take place between an adapter and a node so as to specify the final destination of the data. The second method takes advantage of network devices with advanced features which are exploited for maximum efficiency.


    Mechanisms for efficient message passing with copy avoidance in a distributed system
    3.
    Granted patent
    Status: In force

    Publication No.: US06799200B1

    Publication date: 2004-09-28

    Application No.: US09619053

    Filing date: 2000-07-18

    IPC class: G06F15/167

    CPC class: G06F12/1081

    Abstract: An efficient mechanism for sending messages without the use of intermediate copies (i.e., without the staging of data) is provided. In particular, an interface specification for use by users of a transport protocol is defined so as to lend itself to efficient implementations. The interface specification is a complete and robust set of user functions usable within systems desiring reliable and efficient zero copy transport protocols. Two methods are provided to accomplish the implementation of an efficient zero copy protocol. The first method is especially useful in systems where the network device has limited capabilities in terms of hardware, message fragmentation and message reassembly. An additional RDRAM memory allows data to reside in an adapter while handshake operations take place between an adapter and a node so as to specify the final destination of the data. The second method takes advantage of network devices with advanced features which are exploited for maximum efficiency.


    Hardware interface between a switch adapter and a communications subsystem in a data processing system
    4.
    Granted patent
    Status: No longer in force

    Publication No.: US06111894A

    Publication date: 2000-08-29

    Application No.: US920084

    Filing date: 1997-08-26

    IPC class: H04L29/06, G06F3/00

    Abstract: Method, apparatus and program product for communicating from a node to a communications device. A Hardware Abstraction Layer (HAL) provides functions which can be called from user space in a node to access the communications device. An instance of HAL is created in the node. Device specific characteristics from the communications device and a pointer pointing to HAL functions for accessing the communications device are obtained by HAL. HAL then opens multiple ports on the communications device using the functions pointed to by the pointer, and messages are sent between the node and the communications device. The messages thus sent are optimized with respect to the communications device as determined by the obtained device specific characteristics. Multiple processes and protocol stacks may be associated with each port in a single instance of HAL. A further embodiment provides that multiple virtual ports may be associated with a port, with multiple protocol stacks associated with each virtual port. A further embodiment provides that multiple communications devices may be associated with a single instance of HAL.

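As a rough sketch of the arrangement this abstract describes, the code below models a HAL instance that holds device-specific characteristics plus a table of device access functions, opens several ports through that table, and sizes outgoing packets to the device. All names here (`DeviceInfo`, `DeviceFunctions`, `Hal`, `max_packet_size`) are hypothetical, and using the maximum packet size as the only device characteristic is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DeviceInfo:
    name: str
    max_packet_size: int  # assumed device-specific characteristic


@dataclass
class DeviceFunctions:
    # Plays the role of the pointer to the device access functions.
    open_port: Callable[[int], None]
    send: Callable[[int, bytes], None]


class Hal:
    """One HAL instance bound to one communications device."""

    def __init__(self, info: DeviceInfo, funcs: DeviceFunctions) -> None:
        self.info = info
        self.funcs = funcs
        self.ports: List[int] = []

    def open_ports(self, port_ids: List[int]) -> None:
        # Open each port through the device's own functions.
        for port in port_ids:
            self.funcs.open_port(port)
            self.ports.append(port)

    def send(self, port: int, payload: bytes) -> int:
        """Split payload into device-sized packets; return the packet count."""
        if port not in self.ports:
            raise ValueError(f"port {port} is not open")
        size = self.info.max_packet_size
        chunks = [payload[i:i + size] for i in range(0, len(payload), size)]
        for chunk in chunks:
            self.funcs.send(port, chunk)
        return len(chunks)
```

Callers never touch the device directly; swapping in a different `DeviceFunctions` table is what would let the same user-space code drive a different adapter.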

    Signaling communication events in a computer network
    5.
    Granted patent
    Status: No longer in force

    Publication No.: US6070189A

    Publication date: 2000-05-30

    Application No.: US921757

    Filing date: 1997-08-26

    IPC class: G06F9/46, G06F15/173, G06F13/00

    CPC class: G06F9/542, G06F15/17375

    Abstract: A method, apparatus and program product for detecting a communication event in a distributed parallel data processing system in which a message is sent from an origin to a target. A low-level application programming interface (LAPI) is provided which has an operation for associating a counter with a communication event to be detected. The LAPI increments the counter upon the occurrence of the communication event. The number in the counter is monitored, and when the number increases, the event is detected. A completion counter in the origin is associated with the completion of a message being sent from the origin to the target. When the message is completed, LAPI increments the completion counter such that monitoring the completion counter detects the completion of the message. The completion counter may be used to ensure that a first message has been sent from the origin to the target and completed before a second message is sent.

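The counter-based signaling in this abstract can be illustrated with a small sketch. It does not use the actual LAPI interface: `Counter`, `send_message`, and `send_in_order` are hypothetical names, and a worker thread stands in for the asynchronous transport.

```python
import threading


class Counter:
    """A counter that the transport increments when its associated event occurs."""

    def __init__(self) -> None:
        self._value = 0
        self._cond = threading.Condition()

    def increment(self) -> None:
        with self._cond:
            self._value += 1
            self._cond.notify_all()

    def wait_for(self, target: int, timeout=None) -> bool:
        """Block until the counter reaches target; return whether it did."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= target, timeout)


def send_message(log: list, payload, completion: Counter) -> None:
    """Hypothetical asynchronous send: deliver, then bump the completion counter."""
    def deliver() -> None:
        log.append(payload)     # stands in for the data reaching the target
        completion.increment()  # signals that this message has completed

    threading.Thread(target=deliver).start()


def send_in_order(log: list, first, second) -> None:
    """Use one completion counter to order two sends."""
    done = Counter()
    send_message(log, first, done)
    done.wait_for(1)            # do not start the second send until the first completes
    send_message(log, second, done)
    done.wait_for(2)
```

Monitoring the counter replaces a per-message callback: the origin only needs to know how many completions it expects.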

    Method and apparatus for efficient communications using active messages
    6.
    Granted patent
    Status: No longer in force

    Publication No.: US6038604A

    Publication date: 2000-03-14

    Application No.: US918816

    Filing date: 1997-08-26

    Abstract: A method, apparatus and program product for message communication in a distributed parallel data processing system. A user message is sent from a sender to a receiver. The user message contains user data and a pointer to a header handler routine. The header handler routine includes a first pointer to a target user buffer and a second pointer to a completion routine. When the user message is received, a low-level application programming interface (LAPI) is informed and invokes the header handler routine, which returns the first and second pointers. LAPI then transfers the user data to the user buffer indicated by the header handler routine, and invokes the completion routine indicated by the header handler routine to complete the transfer of the user message to the receiver.

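The receive path in this abstract (header handler returns a buffer and a completion routine, the transport copies the data, then runs the completion routine) might be sketched as below. The function `deliver` and the registry of handlers keyed by an integer id are assumptions; in the abstract the message carries a pointer to the handler, which Python models here as a dictionary lookup.

```python
from typing import Callable, Dict, Tuple

# A header handler receives the message header and the data length, and
# returns the target user buffer together with a completion routine.
HeaderHandler = Callable[[bytes, int], Tuple[bytearray, Callable[[], None]]]


def deliver(handlers: Dict[int, HeaderHandler], handler_id: int,
            header: bytes, data: bytes) -> None:
    """Receiver-side dispatch for one incoming user message."""
    # Step 1: ask the header handler where the data should go.
    buffer, completion = handlers[handler_id](header, len(data))
    if len(buffer) < len(data):
        raise ValueError("target buffer is too small for the message data")
    # Step 2: place the user data directly into the user's buffer.
    buffer[:len(data)] = data
    # Step 3: run the completion routine named by the header handler.
    completion()
```

Because the handler picks the buffer before any payload is copied, the transport does not need its own staging area for the message body.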

    Mapping a logical address to a plurality on non-logical addresses
    7.
    Granted patent
    Status: No longer in force

    Publication No.: US06782464B2

    Publication date: 2004-08-24

    Application No.: US09906860

    Filing date: 2001-07-17

    IPC class: G06F12/00

    CPC class: G06F12/0284

    Abstract: Communication between different entities of a computing environment is facilitated by an address mapping capability. Messages are sent between the entities to have desired tasks performed. Instead of providing within the messages the actual non-logical addresses (e.g., virtual, real addresses) used to perform the tasks, logical addresses are provided. The logical addresses are then mapped to the non-logical addresses. Each logical address can map to a plurality of non-logical addresses.

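A minimal sketch of the mapping idea: messages carry a small logical address, and the side that performs the task resolves it to several non-logical addresses. The names `AddressMap` and `Mapping`, and the choice of exactly one virtual and one real address per entry, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Mapping:
    virtual: int  # address as seen by the owning process
    real: int     # address as seen by the adapter or memory hardware


class AddressMap:
    """Resolves a logical address to its non-logical addresses."""

    def __init__(self) -> None:
        self._table: Dict[int, Mapping] = {}

    def register(self, logical: int, virtual: int, real: int) -> None:
        self._table[logical] = Mapping(virtual, real)

    def resolve(self, logical: int) -> Mapping:
        # Raises KeyError when the logical address was never registered.
        return self._table[logical]
```

A sender would then put only the logical value (for example `3`) in the message, and the receiving side would call `resolve(3)` to obtain both addresses.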

    Congestion monitoring and message flow control in a blocking network
    8.
    Granted patent
    Status: No longer in force

    Publication No.: US06700876B1

    Publication date: 2004-03-02

    Application No.: US09354750

    Filing date: 1999-07-29

    IPC class: H04J1/16

    Abstract: Method, system and program storage device are provided for monitoring and ameliorating congestion in a tightly coupled network. Commensurate with sending a packet into the network, a first time stamp is recorded. Upon receipt of an acknowledgment back across the network responsive to sending of the packet, a second time stamp is recorded. The round trip time of the packet is determined and an amount of congestion is estimated using the determined round trip time and a statically predetermined round trip representative of at least one of no network congestion or a known degree of network congestion. The number of flow control tokens for the destination node can be dynamically varied in response to the estimate of the amount of network congestion. If desired, monitoring and estimating of network congestion can be initiated only after identifying the existence of network congestion, for example, represented by a lack of flow control tokens at a sender node for a destination node.

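The estimate-then-adjust loop in this abstract could look roughly like the sketch below. The abstract does not say how the round trip time is turned into a congestion figure or how the token count is varied, so the ratio against the baseline, the thresholds (2.0 and 1.25), and the halve or add-one policy are all assumptions, as are the function names.

```python
def estimate_congestion(sent_at: float, acked_at: float, baseline_rtt: float) -> float:
    """Return the measured round trip as a multiple of the uncongested baseline."""
    rtt = acked_at - sent_at  # second time stamp minus first time stamp
    return rtt / baseline_rtt


def adjust_tokens(tokens: int, congestion: float,
                  min_tokens: int = 1, max_tokens: int = 64) -> int:
    """Vary the flow control tokens for a destination from the congestion estimate."""
    if congestion > 2.0:
        tokens //= 2   # assumed policy: back off sharply under heavy congestion
    elif congestion < 1.25:
        tokens += 1    # assumed policy: probe upward when the network looks clear
    return max(min_tokens, min(max_tokens, tokens))
```

For example, a packet sent at 10.0 s and acknowledged at 10.5 s against a 0.25 s baseline gives an estimate of 2.0, which leaves the token count unchanged under these thresholds.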

    METHOD, COMPUTER PROGRAM PRODUCT, AND SYSTEM FOR LIMITING ACCESS BY A FAILED NODE
    9.
    Patent application
    Status: In force

    Publication No.: US20080155315A1

    Publication date: 2008-06-26

    Application No.: US11536008

    Filing date: 2006-09-28

    IPC class: G06F11/00

    CPC class: G06F11/004

    Abstract: In a multi-node computer system, file access by a failed node is limited. Upon receipt of an indication of a node failure, a fencing command is sent to disks in a disk subsystem to which the failed node has access. If the fencing command sent to a disk fails, the fencing command is sent to a server having access to at least one disk in a disk subsystem to which the failed node has access to limit access by the failed node to the disk in the disk subsystem. If the fencing command sent to the server does not result in limiting access by the failed node to all the disks in the disk subsystem, the command is sent to another server having access to at least one disk in the disk subsystem to limit access by the failed node to the disks in the disk subsystem. The fencing command may be sent to various servers until access by the failed node to all the disks in the disk subsystem is limited or until the fencing command has been sent to all the servers. The fencing command may be sent one at a time to servers having access to the disks in the disk subsystem, may be sent concurrently to all the servers having access to the disks in the disk subsystem, or may be forwarded from one server to another.

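The fallback sequence in this abstract can be sketched for the one-server-at-a-time variant. The function `fence_node` and the two callback shapes are hypothetical: `fence_direct(disk, node)` is assumed to return whether the direct command succeeded, and each server callback is assumed to return the set of disks it managed to fence.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

FenceDirect = Callable[[str, str], bool]
FenceViaServer = Callable[[str, FrozenSet[str]], Set[str]]


def fence_node(failed_node: str, disks: Iterable[str],
               fence_direct: FenceDirect,
               servers: List[FenceViaServer]) -> Set[str]:
    """Try to fence failed_node from every disk; return the disks still unfenced."""
    # First attempt: send the fencing command straight to each disk.
    pending = {d for d in disks if not fence_direct(d, failed_node)}
    # Fallback: ask servers one at a time until nothing is pending
    # or every server has been tried.
    for fence_via_server in servers:
        if not pending:
            break
        pending -= fence_via_server(failed_node, frozenset(pending))
    return pending
```

An empty return value means the failed node has been cut off from every disk; a non-empty one lists the disks an operator would still need to deal with.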

    SYSTEM FOR FINE GRAINED FLOW-CONTROL CONCURRENCY TO PREVENT EXCESSIVE PACKET LOSS
    10.
    Patent application
    Status: Under examination (published)

    Publication No.: US20080049617A1

    Publication date: 2008-02-28

    Application No.: US11466615

    Filing date: 2006-08-23

    IPC class: H04J1/16

    Abstract: A system for flow-control concurrency to prevent excessive packet loss, including at least one transmitter node. Each transmitter node is configured to transmit data. A first flow-control device is coupled to the at least one transmitter node. The first flow-control device is configured to limit the number of concurrent data replies sent by the at least one transmitter node such that the resources on the transmitter node side will not be overrun. At least one receiver node is configured to receive the transmitted data. The at least one receiver node is coupled to the at least one transmitter node via a communication network. A second flow-control device is coupled to the at least one receiver node. The second flow-control device is configured to limit the number of concurrent data requests received by the at least one receiver node such that the resources on the receiver node side will not be overrun.

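One way to picture a flow-control device of the kind this abstract describes is a limiter that caps how many items are in flight and queues the rest. This is only a sketch under stated assumptions: the class `ConcurrencyLimiter` and its methods are hypothetical, and queuing the overflow (rather than rejecting it) is a design choice the abstract does not specify. The same object could sit on the transmitter side to cap concurrent replies or on the receiver side to cap concurrent requests.

```python
from collections import deque


class ConcurrencyLimiter:
    """Caps the number of items in flight; extra items wait in FIFO order."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.in_flight = 0
        self.queued = deque()

    def submit(self, item) -> bool:
        """Admit item if under the limit, otherwise queue it. Return True if admitted."""
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        self.queued.append(item)
        return False

    def complete(self):
        """Finish one in-flight item; admit and return the next queued item, if any."""
        if self.in_flight == 0:
            raise RuntimeError("nothing is in flight")
        self.in_flight -= 1
        if self.queued:
            self.in_flight += 1
            return self.queued.popleft()
        return None
```

Keeping the limit small relative to the buffers on each side is what prevents a burst of requests or replies from overrunning them.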