Abstract:
In a transmission protocol in which a user running an application in an address space in one data processing system wishes to transmit a data packet to an address space in another data processing system by direct memory access, directly from a sending buffer to a receiving buffer with no copy, a mechanism is provided for minimizing the need for retransmission and for ensuring proper entry into the target data processing system's address space. In particular, when the first system does not receive an acknowledgment from the receiver, it sends a special data packet with a retransmit flag bit set to the second system. When the second system receives the data packet with the retransmit flag bit set, it responds either by sending a new acknowledgment or by sending a request for retransmission. The data packet is not retransmitted, however, until such a request is made, and the receiving system does not send this retransmission request without first ensuring that receipt of the packet would be appropriate. In particular, before requesting retransmission, the second system checks that the tag association is still valid, so that an adapter at the second system is still capable of matching tags in data packet headers with the appropriate real-address memory locations within address spaces belonging to the second, receiving data processing system. In this manner, needless retransmission of packets does not occur, and retransmission occurs only when receipt of the data packet is appropriate.
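The receiver-side decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Packet` and `Receiver` types, the string return values, and the choice to stay silent when the tag association is no longer valid are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Packet:
    seq: int                       # hypothetical sequence number
    tag: int                       # tag carried in the packet header
    retransmit_flag: bool = False  # set on the special probe packet


@dataclass
class Receiver:
    # Hypothetical stand-in for the adapter's tag table: tag -> real address.
    tag_table: Dict[int, int] = field(default_factory=dict)
    # Sequence numbers of packets that already arrived.
    received: Set[int] = field(default_factory=set)

    def on_probe(self, pkt: Packet) -> Optional[str]:
        """Decide how to answer a packet whose retransmit flag is set."""
        if not pkt.retransmit_flag:
            raise ValueError("on_probe expects the retransmit flag to be set")
        if pkt.seq in self.received:
            # Data arrived; only the acknowledgment was lost.
            return "ACK"
        if pkt.tag in self.tag_table:
            # Tag association still valid: a retransmitted packet can land.
            return "RETRANSMIT_REQUEST"
        # Assumption: with no valid tag association, request nothing.
        return None
```

With this shape, the sender retransmits only when it sees `"RETRANSMIT_REQUEST"`, so a packet is never resent to a receiver that could not place it in memory.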
Abstract:
An efficient mechanism is provided for sending messages without the use of intermediate copies (i.e., without the staging of data). In particular, an interface specification for users of a transport protocol is defined so as to lend itself to efficient implementations. The interface specification is a complete and robust set of user functions usable within systems desiring reliable and efficient zero copy transport protocols. Two methods are provided to accomplish the implementation of an efficient zero copy protocol. The first method is especially useful in systems where the network device has limited capabilities in terms of hardware, message fragmentation and message reassembly. An additional RDRAM memory allows data to reside in an adapter while handshake operations take place between the adapter and a node so as to specify the final destination of the data. The second method takes advantage of network devices with advanced features, which are exploited for maximum efficiency.
Abstract:
Method, apparatus and program product for communicating from a node to a communications device. A Hardware Abstraction Layer (HAL) provides functions which can be called from user space in a node to access the communications device. An instance of HAL is created in the node. Device specific characteristics from the communications device and a pointer pointing to HAL functions for accessing the communications device are obtained by HAL. HAL then opens multiple ports on the communications device using the functions pointed to by the pointer, and messages are sent between the node and the communications device. The messages thus sent are optimized with respect to the communications device as determined by the obtained device specific characteristics. Multiple processes and protocol stacks may be associated with each port in a single instance of HAL. A further embodiment provides that multiple virtual ports may be associated with a port, with multiple protocol stacks associated with each virtual port. A further embodiment provides that multiple communications devices may be associated with a single instance of HAL.
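A rough sketch of the HAL arrangement might look like the following. All names here (`DeviceInfo`, `Hal`, `open_port`, `send`) are hypothetical, a dictionary of callables stands in for the pointer to device access functions, and splitting messages to the device's maximum packet size is only one assumed example of optimizing with respect to device characteristics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DeviceInfo:
    max_packet_size: int            # a device specific characteristic
    functions: Dict[str, Callable]  # stands in for the function pointer


class Hal:
    def __init__(self, device: DeviceInfo):
        self.device = device
        # port id -> protocol stacks associated with that port
        self.ports: Dict[int, List[str]] = {}

    def open_port(self, port_id: int, stacks: List[str]) -> None:
        """Open a port through the device's own access function."""
        self.device.functions["open"](port_id)
        self.ports[port_id] = list(stacks)

    def send(self, port_id: int, payload: bytes) -> int:
        """Send payload in device-sized chunks; return the chunk count."""
        if port_id not in self.ports:
            raise KeyError(f"port {port_id} is not open")
        size = self.device.max_packet_size
        chunks = [payload[i:i + size] for i in range(0, len(payload), size)]
        for chunk in chunks:
            self.device.functions["write"](port_id, chunk)
        return len(chunks)
```

Because the caller only ever sees `Hal`, a different device can be supported by supplying a different `DeviceInfo` without changing user code.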
Abstract:
A method, apparatus and program product for detecting a communication event in a distributed parallel data processing system in which a message is sent from an origin to a target. A low-level application programming interface (LAPI) is provided which has an operation for associating a counter with a communication event to be detected. The LAPI increments the counter upon the occurrence of the communication event. The number in the counter is monitored, and when the number increases, the event is detected. A completion counter in the origin is associated with the completion of a message being sent from the origin to the target. When the message is completed, LAPI increments the completion counter, such that monitoring the completion counter detects the completion of the message. The completion counter may be used to ensure that a first message has been sent from the origin to the target and completed before a second message is sent.
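The counter-based ordering can be illustrated with a small sketch. It does not use the real LAPI interface: `Counter`, `send`, and `send_in_order` are hypothetical names, and a Python thread stands in for the asynchronous transfer.

```python
import threading


class Counter:
    """A counter that is incremented on an event and can be waited on."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def increment(self):
        with self._cond:
            self._value += 1
            self._cond.notify_all()

    def wait_for(self, target, timeout=None):
        """Block until the counter reaches target; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= target, timeout)

    @property
    def value(self):
        with self._cond:
            return self._value


def send(message, delivered, completion_counter):
    delivered.append(message)        # stands in for the actual transfer
    completion_counter.increment()   # done by the library on completion


def send_in_order(first, second):
    """Send second only after first has completed."""
    delivered, counter = [], Counter()
    worker = threading.Thread(target=send, args=(first, delivered, counter))
    worker.start()
    counter.wait_for(1)              # first message has completed
    send(second, delivered, counter)
    worker.join()
    return delivered, counter.value
```

The origin never inspects the message itself; observing the counter reach the expected value is enough to know the first transfer finished.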
Abstract:
A method, apparatus and program product for message communication in a distributed parallel data processing system. A user message is sent from a sender to a receiver. The user message contains user data and a pointer to a header handler routine. The header handler routine includes a first pointer to a target user buffer and a second pointer to a completion routine. When the user message is received, a low-level application programming interface (LAPI) is informed and invokes the header handler routine, which returns the first and second pointers. LAPI then transfers the user data to the user buffer indicated by the header handler routine, and invokes the completion routine indicated by the header handler routine to complete the transfer of the user message to the receiver.
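The header handler flow can be sketched roughly as below. The message layout (a dictionary with `handler`, `header`, and `data` keys), the handler registry, and the function names are assumptions for illustration; a handler name looked up in a table stands in for the pointer carried in the message.

```python
from typing import Callable, Dict, List, Tuple

# A header handler returns (target buffer, completion routine).
HeaderHandler = Callable[[dict], Tuple[bytearray, Callable[[], None]]]


def deliver(message: dict, handlers: Dict[str, HeaderHandler]) -> None:
    """Receiver-side dispatch, playing the role the abstract gives LAPI."""
    handler = handlers[message["handler"]]
    buffer, on_complete = handler(message["header"])
    data = message["data"]
    buffer[:len(data)] = data   # copy user data into the user's buffer
    on_complete()               # finish the transfer


def example_handler_factory(completed: List[bytes]) -> HeaderHandler:
    """Build a handler that allocates a buffer and records its contents."""
    def handler(header: dict):
        buf = bytearray(header["length"])
        return buf, lambda: completed.append(bytes(buf))
    return handler
```

The point of the indirection is that the receiver's user code, not the transport, decides where each message lands and what happens once it has arrived.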
Abstract:
Communication between different entities of a computing environment is facilitated by an address mapping capability. Messages are sent between the entities to have desired tasks performed. Instead of providing within the messages the actual non-logical addresses (e.g., virtual, real addresses) used to perform the tasks, logical addresses are provided. The logical addresses are then mapped to the non-logical addresses. Each logical address can map to a plurality of non-logical addresses.
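One way to picture the mapping is a table keyed by logical address, where each entry holds several non-logical addresses. The `AddressMap` class and the `"virtual"` and `"real"` kind labels are hypothetical names chosen for this sketch.

```python
from typing import Dict


class AddressMap:
    """Maps a logical address to one or more non-logical addresses."""

    def __init__(self):
        self._table: Dict[str, Dict[str, int]] = {}

    def bind(self, logical: str, kind: str, address: int) -> None:
        """Associate a non-logical address of the given kind."""
        self._table.setdefault(logical, {})[kind] = address

    def resolve(self, logical: str, kind: str) -> int:
        """Look up the non-logical address needed to perform a task."""
        return self._table[logical][kind]


amap = AddressMap()
amap.bind("buf0", "virtual", 0x7F00)
amap.bind("buf0", "real", 0x1000)
```

A message would then carry only `"buf0"`, and each entity resolves it to the kind of address it needs.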
Abstract:
Method, system and program storage device are provided for monitoring and ameliorating congestion in a tightly coupled network. Commensurate with sending a packet into the network, a first time stamp is recorded. Upon receipt of an acknowledgment back across the network responsive to sending of the packet, a second time stamp is recorded. The round trip time of the packet is determined, and an amount of congestion is estimated using the determined round trip time and a statically predetermined round trip time representative of at least one of no network congestion or a known degree of network congestion. The number of flow control tokens for the destination node can be dynamically varied in response to the estimate of the amount of network congestion. If desired, monitoring and estimating of network congestion can be initiated only after identifying the existence of network congestion, for example, as represented by a lack of flow control tokens at a sender node for a destination node.
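As a rough illustration of the estimate, the sketch below compares a measured round trip time with a static baseline. The specific formula (relative excess over the baseline), the halving and increment policy, and all parameter names and defaults are assumptions; the abstract does not prescribe them.

```python
def estimate_congestion(send_ts: float, ack_ts: float,
                        baseline_rtt: float) -> float:
    """Relative excess of measured RTT over the uncongested baseline."""
    rtt = ack_ts - send_ts
    return max(0.0, rtt / baseline_rtt - 1.0)


def adjust_tokens(current: int, congestion: float, threshold: float = 0.5,
                  min_tokens: int = 1, max_tokens: int = 64) -> int:
    """Shrink the token pool under congestion, grow it slowly otherwise."""
    if congestion > threshold:
        return max(min_tokens, current // 2)
    return min(max_tokens, current + 1)
```

For example, a round trip of 3.0 time units against a baseline of 2.0 gives an estimate of 0.5, which sits at the assumed threshold and therefore does not yet shrink the token pool.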
Abstract:
In a multi-node computer system, file access by a failed node is limited. Upon receipt of an indication of a node failure, a fencing command is sent to disks in a disk subsystem to which the failed node has access. If the fencing command sent to a disk fails, the fencing command is sent to a server having access to at least one disk in the disk subsystem, to limit access by the failed node to that disk. If the fencing command sent to the server does not result in limiting access by the failed node to all the disks in the disk subsystem, the command is sent to another server having access to at least one disk in the disk subsystem. The fencing command may be sent to various servers until access by the failed node to all the disks in the disk subsystem is limited or until the fencing command has been sent to all the servers. The fencing command may be sent one at a time to servers having access to the disks in the disk subsystem, may be sent concurrently to all the servers having access to the disks in the disk subsystem, or may be forwarded from one server to another.
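The one-at-a-time variant of this fallback can be sketched as a loop. The function name and the two callbacks, `fence_disk` and `fence_via_server`, are hypothetical placeholders for whatever actually issues the fencing command.

```python
from typing import Callable, FrozenSet, Iterable, Set


def fence_failed_node(
    failed_node: str,
    disks: Iterable[str],
    servers: Iterable[str],
    fence_disk: Callable[[str, str], bool],
    fence_via_server: Callable[[str, str, FrozenSet[str]], Set[str]],
) -> Set[str]:
    """Try direct fencing, then servers in turn; return unfenced disks."""
    # Step 1: send the fencing command directly to every disk.
    remaining = {d for d in disks if not fence_disk(d, failed_node)}
    # Step 2: for disks where that failed, ask servers one at a time.
    for server in servers:
        if not remaining:
            break
        fenced = fence_via_server(server, failed_node, frozenset(remaining))
        remaining -= set(fenced)
    return remaining
```

An empty result means the failed node is fenced from every disk; a non-empty result means all servers were tried and some disks remain reachable by the failed node.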
Abstract:
A system for flow-control concurrency to prevent excessive packet loss, including at least one transmitter node. Each transmitter node is configured to transmit data. A first flow-control device is coupled to the at least one transmitter node. The first flow-control device is configured to limit the number of concurrent data replies sent by the at least one transmitter node such that the resources on the transmitter node side will not be overrun. At least one receiver node is configured to receive the transmitted data. The at least one receiver node is coupled to the at least one transmitter node via a communication network. A second flow-control device is coupled to the at least one receiver node. The second flow-control device is configured to limit the number of concurrent data requests received by the at least one receiver node such that the resources on the receiver node side will not be overrun.
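Both flow-control devices can be thought of as the same bounded in-flight counter applied on different sides. The `ConcurrencyLimiter` class below is a hypothetical, single-threaded sketch of that idea, and the limits of 2 and 4 are arbitrary example values.

```python
class ConcurrencyLimiter:
    """Caps the number of operations that may be in flight at once."""

    def __init__(self, limit: int):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.in_flight = 0

    def try_acquire(self) -> bool:
        """Admit one more operation if the cap has not been reached."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Mark one in-flight operation as finished."""
        if self.in_flight == 0:
            raise RuntimeError("release without a matching acquire")
        self.in_flight -= 1


# One limiter per side: replies at the transmitter, requests at the receiver.
reply_limiter = ConcurrencyLimiter(limit=2)
request_limiter = ConcurrencyLimiter(limit=4)
```

A node that gets `False` from `try_acquire` defers the reply or request until an earlier one completes, rather than letting packets be dropped.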