Abstract:
A data processing system has an asynchronous memory mover, which includes multiple sets of registers for storing the addressing and control parameters utilized to generate one or more asynchronous memory move (AMM) operations. The memory mover detects receipt of a first set of parameters in a first set of registers from the processor. The processor forwards the parameters after it initiates a data move in virtual address space, utilizing a source effective address and a destination effective address. The memory mover responds to receiving the first set of parameters by generating and launching a first AMM operation. When the memory mover receives a second set of parameters in a second set of registers before the first AMM operation completes, it generates and launches a second AMM operation concurrently with the first, provided no address conflicts exist.
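A minimal sketch in C of the conflict check implied above, assuming invented structures and names (amm_params, ranges_overlap, amm_conflicts); the actual register layout and conflict rules are not given in the abstract:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical addressing/control parameters held in one register set. */
typedef struct {
    uint64_t src_ea;   /* source effective address      */
    uint64_t dst_ea;   /* destination effective address */
    size_t   length;   /* number of bytes to move       */
    bool     active;   /* an AMM operation is in flight */
} amm_params;

/* Two address ranges conflict if they overlap anywhere. */
static bool ranges_overlap(uint64_t a, size_t alen, uint64_t b, size_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Launch the second AMM only if it neither reads nor writes memory that the
 * in-flight move writes, and does not write memory the in-flight move reads.
 * Reads of the same source do not conflict. */
static bool amm_conflicts(const amm_params *inflight, const amm_params *next)
{
    if (!inflight->active)
        return false;
    return ranges_overlap(next->src_ea, next->length, inflight->dst_ea, inflight->length) ||
           ranges_overlap(next->dst_ea, next->length, inflight->src_ea, inflight->length) ||
           ranges_overlap(next->dst_ea, next->length, inflight->dst_ea, inflight->length);
}

int main(void)
{
    amm_params first  = { 0x1000, 0x2000, 256, true };   /* in flight           */
    amm_params second = { 0x3000, 0x4000, 256, true };   /* disjoint ranges     */
    amm_params third  = { 0x2000, 0x5000, 256, true };   /* reads first's dst   */
    printf("second conflicts with first: %d\n", amm_conflicts(&first, &second)); /* 0 */
    printf("third  conflicts with first: %d\n", amm_conflicts(&first, &third));  /* 1 */
    return 0;
}
```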
Abstract:
A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but each task maps only a portion of the effective addresses (EAs) of the global address space to the local real memory of its respective node. The HFI window tags all outgoing GSM operations of the local task with the job ID, and embeds the IDs of the target node and HFI window at which the EA is memory mapped. The HFI window also enables processing of received GSM operations whose valid EAs are homed to the local real memory of the receiving node, while preventing processing of received operations that lack a valid EA-to-RA local mapping.
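As one way to picture the receive-side filtering, the sketch below uses invented names (hfi_window, gsm_operation, ea_to_ra) and a toy EA range; it only illustrates the idea that operations without a valid local EA-to-RA mapping are rejected:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EA_NOT_HOMED UINT64_MAX   /* sentinel: no local EA-to-RA mapping */

/* Fields a receiving HFI window would consult (names are illustrative). */
typedef struct {
    uint32_t job_id;       /* parallel job this window serves */
    uint32_t window_id;    /* HFI window identifier           */
} hfi_window;

typedef struct {
    uint32_t job_id;       /* job ID tagged by the sending window       */
    uint32_t target_win;   /* HFI window ID embedded by the sender      */
    uint64_t ea;           /* effective address the operation refers to */
} gsm_operation;

/* Tiny stand-in for the node's EA-to-RA translation of the GSM space:
 * assume, purely for illustration, this node homes EAs in [0x1000, 0x2000). */
static uint64_t ea_to_ra(uint64_t ea)
{
    if (ea >= 0x1000 && ea < 0x2000)
        return 0x80000000ull + (ea - 0x1000);
    return EA_NOT_HOMED;
}

/* Process a received GSM operation only when it targets this window and
 * its EA is homed to the local real memory of the receiving node. */
static bool hfi_accept(const hfi_window *win, const gsm_operation *op)
{
    return op->job_id == win->job_id &&
           op->target_win == win->window_id &&
           ea_to_ra(op->ea) != EA_NOT_HOMED;
}

int main(void)
{
    hfi_window win = { .job_id = 7, .window_id = 3 };
    gsm_operation local  = { 7, 3, 0x1040 };   /* EA homed here: accepted      */
    gsm_operation remote = { 7, 3, 0x9000 };   /* EA homed elsewhere: rejected */
    printf("local:  %d\n", hfi_accept(&win, &local));
    printf("remote: %d\n", hfi_accept(&win, &remote));
    return 0;
}
```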
Abstract:
Disclosed are a method and system for multiple-party communications in a processing system that includes multiple processing subsystems. Each of the processing subsystems includes a central processing unit and one or more network adapters for connecting that processing subsystem to the other processing subsystems. A multitude of nodes is established, each associated with one of the processing subsystems. A first aspect of the invention involves pipelined communication using RDMA among three nodes: the first node breaks a large communication into multiple parts and sends these parts one after the other to the second node using RDMA, and the second node in turn absorbs and forwards each part to a third node before all parts of the communication have arrived from the first node.
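A single-process sketch of the pipelining idea; rdma_send, forward_part, and the chunk size are illustrative stand-ins, and on real hardware the second node's forwarding runs concurrently with the first node's remaining sends:

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 8   /* tiny part size so the example prints something useful */

/* Stand-in for an RDMA verb; a real adapter would post a work request instead. */
static void rdma_send(int dest_node, const char *buf, size_t len)
{
    printf("  -> node %d: \"%.*s\"\n", dest_node, (int)len, buf);
}

/* Second node: absorb one part and forward it to the third node immediately,
 * before the remaining parts have arrived from the first node. */
static void forward_part(int third_node, const char *part, size_t len)
{
    printf("node 2 received a part, forwarding while more parts are in flight\n");
    rdma_send(third_node, part, len);
}

/* First node: break a large communication into parts and send them
 * one after the other to the second node. */
static void pipelined_send(const char *data, size_t total, int second_node, int third_node)
{
    for (size_t off = 0; off < total; off += CHUNK_SIZE) {
        size_t len = (total - off < CHUNK_SIZE) ? total - off : CHUNK_SIZE;
        rdma_send(second_node, data + off, len);
        /* In this single-process sketch the forward happens inline; on real
         * hardware node 2 runs concurrently with node 1's remaining sends. */
        forward_part(third_node, data + off, len);
    }
}

int main(void)
{
    const char msg[] = "a large communication split into parts";
    pipelined_send(msg, strlen(msg), 2, 3);
    return 0;
}
```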
Abstract:
Load balancing of adapters on a multi-adapter node of a communications environment. A task executing on the node selects an adapter resource unit to be used as its primary port for communications. The selection is based on the task's identifier and facilitates balancing the load among the adapter resource units. Using the task's identifier, an index is generated and used to select a particular adapter resource unit from the list of adapter resource units assigned to the task. Generation of the index is efficient and predictable.
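The abstract does not give the exact index function; one predictable choice consistent with the description is the task identifier taken modulo the number of assigned adapter resource units, sketched below with invented adapter names:

```c
#include <stddef.h>
#include <stdio.h>

/* Select a primary adapter resource unit from the task's assigned list.
 * Using the task ID modulo the list length spreads tasks with consecutive
 * IDs evenly across the adapters, and the result is deterministic. */
static size_t select_primary_port(unsigned task_id, size_t num_assigned_units)
{
    return (size_t)(task_id % num_assigned_units);
}

int main(void)
{
    const char *units[] = { "sni0", "sni1", "sni2", "sni3" };  /* assumed names */
    for (unsigned task = 0; task < 8; task++)
        printf("task %u -> primary port %s\n",
               task, units[select_primary_port(task, 4)]);
    return 0;
}
```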
Abstract:
A target task ensures complete delivery of a global shared memory (GSM) message from an originating task to the target task. The target task's HFI receives the first of multiple GSM packets generated from a single GSM message sent by the originating task. The HFI logic assigns a sequence number and a corresponding tuple to track receipt of the complete GSM message. The sequence number is unique relative to the sequence numbers assigned to other GSM messages from the originating task that have not yet been completely received. The tuple comprises the sequence number and a count value, and the HFI updates the count value for the first GSM packet and for each subsequent GSM packet received for the GSM message. The HFI determines when receipt of the GSM message is complete by comparing the count value with a count total retrieved from the packet header.
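A self-contained sketch of the tuple bookkeeping described above, with invented structure and field names: the count is updated for every packet carrying the message's sequence number, and completion is declared when it reaches the packet total from the header:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Per-message tracking tuple: sequence number plus packets seen so far. */
typedef struct {
    uint32_t seq_no;      /* unique among messages not yet fully received */
    uint32_t count;       /* GSM packets received for this message        */
    bool     in_use;
} gsm_msg_tuple;

/* Record one received packet; return true when the message is complete. */
static bool hfi_track_packet(gsm_msg_tuple *t, uint32_t seq_no, uint32_t count_total)
{
    if (!t->in_use) {                  /* first packet of this message     */
        t->seq_no = seq_no;
        t->count  = 0;
        t->in_use = true;
    }
    t->count++;                        /* update count for this packet     */
    if (t->count == count_total) {     /* compare with total from header   */
        t->in_use = false;             /* message fully delivered          */
        return true;
    }
    return false;
}

int main(void)
{
    gsm_msg_tuple t = {0};
    for (uint32_t pkt = 1; pkt <= 4; pkt++)      /* 4-packet message, seq 7 */
        if (hfi_track_packet(&t, 7, 4))
            printf("GSM message with sequence number 7 fully received\n");
    return 0;
}
```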
Abstract:
A method and data processing system for tracking global shared memory (GSM) operations to and from a local node configured with a host fabric interface (HFI) coupled to a network fabric. During task/job initialization, the system OS assigns HFI window(s) to handle GSM packet generation, receipt, and processing for each local task. HFI processing logic automatically tags each GSM packet generated by the HFI window with the global job identifier (ID) of the job to which the local task is affiliated. The job ID is embedded within each GSM packet placed on the network fabric. On receipt of a GSM packet from the network fabric, the HFI logic retrieves the embedded job ID and compares it with the job ID associated with the HFI window(s). GSM packets are forwarded to an HFI window only when the embedded job ID matches the HFI window's job ID.
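A sketch of the tag-and-filter step with invented packet and window layouts: the sending window embeds its job ID in every outgoing packet, and the receiving HFI forwards a packet to a window only when the embedded ID matches:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t job_id;        /* global job ID embedded in every GSM packet */
    /* payload, EA, and window IDs elided for brevity */
} gsm_packet;

typedef struct {
    uint32_t job_id;        /* job the OS assigned this window to at init */
} hfi_window;

/* Send side: tag each generated packet with the window's job ID. */
static void hfi_tag_packet(const hfi_window *win, gsm_packet *pkt)
{
    pkt->job_id = win->job_id;
}

/* Receive side: forward only when the embedded job ID matches the window. */
static bool hfi_forward_to_window(const hfi_window *win, const gsm_packet *pkt)
{
    return pkt->job_id == win->job_id;
}

int main(void)
{
    hfi_window win = { .job_id = 42 };
    gsm_packet ok = {0}, stray = { .job_id = 99 };
    hfi_tag_packet(&win, &ok);
    printf("matching packet forwarded: %d\n", hfi_forward_to_window(&win, &ok));
    printf("stray packet forwarded:    %d\n", hfi_forward_to_window(&win, &stray));
    return 0;
}
```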
Abstract:
A method performed in a data processing system initiates an asynchronous memory move (AMM) operation, whereby a processor performs a move of data in virtual address space from a first effective address to a second effective address and forwards the parameters of the AMM operation to asynchronous memory mover logic, which completes the physical movement of the data from a first memory location to a second memory location. The processor executes a second operation, which checks the completion status of the data move and returns a notification indicating that status. The status is one of: data move in progress; data move totally done; data move partially done; data move cannot be performed; or occurrence of a translation look-aside buffer invalidate entry (TLBIE) operation. The processor initiates one or more actions in response to the notification received.
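The five outcomes listed above map naturally onto an enumeration; the sketch below uses illustrative names and a stub status-check routine rather than the ISA's actual encodings:

```c
#include <stdio.h>

/* Possible outcomes of the AMM status-check operation (illustrative encoding). */
typedef enum {
    AMM_IN_PROGRESS,      /* data move in progress                           */
    AMM_DONE,             /* data move totally done                          */
    AMM_PARTIALLY_DONE,   /* data move partially done                        */
    AMM_CANNOT_PERFORM,   /* data move cannot be performed                   */
    AMM_TLBIE_OCCURRED    /* a TLB invalidate entry (TLBIE) operation occurred */
} amm_status;

/* Stand-in for the status-check operation; a real implementation would
 * read state maintained by the asynchronous memory mover logic. */
static amm_status amm_check_status(void)
{
    return AMM_IN_PROGRESS;
}

int main(void)
{
    switch (amm_check_status()) {
    case AMM_IN_PROGRESS:    printf("waiting: move still in flight\n");  break;
    case AMM_DONE:           printf("move complete, data usable\n");     break;
    case AMM_PARTIALLY_DONE: printf("resume the remaining move\n");      break;
    case AMM_CANNOT_PERFORM: printf("fall back to a CPU copy\n");        break;
    case AMM_TLBIE_OCCURRED: printf("translation changed; restart\n");   break;
    }
    return 0;
}
```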
Abstract:
A data processing system with a processor and memory includes an instruction set architecture (ISA) that provides an asynchronous memory move (AMM) store (ST) instruction. When the processor executes the AMM ST instruction, the processor performs a series of functions that initiate an asynchronous memory move (AMM) operation. The AMM ST instruction moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) performing a move of the data in virtual address space, utilizing a source effective address that is memory mapped to the first memory location and a destination effective address that is memory mapped to the second memory location; and (b) once the move completes in virtual address space, performing the physical move of the data to the second memory location outside the processor core, without processor involvement.
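The sketch below models the hand-off with a POSIX thread standing in for the asynchronous memory mover logic (compile with -pthread); it is only an analogy for the asynchrony the AMM ST instruction provides, not the ISA mechanism itself:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Parameters the AMM ST instruction would hand off to the memory mover. */
typedef struct {
    void       *dst;   /* destination EA (mapped to the second real address) */
    const void *src;   /* source EA (mapped to the first real address)       */
    size_t      len;
} amm_request;

/* Stand-in for the asynchronous memory mover: performs the physical move
 * outside the processor core, here modeled as a separate thread. */
static void *memory_mover(void *arg)
{
    amm_request *req = arg;
    memcpy(req->dst, req->src, req->len);   /* physical movement of the data */
    return NULL;
}

int main(void)
{
    static char src[64] = "payload at the source effective address";
    static char dst[64];
    amm_request req = { dst, src, sizeof src };

    pthread_t mover;
    pthread_create(&mover, NULL, memory_mover, &req);   /* parameters handed off */

    /* ... the core is free to execute independent work here ... */

    pthread_join(mover, NULL);                           /* move now complete    */
    printf("destination: %s\n", dst);
    return 0;
}
```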
Abstract:
In a multinode data processing system in which nodes exchange information over a network or through a switch, a structure and mechanism are provided that enable data packets to be sent and received in any order. Normally, if in-order transmission and receipt are required, transmission over a single path is essential to ensure proper reassembly. The present mechanism avoids this necessity and permits Remote Direct Memory Access (RDMA) operations to be carried out simultaneously over multiple paths. This provides a data-striping mode of operation in which data transfers complete much faster, since packets of single or multiple RDMA messages can be partitioned and transferred over several paths simultaneously, thus utilizing the full available system bandwidth.
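A simplified, single-process sketch of the striping idea: packets carry their destination offset, so the receiver can place each one regardless of arrival order or delivering path; names, sizes, and the in-memory "transmit" are all illustrative:

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

#define NUM_PATHS   3
#define PACKET_SIZE 8   /* tiny payload size so the demo stays short */

/* Each packet carries its destination offset, so the receiver can place it
 * correctly no matter which path delivered it or in what order it arrived. */
typedef struct {
    size_t      offset;
    size_t      len;
    const char *payload;
} rdma_packet;

/* Receiver side: out-of-order placement is safe because of the offset. */
static void rdma_place_packet(char *target, const rdma_packet *pkt)
{
    memcpy(target + pkt->offset, pkt->payload, pkt->len);
}

/* Sender side: stripe one RDMA message over all paths, round-robin.
 * Here "transmit" hands the packet straight to the receiver's buffer;
 * a real adapter would queue it on the chosen path instead. */
static void rdma_stripe_send(const char *msg, size_t total, char *receiver)
{
    int path = 0;
    for (size_t off = 0; off < total; off += PACKET_SIZE) {
        rdma_packet pkt = {
            .offset  = off,
            .len     = (total - off < PACKET_SIZE) ? total - off : PACKET_SIZE,
            .payload = msg + off,
        };
        printf("packet at offset %zu sent on path %d\n", pkt.offset, path);
        rdma_place_packet(receiver, &pkt);
        path = (path + 1) % NUM_PATHS;     /* spread packets over every path */
    }
}

int main(void)
{
    const char msg[] = "striped across several paths at once";
    char receiver[sizeof msg] = { 0 };
    rdma_stripe_send(msg, sizeof msg, receiver);
    printf("reassembled: %s\n", receiver);
    return 0;
}
```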
Abstract:
A communication system with a communications adapter operating in an interrupt mode, the system comprising: a network system with at least one sender and a recipient of a message, and a network for communication therebetween; the communications adapter placing data from the message in a receive buffer and generating an interrupt; and a state variable configured to track received messages. Also disclosed is a method for increasing bandwidth in an interrupt-mode processing protocol, comprising: creating a state variable configured to track received messages; incrementing the state variable only if a received message comprises multiple packets; decrementing the state variable when such a multiple-packet message completes; and generating an interrupt with a communications adapter running in an interrupt mode, the communications adapter placing data from the received message in a receive buffer.
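A sketch of the state-variable bookkeeping with invented function names; the interrupt-suppression rule in should_generate_interrupt is an assumption added for illustration, not a detail stated in the abstract:

```c
#include <stdbool.h>
#include <stdio.h>

/* State variable tracking multiple-packet messages still being received. */
static int multi_packet_in_flight = 0;

/* Increment only when a received message comprises multiple packets. */
static void on_multi_packet_message_start(void)
{
    multi_packet_in_flight++;
}

/* Decrement when that multiple-packet message completes. */
static void on_multi_packet_message_complete(void)
{
    multi_packet_in_flight--;
}

/* One plausible use of the state variable (an assumption for illustration):
 * raise an interrupt only when no multiple-packet message is still being
 * assembled, so the CPU is not interrupted once per packet. */
static bool should_generate_interrupt(void)
{
    return multi_packet_in_flight == 0;
}

int main(void)
{
    on_multi_packet_message_start();                             /* first packet arrives */
    printf("interrupt now? %d\n", should_generate_interrupt());  /* 0: keep receiving    */
    on_multi_packet_message_complete();                          /* final packet arrives */
    printf("interrupt now? %d\n", should_generate_interrupt());  /* 1: signal the CPU    */
    return 0;
}
```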