Abstract:
A method is provided for transferring data between first and second nodes of a network. The method includes requesting first data to be transferred by a first upper layer protocol (ULP) operating on the first node of the network; and buffering second data for transfer to the second node by a protocol layer lower than the first ULP, the second data including an integral number of standard-size units of data including the first data. The method further includes posting the second data to the network for delivery to the second node; receiving the second data at the second node; and, from the received data, delivering the first data to a second ULP operating on the second node. The method is of particular application when transferring data in unit-sized blocks is faster than transferring data in other sizes.
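To make the buffering step concrete, here is a minimal C sketch of padding the ULP's first data out to an integral number of standard-size units before posting. UNIT_SIZE, post_to_network(), and send_in_units() are hypothetical names for this sketch, not the patent's API:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical standard transfer unit size; the patent leaves the
 * unit to the underlying network layer. */
#define UNIT_SIZE 128

/* Round a byte count up to an integral number of UNIT_SIZE units. */
static size_t round_up_to_units(size_t len)
{
    return ((len + UNIT_SIZE - 1) / UNIT_SIZE) * UNIT_SIZE;
}

/* Stand-in for the lower layer's posting primitive. */
static void post_to_network(const void *buf, size_t len)
{
    printf("posting %zu bytes (%zu units)\n", len, len / UNIT_SIZE);
}

/* Buffer the ULP's first data inside second data that occupies an
 * integral number of units (zero-padded), then post it. */
int send_in_units(const void *first_data, size_t len)
{
    size_t padded = round_up_to_units(len);
    unsigned char *second_data = calloc(1, padded);
    if (!second_data)
        return -1;
    memcpy(second_data, first_data, len);
    post_to_network(second_data, padded);
    free(second_data);
    return 0;
}

int main(void)
{
    char msg[] = "hello";                  /* 6 bytes including NUL */
    return send_in_units(msg, sizeof msg); /* posts one 128-byte unit */
}
```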
Abstract:
Lock-free queues of a shared memory environment are used to facilitate communication within that environment. The lock-free queues can be used for interprocess communication, as well as intraprocess communication. The lock-free queues are structured to minimize the use of atomic operations when performing operations on the queues, and to minimize the number of enqueue/dequeue operations to be performed on the queues.
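The emphasis on minimizing atomic operations is the hallmark of single-producer/single-consumer ring designs. A minimal C11 sketch, assuming a fixed power-of-two capacity and one producer and one consumer per queue (the patent itself covers more general inter- and intraprocess cases):

```c
#include <stdatomic.h>
#include <stddef.h>

#define QSIZE 64 /* capacity; must be a power of two */

/* Single-producer/single-consumer lock-free ring: no locks, and each
 * enqueue or dequeue costs two atomic loads and one atomic store. */
struct spsc_queue {
    _Atomic size_t head;   /* advanced only by the consumer */
    _Atomic size_t tail;   /* advanced only by the producer */
    void *slots[QSIZE];
};

int enqueue(struct spsc_queue *q, void *item)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QSIZE)
        return 0;                          /* full */
    q->slots[tail & (QSIZE - 1)] = item;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

void *dequeue(struct spsc_queue *q)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return NULL;                       /* empty */
    void *item = q->slots[head & (QSIZE - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return item;
}
```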
Abstract:
A technique for debugging code during runtime includes providing, from an outside process, a trigger to a daemon. In this case, the trigger is associated with a registered callback function. The trigger is then provided, from the daemon, to one or more designated tasks of a job. The registered callback function (that is associated with the trigger) is then executed by the one or more designated tasks. Execution results of the executed registered callback function are then returned (from the one or more designated tasks) to the daemon.
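A hypothetical sketch of the register-and-dispatch pattern described: an outside process names a trigger, the daemon resolves it to a registered callback, a designated task executes it, and the result is returned. All identifiers here are illustrative, not the patent's API:

```c
#include <stdio.h>
#include <string.h>

typedef int (*debug_callback)(void);

struct trigger_entry {
    const char *trigger;
    debug_callback cb;
};

/* An example registered callback a task might expose. */
static int dump_task_state(void)
{
    printf("task: dumping state\n");
    return 42; /* stand-in execution result */
}

static struct trigger_entry registry[] = {
    { "dump-state", dump_task_state },
};

/* Daemon side: resolve the trigger to its registered callback, run
 * it in the designated task (inline here), and return the result. */
int dispatch_trigger(const char *trigger, int *result)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i].trigger, trigger) == 0) {
            *result = registry[i].cb();
            return 0;
        }
    }
    return -1; /* unknown trigger */
}

int main(void)
{
    int result;
    if (dispatch_trigger("dump-state", &result) == 0)
        printf("daemon: result %d returned\n", result);
    return 0;
}
```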
Abstract:
In a parallel computer executing a parallel application, where the parallel computer includes a number of compute nodes, each compute node including one or more computer processors, the parallel application including a number of processes, and one or more of the processes executing a barrier operation, creating a checkpoint of the parallel application includes: maintaining, by each computer processor, global barrier operation state information, the global barrier operation state information including an aggregation of each process's barrier operation state information; invoking, for each process of the parallel application, a checkpoint handler; saving, by each process's checkpoint handler as part of a checkpoint for the parallel application, the process's barrier operation state information; and exiting, by each process, the checkpoint handler.
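As a sketch, a per-process checkpoint handler that persists barrier operation state might look like the following; the struct fields and file naming are assumptions, since the abstract does not specify a format:

```c
#include <stdio.h>

/* Illustrative per-process barrier operation state. */
struct barrier_state {
    int barrier_id;   /* which barrier the process is in, if any */
    int arrived;      /* 1 if this process has entered the barrier */
    unsigned epoch;   /* how many barriers have completed */
};

/* Checkpoint handler invoked for each process: persist the process's
 * barrier state as part of the application checkpoint, then return
 * (the "exit" step in the abstract). */
int checkpoint_handler(int rank, const struct barrier_state *s)
{
    char path[64];
    snprintf(path, sizeof path, "ckpt.rank%d.barrier", rank);
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

int main(void)
{
    struct barrier_state s = { .barrier_id = 3, .arrived = 1, .epoch = 17 };
    return checkpoint_handler(0, &s);
}
```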
Abstract:
A mechanism is provided for collective acceleration unit (CAU) tree flow control. The mechanism forms a logical tree (sub-network) among processors and transfers “collective” packets on this tree. The system supports many collective trees, and each CAU includes resources to support a subset of the trees. Each CAU has limited buffer space, and the connection between two CAUs is not completely reliable. Therefore, to ensure that collective packets traverse the tree without colliding over buffer space, and to guarantee end-to-end packet delivery, each CAU in the system flow controls the packets, detects packet loss, and retransmits lost packets.
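A toy C sketch of the per-link behavior the abstract attributes to each CAU: hold a copy of each packet until it is acknowledged, detect loss on timeout, and retransmit. A stop-and-wait window of one packet is assumed here to keep the sketch small; a real CAU would meter multiple buffer slots:

```c
#include <stdio.h>
#include <string.h>

struct cau_link {
    unsigned next_seq;    /* sequence number stamped on packets */
    unsigned acked_seq;   /* highest sequence acknowledged      */
    char pending[256];    /* copy kept for retransmission       */
    size_t pending_len;
};

/* Stand-in for the physical transmit to the neighboring CAU. */
static void transmit(unsigned seq, const void *p, size_t len)
{
    printf("tx seq=%u len=%zu\n", seq, len);
}

int cau_send(struct cau_link *l, const void *pkt, size_t len)
{
    if (l->next_seq != l->acked_seq)
        return -1;            /* no buffer credit: previous packet unacked */
    memcpy(l->pending, pkt, len);
    l->pending_len = len;
    l->next_seq++;
    transmit(l->next_seq, l->pending, l->pending_len);
    return 0;
}

void cau_on_ack(struct cau_link *l, unsigned seq)
{
    if (seq == l->next_seq)
        l->acked_seq = seq;   /* frees the buffer slot */
}

void cau_on_timeout(struct cau_link *l)
{
    if (l->next_seq != l->acked_seq)   /* loss detected */
        transmit(l->next_seq, l->pending, l->pending_len);
}
```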
Abstract:
A method performed in a data processing system initiates an asynchronous memory move (AMM) operation, whereby a processor performs a move of data in virtual address space from a first effective address to a second effective address and forwards parameters of the AMM operation to asynchronous memory mover logic for completion of the physical movement of data from a first memory location to a second memory location. The processor executes a second operation, which checks the status of the data move's completion and returns a notification indicating that status. The notification indicates one of: data move in progress; data move totally done; data move partially done; data move cannot be performed; and occurrence of a translation look-aside buffer invalidate entry (TLBIE) operation. The processor initiates one or more actions in response to the notification received.
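The initiate-then-check split can be sketched as two operations over a shared status word. The enum mirrors the statuses listed above; everything else (names, the stubbed mover) is illustrative:

```c
#include <stdatomic.h>
#include <stddef.h>

enum amm_status {
    AMM_IN_PROGRESS,     /* data move in progress */
    AMM_DONE,            /* data move totally done */
    AMM_PARTIALLY_DONE,  /* data move partially done */
    AMM_CANNOT_PERFORM,  /* data move cannot be performed */
    AMM_TLBIE_OCCURRED   /* a TLBIE invalidated a translation mid-move */
};

struct amm_op {
    void *src, *dst;
    size_t len;
    _Atomic int status;  /* written asynchronously by the mover logic */
};

/* First operation: forward the parameters to the mover and return
 * without waiting for the physical move to finish. */
void amm_start(struct amm_op *op, void *dst, void *src, size_t len)
{
    op->dst = dst;
    op->src = src;
    op->len = len;
    atomic_store(&op->status, AMM_IN_PROGRESS);
    /* real mover logic would begin copying here, off the fast path */
}

/* Second operation: check completion and return the notification so
 * the processor can act on it (wait, retry, or abort). */
enum amm_status amm_check(struct amm_op *op)
{
    return (enum amm_status)atomic_load(&op->status);
}
```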
Abstract:
A data processing system includes a set of architected registers within which the processor places state and other information to communicate with the asynchronous memory mover in order to initiate and control an AMM operation. The asynchronous memory mover performs an asynchronous memory move (AMM) operation in response to receiving a set of parameters within the architected registers, which parameters are associated with an AMM store instruction executed by the processor to initiate a move of data in virtual space before placing the information in the architected registers. The architected registers are processor architected registers, defined on a per-thread basis by a compiler, or memory-mapped architected registers allocated for communicating with the asynchronous memory mover during a bind and subsequent execution of an application. The architected registers are also utilized to store state information to enable a restore to a point before execution of the AMM operation.
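An illustrative (not authoritative) layout for such a register set, with field names and widths invented for the sketch:

```c
#include <stdint.h>

/* Hypothetical architected register block: the processor deposits
 * the AMM parameters plus enough state to restore to the pre-AMM
 * point, and the mover reads them. The patent does not specify
 * this layout. */
struct amm_arch_regs {
    uint64_t src_ea;      /* source effective address */
    uint64_t dst_ea;      /* destination effective address */
    uint64_t length;      /* bytes to move */
    uint64_t control;     /* start/cancel bits, thread id, etc. */
    uint64_t saved_state; /* state for a restore before the AMM op */
};

/* Processor side: an AMM store instruction would populate these
 * registers and set the start bit; modeled here as plain writes
 * through a memory-mapped pointer. */
void amm_bind_and_start(volatile struct amm_arch_regs *regs,
                        uint64_t src, uint64_t dst, uint64_t len)
{
    regs->src_ea = src;
    regs->dst_ea = dst;
    regs->length = len;
    regs->control |= 1u; /* assumed "go" bit */
}
```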
Abstract:
A target task ensures complete delivery of a global shared memory (GSM) message from an originating task to the target task. The target task's host fabric interface (HFI) receives a first of multiple GSM packets generated from a single GSM message sent from the originating task. The HFI logic assigns a sequence number and a corresponding tuple to track receipt of the complete GSM message. The sequence number is unique relative to other sequence numbers assigned to GSM messages that have not been completely received from the originating task. For the first GSM packet and for each subsequent GSM packet received for the GSM message, the HFI updates a count value within the tuple, which comprises the sequence number and the count value. The HFI determines when receipt of the GSM message is complete by comparing the count value with a count total retrieved from the packet header.
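A sketch of the tuple bookkeeping on the receiving HFI; the fixed-size table and linear lookup are simplifications assumed for the sketch:

```c
/* Each in-flight GSM message gets a (sequence number, count) tuple;
 * every arriving packet bumps the count, and delivery completes when
 * the count reaches the total carried in the packet header. */
struct gsm_tuple {
    unsigned seq;    /* unique among incomplete messages from a task */
    unsigned count;  /* packets received so far */
    int in_use;
};

#define MAX_INFLIGHT 16
static struct gsm_tuple table[MAX_INFLIGHT];

/* Called per received packet; returns 1 when the message is complete,
 * 0 while packets are still outstanding, -1 if no tuple is free. */
int gsm_packet_arrived(unsigned msg_seq, unsigned count_total)
{
    struct gsm_tuple *t = 0;
    for (int i = 0; i < MAX_INFLIGHT; i++)
        if (table[i].in_use && table[i].seq == msg_seq)
            t = &table[i];
    if (!t) {                      /* first packet: assign a tuple */
        for (int i = 0; i < MAX_INFLIGHT && !t; i++)
            if (!table[i].in_use)
                t = &table[i];
        if (!t)
            return -1;
        t->in_use = 1;
        t->seq = msg_seq;
        t->count = 0;
    }
    t->count++;
    if (t->count == count_total) { /* complete: release the tuple */
        t->in_use = 0;
        return 1;
    }
    return 0;
}
```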
Abstract:
A method, system, and computer program product are provided for managing a cache. A region to be stored within the cache is received. The cache includes multiple regions, and each region is defined by a memory range having a starting index and an ending index. The received region is stored in the cache in accordance with a cache invariant, which guarantees that at any given point in time the regions in the cache are stored in a given order and that none of the regions is completely contained within any other region.
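One way to realize the invariant is to keep regions sorted by starting index and to reject or evict any full containment on insert; this C sketch assumes a simple array representation, which the abstract does not mandate:

```c
struct region { long start, end; };

#define MAX_REGIONS 32
static struct region cache[MAX_REGIONS];
static int nregions;

/* Does region a completely contain region b? */
static int contains(struct region a, struct region b)
{
    return a.start <= b.start && b.end <= a.end;
}

/* Insert r while preserving the invariant: regions stay ordered by
 * starting index, and no region is wholly contained in another.
 * Returns 1 on insert, 0 if r is already covered, -1 if full. */
int cache_insert(struct region r)
{
    int i, j = 0;
    for (i = 0; i < nregions; i++) {
        if (contains(cache[i], r))
            return 0;              /* r already covered: nothing to do */
        if (!contains(r, cache[i]))
            cache[j++] = cache[i]; /* keep; evict regions r swallows */
    }
    nregions = j;
    if (nregions == MAX_REGIONS)
        return -1;
    for (i = nregions; i > 0 && cache[i - 1].start > r.start; i--)
        cache[i] = cache[i - 1];   /* shift to keep start-index order */
    cache[i] = r;
    nregions++;
    return 1;
}
```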
Abstract:
A message flow controller limits a process from passing a new message in a reliable message passing layer from a source node to at least one destination node while a total number of in-flight messages for the process meets a first level limit. The message flow controller limits the new message from passing from the source node to a particular destination node from among a plurality of destination nodes while a total number of in-flight messages to the particular destination node meets a second level limit. Responsive to the total number of in-flight messages to the particular destination node not meeting the second level limit, the message flow controller sends a new packet, from among at least one packet for the new message, to the particular destination node only while a total number of in-flight packets for the new message is less than a third level limit.
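The three levels can be sketched as admission checks; the limit values and counter layout below are illustrative, not the patent's:

```c
#define L1_PROCESS_MSGS 64 /* in-flight messages per process          */
#define L2_DEST_MSGS     8 /* in-flight messages per destination node */
#define L3_MSG_PACKETS   4 /* in-flight packets per message           */

struct flow_state {
    int proc_inflight_msgs;       /* messages in flight for the process */
    int dest_inflight_msgs[16];   /* indexed by destination node        */
};

/* First and second level: hold a new message while either the
 * process-wide or the per-destination in-flight count meets its limit. */
int may_send_message(const struct flow_state *f, int dest)
{
    if (f->proc_inflight_msgs >= L1_PROCESS_MSGS)
        return 0;                 /* first level limit met */
    if (f->dest_inflight_msgs[dest] >= L2_DEST_MSGS)
        return 0;                 /* second level limit met */
    return 1;
}

/* Third level: meter the packets of an admitted message. */
int may_send_packet(int msg_inflight_packets)
{
    return msg_inflight_packets < L3_MSG_PACKETS;
}
```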