摘要:
In a multinode data processing system in which nodes exchange information over a network or through a switch, a structure and mechanism is provided within the realm of Remote Direct Memory Access (RDMA) operations in which DMA operations are present on one side of the transfer but not the other. On the side in which the transfer is not carried out in DMA fashion, transfer processing is carried out under program control; this is in contrast to the transfer on the DMA side which is characteristically carried out in hardware. Usage of these combination processes is useful in programming situations where RDMA is carried out to or from contiguous locations in memory on one side and where memory locations on the other side is noncontiguous. This split mode of transfer is provided both for read and for write operations.
摘要:
In remote direct memory access (RDMA) transfers in a multinode data processing system in which the nodes communicate with one another through communication adapters coupled to a switch or network, there is a need for the system to ensure efficient memory protection mechanisms across jobs. A method is thus desired for addressing virtual memory on local and remote servers that is independent of the process ID on the local and/or remote node. The use of global Translation Control Entry (TCE) tables that are accessed/owned by RDMA jobs and are managed by a device driver in conjunction with a Protocol Virtual Offset (PVO) address format solves this problem.
摘要:
In remote direct memory access transfers in a multinode data processing system in which the nodes communicate with one another through communication adapters coupled to a switch or network, failures in the nodes or in the communication adapters can produce the phenomenon known as trickle traffic, which is data that has been received from the switch or from the network that is stale but which may have all the signatures of a valid packet data. The present invention addresses the trickle traffic problem in two situations: node failure and adapter failure. In the node failure situation randomly generated keys are used to reestablish connections to the adapter while providing a mechanism for the recognition of stale packets. In the adapter failure situation, a round robin context allocation approach is used with adapter state contexts being provided with state information which helps to identify stale packets.
摘要:
In a multinode data processing system in which the nodes communicate with one another via communication adapters over a network or switch, the adapters are provided with a dual register mechanism for tracking microcode task status. Upon the issuance of a disruptive command that requires attention from one of the nodes, the task status maintained in one register is copied to the snapshot register. As tasks within the adapter are completed, both registers are updated, thus providing a mechanism for the nodes to determine that all tasks active at the time of the disruptive command have completed. This means that the nodes now have a mechanism for determining, as soon as possible, that all tasks that are active when a disruptive command occurs have completed, thus allowing the data processing node to perform such operations as releasing system memory that is associated with the disruptive command, thus eliminating temporal overhead that can affect performance.
摘要:
In a multinode data processing system in which nodes exchange information over a network or through a switch, the mechanism which enables out-of-order data transfer via Remote Direct Memory Access (RDMA) also provides a corresponding ability to carry out broadcast operations, multicast operations, third party operations and conditional RDMA operations. In a broadcast operation a source node transfers data packets in RDMA fashion to a plurality of destination nodes. Multicast operation works similarly except that distribution is selective. In third party operations a single central node in a cluster or network manages the transfer of data in RDMA fashion between other nodes or creates a mechanism for allowing a directed distribution of data between nodes. In conditional operation mode the transfer of data is conditioned upon one or more events occurring in either the source node or in the destination node.
摘要:
In a multinode data processing system in which nodes exchange information over a network or through a switch, the mechanism which enables out-of-order data transfer via Remote Direct Memory Access (RDMA) also provides a corresponding ability to carry out broadcast operations, multicast operations, third party operations and conditional RDMA operations. In a broadcast operation a source node transfers data packets in RDMA fashion to a plurality of destination nodes. Multicast operation works similarly except that distribution is selective. In third party operations a single central node in a cluster or network manages the transfer of data in RDMA fashion between other nodes or creates a mechanism for allowing a directed distribution of data between nodes. In conditional operation mode the transfer of data is conditioned upon one or more events occurring in either the source node or in the destination node.
摘要:
In a multinode data processing system in which nodes exchange information over a network or through a switch, the mechanism which enables out-of-order data transfer via Remote Direct Memory Access (RDMA) also provides a corresponding ability to carry out broadcast operations, multicast operations, third party operations and conditional RDMA operations. In a broadcast operation a source node transfers data packets in RDMA fashion to a plurality of destination nodes. Multicast operation works similarly except that distribution is selective. In third party operations a single central node in a cluster or network manages the transfer of data in RDMA fashion between other nodes or creates a mechanism for allowing a directed distribution of data between nodes. In conditional operation mode the transfer of data is conditioned upon one or more events occurring in either the source node or in the destination node.
摘要:
A barrier synchronization register, accessible to the nodes in a distributed data processing system, has portions thereof allotted to threads which are present in multiple groups. The barrier synchronization register portion allotted to a given thread has stored therein, over time, group identifier numbers. In this way the state space of a barrier synchronization register is shared over more than one group of process threads.
摘要:
An addressing model is provided where devices, including I/O devices, are addressed with internet protocol (IP) addresses, which are considered part of the virtual address space. A task, such as an application, may be assigned an effective address range, which corresponds to addresses in the virtual address space. The virtual address space is expanded to include Internet protocol addresses. Thus, the page frame tables are also modified to include entries for IP addresses and additional properties for devices and I/O. Thus, a processing element, such as an I/O adapter or even a printer, for example, may also be addressed using IP addresses without the need for library calls, device drivers, pinning memory, and so forth. This addressing model also provides full virtualization of resources across an IP interconnect, allowing a process to access an I/O device across a network.
摘要:
Disclosed are a method, a system and a computer program product for automatically allocating and de-allocating resources for jobs executed or processed by one or more supercomputer systems. In one or more embodiments, a supercomputing system can process multiple jobs with respective supercomputing resources. A global resource manager can automatically allocate additional resources to a first job and de-allocate resources from a second job. In one or more embodiments, the global resource manager can provide the de-allocated resources to the first job as additional supercomputing resources. In one or more embodiments, the first job can use the additional supercomputing resources to perform data analysis at a higher resolution, and the additional resources can compensate for an amount of time the higher resolution analysis would take using originally allocated supercomputing resources.