Abstract:
A method is provided for carrying out checkpointing operations in data processing systems that run multiple processes employing shared memory. The method preserves data coherence and integrity without imposing timing restrictions or constraints that would require the checkpointing operations to be coordinated. Data structures within local process memory and within shared memory provide the checkpoint operation with application-level information about the shared memory resources specific to at least two processes being checkpointed. Methods are provided for establishing, restoring and releasing shared memory regions that are accessed by multiple cooperating processes.
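The establish, restore and release operations named in this abstract can be pictured with a small bookkeeping sketch. It is only an illustration under stated assumptions: `SharedRegion`, `RegionRegistry` and all of their methods are hypothetical names, the "shared memory" is simulated with ordinary Python objects, and the abstract does not describe its data structures at this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class SharedRegion:
    """Bookkeeping kept alongside a shared region (simulated, hypothetical)."""
    key: str
    size: int
    attached: set = field(default_factory=set)  # ids of processes using the region


class RegionRegistry:
    """Hypothetical registry of shared memory regions and their users."""

    def __init__(self):
        self.regions = {}

    def establish(self, pid, key, size):
        # Create the region on first use, otherwise just attach to it.
        region = self.regions.setdefault(key, SharedRegion(key, size))
        region.attached.add(pid)
        return region

    def release(self, pid, key):
        # Detach; the last process to leave removes the region.
        region = self.regions[key]
        region.attached.discard(pid)
        if not region.attached:
            del self.regions[key]

    def checkpoint(self, pid):
        # Per-process record of the regions in use. It can be taken at any
        # time, without waiting for the other processes.
        return [(r.key, r.size) for r in self.regions.values() if pid in r.attached]

    def restore(self, pid, record):
        # Re-establish every region named in the process's own record.
        for key, size in record:
            self.establish(pid, key, size)
```

In this sketch each process checkpoints and restores from its own record, so no ordering between the processes is needed; whichever process restores first recreates the region and the others simply attach.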
Abstract:
A facility is provided for communicating among processes in a symmetric multi-processing (SMP) cluster environment wherein at least some SMP nodes of the SMP cluster include multiple processes. The facility includes transferring messages of a collective communication intra-nodally among the processes at an SMP node, employing a shared memory of the SMP node; and, responsive to the intra-nodal transferring, concurrently transferring multiple messages of the collective communication inter-nodally from n SMP node(s) to m other SMP node(s), wherein at least one of n or m is greater than one. The concurrent transferring is performed by multiple processes of at least one of the n SMP node(s) or the m other SMP node(s). More particularly, the facility includes concurrently transferring the multiple messages inter-nodally from one of: one SMP node to multiple other SMP nodes, multiple SMP nodes to one other SMP node, or multiple SMP nodes to multiple other SMP nodes.
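The one-node-to-many-nodes case can be sketched as a two-step schedule. This is a minimal illustration, not the facility's actual algorithm: the function name `plan_one_to_many`, the node labels and the round-based batching are all assumptions made for the example.

```python
def plan_one_to_many(root, targets, procs_per_node):
    """Plan a broadcast from one SMP node to several others (hypothetical).

    Returns (intra, rounds):
      intra  - (node, process) pairs that read the message from shared memory
      rounds - lists of (src_node, src_process, dst_node) sends; the sends
               inside one round are meant to run concurrently
    """
    # Step 1: intra-nodal. The message is placed in the root node's shared
    # memory so that every local process can read it.
    intra = [(root, proc) for proc in range(procs_per_node)]

    # Step 2: inter-nodal. Each local process sends to a different target
    # node, so several inter-nodal messages are in flight at once.
    rounds = []
    for start in range(0, len(targets), procs_per_node):
        batch = targets[start:start + procs_per_node]
        rounds.append([(root, proc, dst) for proc, dst in enumerate(batch)])
    return intra, rounds
```

With two processes on the root node and three target nodes, the plan needs two rounds: the first round sends to two targets concurrently and the second sends to the remaining one.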
Abstract:
Intra-node data transfer in collective communications is facilitated. A memory object of one task of a collective communication is concurrently attached to the address spaces of a plurality of other tasks of the communication. Those tasks that attach the memory object can access the memory object as if it were their own. Data can be directly written into or read from an application data structure of the memory object by the attaching tasks without copying the data to/from shared memory.
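As a loose, single-process analogy for attaching one task's memory object in other tasks, Python's `memoryview` gives several holders direct access to the same bytes without an intermediate copy. The `MemoryObject` class and its `attach` method are hypothetical names; real cross-process attachment depends on operating system support that this sketch does not model.

```python
class MemoryObject:
    """Hypothetical stand-in for a memory object exported by one task."""

    def __init__(self, size):
        self._buf = bytearray(size)

    def attach(self):
        # A memoryview references the same underlying bytes; no copy is made.
        return memoryview(self._buf)


owner = MemoryObject(8)
view_a = owner.attach()   # first "attaching task"
view_b = owner.attach()   # second "attaching task"

view_a[0:4] = b"ping"     # written directly into the owner's buffer
print(bytes(view_b[0:4])) # b'ping': the other view sees the same bytes
```

Both views read and write the owner's buffer in place, which is the property the abstract relies on to avoid staging data through a separate shared memory area.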
Abstract:
A method, system, and computer program product to transfer data between two application data structures by copying a data gather scatter program (DGSP) from an exporting process address space where a first data structure is located, to a location in shared memory visible to an importing process address space; assembling a first parameter set identifying the first data structure; starting a data gather scatter-redistribution machine (DGS-RM) in an importing process space where a second application data structure is located; passing the first parameter set, the DGSP copy, and a second parameter set identifying the second application data structure and a second DGSP to the DGS-RM; and creating master and worker stack machines. The master stack machine identifies a contiguous chunk of the first data structure. The worker stack machine identifies contiguous chunks of the second data structure representing the same number of bytes as the contiguous chunk of the first data structure, and transfers data to (from) one or more identified chunks of the second data structure from (to) the single chunk of the first application data structure.
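The master/worker division of labor can be sketched with plain chunk lists. This is a simplified illustration under stated assumptions: each layout is given as hypothetical `(offset, length)` pairs rather than a real DGSP, the function name `redistribute` is invented, and only one transfer direction is shown.

```python
def redistribute(src, src_chunks, dst, dst_chunks):
    """Copy bytes from the chunks of src into the chunks of dst, in order.

    Each chunk is an (offset, length) pair. The outer loop plays the role of
    the master (one contiguous source chunk at a time); the inner loop plays
    the role of the worker (as many destination chunks as are needed to hold
    the same number of bytes).
    """
    dst_iter = iter(dst_chunks)
    d_off, d_left = 0, 0
    for s_off, s_len in src_chunks:
        while s_len > 0:
            # Worker: move to the next destination chunk when this one is full.
            while d_left == 0:
                try:
                    d_off, d_left = next(dst_iter)
                except StopIteration:
                    raise ValueError("destination layout is too small") from None
            n = min(s_len, d_left)
            dst[d_off:d_off + n] = src[s_off:s_off + n]
            s_off, s_len = s_off + n, s_len - n
            d_off, d_left = d_off + n, d_left - n


src = bytearray(b"ABCDEFGH")
dst = bytearray(b"........")
# Source layout selects "ABCD" and "GH"; destination has three smaller slots.
redistribute(src, [(0, 4), (6, 2)], dst, [(1, 2), (4, 3), (7, 1)])
print(dst)  # bytearray(b'.AB.CDGH')
```

The source chunk `(0, 4)` is split across two destination chunks, which mirrors the abstract's point that one contiguous chunk on one side may correspond to several contiguous chunks on the other.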
Abstract:
A method, an apparatus and a recording medium are provided for communicating message payload data, especially noncontiguous message data, from a first node of a network to a second node of the network in response to a request to transmit a message. Such method includes dividing the length of a data payload to be transmitted into a plurality of submessage payload lengths, i.e., into at least a first submessage payload length and a second submessage payload length. Then, a first ordered submessage is transmitted from the first node for delivery to the second node, the first ordered submessage having the first submessage payload length. A first state of an environment is then determined in the first node as if the step of transmitting the first ordered submessage were already completed. Without having to complete the step of transmitting the first ordered submessage, a second ordered submessage is then transmitted from the first node for delivery to the second node, the second submessage having the second submessage payload length, the second submessage being transmitted in a way that takes into account the first state of the environment in the first node.
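One way to picture the "state as if the first submessage were already transmitted" is a cursor into a noncontiguous buffer that is advanced arithmetically, before any data actually moves. The sketch below rests on assumptions: the "state of the environment" is modeled as a hypothetical `(chunk_index, offset_in_chunk)` cursor, chunks are `(offset, length)` pairs, and the function names are invented for illustration.

```python
def split_lengths(total, max_payload):
    """Divide a payload length into submessage payload lengths."""
    return [min(max_payload, total - off) for off in range(0, total, max_payload)]


def advance(state, nbytes, chunks):
    """Return the cursor reached after consuming nbytes from the given state."""
    idx, off = state
    while nbytes > 0:
        step = min(chunks[idx][1] - off, nbytes)
        off, nbytes = off + step, nbytes - step
        if off == chunks[idx][1]:
            idx, off = idx + 1, 0  # current chunk exhausted, move to the next
    return idx, off


def start_states(total, max_payload, chunks):
    """Starting cursor and length of every submessage, computed up front."""
    states, state = [], (0, 0)
    for length in split_lengths(total, max_payload):
        states.append((state, length))
        # Advance "as if" this submessage had already been transmitted.
        state = advance(state, length, chunks)
    return states


chunks = [(0, 3), (10, 5), (20, 2)]  # 10 payload bytes in three pieces
print(start_states(10, 4, chunks))
# [((0, 0), 4), ((1, 1), 4), ((2, 0), 2)]
```

Because each submessage's starting cursor is known in advance, the second submessage can be prepared and sent without waiting for the first to finish, while the ordering is still preserved by the cursors themselves.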
Abstract:
Shared locks are employed for controlling a thread which extends across more than one protocol layer in a data processing system. A counter, used as part of a data structure, makes it possible to implement shared locks across multiple layers. The use of shared locks avoids the processing overhead usually associated with lock acquisition and release. The thread which is controlled may be initiated in either an upper layer protocol or in a lower layer.
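A counter-based lock shared between layers might look like the sketch below. This is one possible reading, not the described implementation: the abstract does not specify the data structure, so the class `LayerSharedLock`, its `enter`/`leave` methods and the ownership check are assumptions, and the result closely resembles an ordinary reentrant lock.

```python
import threading


class LayerSharedLock:
    """Hypothetical lock shared by several protocol layers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None
        self._count = 0  # number of layers the owning thread is currently inside

    def enter(self):
        me = threading.get_ident()
        if self._owner == me:
            self._count += 1  # already held: only bump the counter
            return
        self._lock.acquire()
        self._owner, self._count = me, 1

    def leave(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("lock not held by this thread")
        self._count -= 1
        if self._count == 0:
            # Outermost layer is done: give the lock up for real.
            self._owner = None
            self._lock.release()


lock = LayerSharedLock()


def lower_layer(log):
    lock.enter()
    log.append(lock._count)
    lock.leave()


def upper_layer(log):
    lock.enter()
    lower_layer(log)  # same thread crosses into the lower layer
    log.append(lock._count)
    lock.leave()
```

A thread that starts in `upper_layer` pays for one real acquisition and one real release; the call into `lower_layer` only adjusts the counter. A thread that starts directly in `lower_layer` takes the same lock itself, which matches the abstract's note that the thread may be initiated in either layer.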