Abstract:
An apparatus, program product, and method update the cluster infrastructure version used by a group resident in a clustered computer system without requiring a shutdown of the group during the update. The cluster infrastructure software in individual nodes in the clustered computer system is updated while the group is maintained in an active state. After the cluster infrastructure software is updated, the group is notified of the update. In response to the notification, the cluster infrastructure version used by the group is dynamically updated to that of the updated cluster infrastructure software, thus making additional functions supported by the new version of the cluster infrastructure software available for use by all group members.
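The update sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Group` class, `on_update_notification`, `rolling_update`, and the integer version numbers are hypothetical names and values, not taken from the abstract.

```python
class Group:
    """Hypothetical group that stays active while nodes are updated."""

    def __init__(self, version):
        self.version = version
        self.active = True

    def on_update_notification(self, new_version):
        # Adopt the new cluster infrastructure version without shutting down.
        self.version = new_version


def rolling_update(nodes, group, new_version):
    # Step 1: update the software on each node while the group stays active.
    for node in nodes:
        node["version"] = new_version
        assert group.active  # the group is never shut down in this sketch
    # Step 2: only after all nodes are updated is the group notified.
    group.on_update_notification(new_version)


nodes = [{"name": "n1", "version": 2}, {"name": "n2", "version": 2}]
group = Group(version=2)
rolling_update(nodes, group, new_version=3)
print(group.version, group.active)
```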
Abstract:
An apparatus, program product and method utilize ordered messages in a clustered computer system to defer the execution of a merge protocol in a cluster group until all pending protocols in each partition of a group are handled, typically by ensuring either cancellation or completion of each pending protocol prior to execution of the merge protocol. From the perspective of each group member, the execution of the merge protocol is deferred by inhibiting processing of the merge request by such member until after processing of all earlier-received pending requests has been completed.
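As a minimal sketch of the deferral idea, assume each member handles requests strictly in delivery order: a merge request queued behind pending protocols is processed only after everything ahead of it has completed or been cancelled. The names `Member`, `deliver`, `cancel`, and `run` are hypothetical and do not come from the abstract.

```python
from collections import deque


class Member:
    """Hypothetical group member that processes ordered requests one at a time."""

    def __init__(self):
        self.pending = deque()  # requests in delivery order
        self.log = []           # record of what was handled, in order

    def deliver(self, request):
        # Ordered messaging: requests are queued in the order they arrive.
        self.pending.append(request)

    def cancel(self, name):
        # Cancelling a pending protocol removes it before it runs.
        self.pending = deque(r for r in self.pending if r != name)
        self.log.append(f"cancelled:{name}")

    def run(self):
        # The merge request is never pulled ahead of earlier requests,
        # so it executes only after they have completed or been cancelled.
        while self.pending:
            self.log.append(f"done:{self.pending.popleft()}")


m = Member()
for req in ["protocol-A", "protocol-B", "merge"]:
    m.deliver(req)
m.cancel("protocol-B")
m.run()
print(m.log)
```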
Abstract:
A clustered computer system includes multiple computer systems (or nodes) on a network that can become members of a group to work on a particular task. Each node includes group state data that represents the status of all members of the group. A group state data update mechanism in each node updates the group state data at acknowledge (ACK) rounds, so that all the group state data in all nodes are synchronized and identical if all members respond properly during the ACK round. Each node also includes a main thread and one or more work threads. The main thread receives messages from other computer systems in the group, and routes messages intended for the work thread to either a response queue or a work queue in the work thread, depending on the type of the message. If the message is a response to a currently-executing task, the message is placed in the response queue. Otherwise, the message is placed in the work queue for processing at a later time.
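The routing rule in the last three sentences can be sketched as below. This is an illustration only: the message layout (a dict with a `reply_to` field), the `WorkThreadQueues` class, and the `route` function are assumptions made for the example, not details from the abstract.

```python
import queue


class WorkThreadQueues:
    """Hypothetical holder for a work thread's two queues."""

    def __init__(self):
        self.response_queue = queue.Queue()  # replies to the running task
        self.work_queue = queue.Queue()      # everything else, handled later
        self.current_task = None             # id of the task being executed


def route(message, worker):
    # Main-thread routing: a reply to the currently executing task goes to
    # the response queue; any other message waits in the work queue.
    is_response = (
        worker.current_task is not None
        and message.get("reply_to") == worker.current_task
    )
    if is_response:
        worker.response_queue.put(message)
    else:
        worker.work_queue.put(message)


worker = WorkThreadQueues()
worker.current_task = "task-1"
route({"reply_to": "task-1", "body": "ACK"}, worker)
route({"reply_to": None, "body": "new request"}, worker)
print(worker.response_queue.qsize(), worker.work_queue.qsize())
```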
Abstract:
An apparatus, clustered computer system, and program product rely on cluster-private group names to perform accesses to groups that are resident in a clustered computer system. Thus, for a cluster-accessible group, all nodes capable of participating in a cluster are configured to map to the same cluster-private group name for that group, so that any external user that has access to the clustered computer system can access the group name and utilize the group name to initiate operations by the group.
Abstract:
An apparatus, clustered computer system, program product and method rely on cluster-private group names to perform accesses to groups that are resident in a clustered computer system. Thus, for a cluster-accessible group, all nodes capable of participating in a cluster are configured to map to the same cluster-private group name for that group, so that any external user that has access to the clustered computer system can access the group name and utilize the group name to initiate operations by the group.
Abstract:
An apparatus, program product and method utilize subgroup-specific leader members to exchange group data between group members during the handling of a request to organize members into a group in a clustered computer system, e.g., when handling a membership change operation such as a merge or join. Such subgroup leaders may be determined locally within individual subgroup members, and moreover, the subgroup members may locally track the transmission status of group data for the various subgroups. Each subgroup includes one or more members that are known to store group data that is coherent among all subgroup members.
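One way to picture local leader selection and transmission tracking is sketched below. The abstract does not say how a leader is chosen; picking the lowest member name is an assumption made here so that every member reaches the same answer without exchanging messages. `local_leader` and `TransmissionTracker` are hypothetical names.

```python
def local_leader(members):
    # Assumed rule (not from the abstract): the lowest name is the leader,
    # so each member can compute it locally and all members agree.
    return min(members)


class TransmissionTracker:
    """Hypothetical per-member record of which subgroups have sent group data."""

    def __init__(self, subgroups):
        self.subgroups = subgroups
        self.sent = {name: False for name in subgroups}

    def record(self, sender):
        # Only data sent by a subgroup's leader counts for that subgroup.
        for name, members in self.subgroups.items():
            if sender == local_leader(members):
                self.sent[name] = True

    def complete(self):
        return all(self.sent.values())


subgroups = {"A": ["n3", "n1"], "B": ["n2", "n4"]}
tracker = TransmissionTracker(subgroups)
tracker.record("n1")   # leader of subgroup A
tracker.record("n4")   # not a leader, so nothing is recorded
partial = tracker.complete()
tracker.record("n2")   # leader of subgroup B
print(partial, tracker.complete())
```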
Abstract:
An apparatus, program product and method synchronize group state data in a primary-backup group in connection with the rejoining of a member to the primary-backup group in a clustered computer system. Each member in the group includes a copy of replicated group state data for the primary-backup group. In connection with rejoining the member, it is determined whether the rejoining member is the primary member for the primary-backup group. Then, a selection is made between member and group overwrite operations based upon such determination. The member overwrite operation includes overwriting the copy of the replicated group state data for the rejoining member with data from the copy of the replicated group state data for an existing member in the primary-backup group. The group overwrite operation includes overwriting the copy of the replicated group state data for the existing member in the primary-backup group with data from the copy of the replicated group state data for the rejoining member.
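The selection between the two overwrite operations might look like the sketch below. The abstract only says the choice depends on whether the rejoining member is the primary; mapping "rejoining member is the primary" to the group overwrite is an assumption of this example, as are the function name `resync_on_rejoin` and the dict-based member records.

```python
def resync_on_rejoin(rejoining, existing, primary_id):
    """Hypothetical resynchronization of replicated group state on rejoin."""
    if rejoining["id"] == primary_id:
        # Group overwrite (assumed case): the rejoining primary's copy
        # replaces the copy held by each existing member.
        for member in existing:
            member["state"] = dict(rejoining["state"])
        return "group_overwrite"
    # Member overwrite: the rejoining member takes an existing member's copy.
    rejoining["state"] = dict(existing[0]["state"])
    return "member_overwrite"


backup = {"id": "b1", "state": {"seq": 7}}
primary = {"id": "p1", "state": {"seq": 9}}
first = resync_on_rejoin(primary, [backup], primary_id="p1")

late_backup = {"id": "b2", "state": {"seq": 1}}
second = resync_on_rejoin(late_backup, [backup], primary_id="p1")
print(first, second, backup["state"], late_backup["state"])
```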
Abstract:
An apparatus and method passively determine when a job in a clustered computing environment is dead. Each node in the cluster has a cluster engine for communicating between each job on the node and jobs on other nodes. A protocol is defined that includes one or more acknowledge (ACK) rounds, and that only performs local processing between ACK rounds. The protocol is executed by jobs that are members of a defined group. Each job in the group has one or more work threads that execute the protocol. In addition, each job has a main thread that communicates between the job and jobs on other nodes (through the cluster engine), routes appropriate messages from the cluster engine to a work thread, and signals the cluster engine when a fault occurs while the work thread executes the protocol. By assuring that a dead job is reported to other members of the group, liveness information for group members can be monitored without the overhead associated with active liveness checking.
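A rough sketch of the passive reporting path follows: the main thread starts a work thread, and if the protocol faults, the main thread signals the cluster engine instead of relying on periodic liveness probes. `ClusterEngine`, `report_dead`, and `run_job` are hypothetical names, and treating any Python exception as a "fault" is an assumption of the example.

```python
import threading


class ClusterEngine:
    """Hypothetical stand-in for the node's cluster engine."""

    def __init__(self):
        self.dead_jobs = []

    def report_dead(self, job_id):
        # In a real cluster this would inform the other group members.
        self.dead_jobs.append(job_id)


def run_job(job_id, protocol, engine):
    faults = []

    def work():
        # Work thread: execute the protocol and record any fault.
        try:
            protocol()
        except Exception as exc:
            faults.append(exc)

    worker = threading.Thread(target=work)
    worker.start()
    worker.join()
    # Main thread: signal the cluster engine only if a fault occurred.
    if faults:
        engine.report_dead(job_id)
    return not faults


def failing_protocol():
    raise RuntimeError("fault during protocol")


engine = ClusterEngine()
ok = run_job("job-1", lambda: None, engine)
bad = run_job("job-2", failing_protocol, engine)
print(ok, bad, engine.dead_jobs)
```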
Abstract:
A system and method in data processing networks with distributed processing or multiple nodes provide a capability to ensure that the protocols implicated in a first or original protocol are identified, so that diagnostic messages sent during execution of that protocol are traceable. Thus, each of the diagnostic messages is delivered to the requestor of the original protocol before the original request completes. A linkage is provided between the original protocol and all protocols nested within it.
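The linkage between an original protocol and its nested protocols can be illustrated with a small sketch in which every nested protocol keeps a reference to the original, so its diagnostic messages can be handed to the original requestor before completion is reported. The `Protocol` class, the protocol names, and `run_original` are hypothetical and chosen only for the example.

```python
class Protocol:
    """Hypothetical protocol record linked to the original (root) protocol."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # Linkage: every nested protocol can reach the original protocol.
        self.root = parent.root if parent else self
        if parent is None:
            self.diagnostics = []  # collected on the original protocol only

    def diagnose(self, text):
        # Diagnostics from any nesting depth are traced to the original.
        self.root.diagnostics.append(f"{self.name}: {text}")


def run_original(requestor_inbox):
    original = Protocol("create-group")
    nested = Protocol("start-node", parent=original)
    deeper = Protocol("open-link", parent=nested)
    deeper.diagnose("link timeout")
    nested.diagnose("node start retried")
    # Deliver all diagnostics before reporting that the request completed.
    requestor_inbox.extend(original.diagnostics)
    requestor_inbox.append("create-group: complete")


inbox = []
run_original(inbox)
print(inbox)
```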