Abstract:
Failover processing accommodates failures of backup computing nodes and resources, such as data storage units and printers. Failure of a computing node that controls resources causes another computing node to assume control of the resources controlled by the failed node. Failure of the primary computing node causes another computing node, at either the same site or a different site, to be selected as the new primary node. Failure of a resource at the primary site causes the site with the next highest priority backup resource to become the new primary site. Failure of a backup computing node causes a new backup node at the same site as the failed backup node to replace the failed backup node as host for the site's resources. Backup mirroring data flows are then adjusted to reflect the new functions of the affected nodes.
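The selection rules in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the `Node` record and the functions `select_new_primary`, `replace_backup`, and `next_primary_site` are hypothetical names, and preferring a same-site replacement for a failed primary is an assumption (the abstract only says the new primary may be at the same or a different site).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    site: str
    alive: bool = True


def select_new_primary(nodes: List[Node], failed: Node,
                       site_priority: List[str]) -> Optional[Node]:
    """Pick a replacement for a failed primary node.

    Assumption: a live node at the failed node's site is preferred;
    otherwise the live node at the highest-priority site is chosen.
    Every node's site is assumed to appear in site_priority.
    """
    candidates = [n for n in nodes if n.alive and n.name != failed.name]
    if not candidates:
        return None
    same_site = [n for n in candidates if n.site == failed.site]
    if same_site:
        return same_site[0]
    return min(candidates, key=lambda n: site_priority.index(n.site))


def replace_backup(nodes: List[Node], failed: Node) -> Optional[Node]:
    """A failed backup node is replaced only by a live node at the same site."""
    for n in nodes:
        if n.alive and n.site == failed.site and n.name != failed.name:
            return n
    return None


def next_primary_site(site_priority: List[str], failed_site: str) -> Optional[str]:
    """After a resource failure at the primary site, return the next-highest-priority site."""
    remaining = [s for s in site_priority if s != failed_site]
    return remaining[0] if remaining else None
```

Adjusting the backup mirroring data flows after a role change is outside the scope of this sketch.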
Abstract:
A method and apparatus for validating and ranking resources that may be switched between a primary system and one or more backup systems at a single site. One embodiment provides a method for ensuring accessibility of one or more disk units by a system, comprising: configuring a disk pool for the system; validating availability of the one or more disk units for the disk pool; verifying that the disk units are at the same site as the system; and selecting one or more valid disk units for the disk pool. The method may further comprise ranking each disk unit for the disk pool and selecting one or more valid disk units for the disk pool according to the ranking.
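The validate, verify, rank, and select steps above might look like the following minimal sketch. The `DiskUnit` fields, the numeric `rank` (lower is better), and `select_disk_units` are hypothetical; the abstract does not say how availability or rank is determined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskUnit:
    name: str
    site: str
    available: bool
    rank: int  # hypothetical score: a lower value means a better candidate


def select_disk_units(units: List[DiskUnit], system_site: str,
                      count: int) -> List[DiskUnit]:
    """Return up to `count` valid disk units for a disk pool, best rank first.

    A unit is treated as valid when it is available and located at the
    same site as the system that owns the pool.
    """
    valid = [u for u in units if u.available and u.site == system_site]
    valid.sort(key=lambda u: u.rank)
    return valid[:count]
```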
Abstract:
An apparatus, program product and method utilize hidden group membership to facilitate the processing of originator requests to a group in a clustered computer system. With hidden group membership, a requesting originator is temporarily joined to a group in such a manner that the originator is both hidden and provided with limited access rights, e.g., so that some of the messages sent by the members of a group when processing the request are neither sent to nor received by the originator.
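A rough sketch of hidden membership is given below. All names (`Group`, `join_hidden`, `send`, the `include_hidden` flag) are hypothetical, and modelling "limited access" as a per-message flag is an assumption rather than the mechanism the abstract describes.

```python
from typing import Dict, List, Set


class Group:
    def __init__(self, members: List[str]):
        self.members = list(members)   # regular, visible members
        self.hidden: Set[str] = set()  # temporarily joined originators
        self.delivered: Dict[str, List[str]] = {m: [] for m in members}

    def join_hidden(self, originator: str) -> None:
        """Temporarily join an originator without exposing it as a member."""
        self.hidden.add(originator)
        self.delivered.setdefault(originator, [])

    def leave_hidden(self, originator: str) -> None:
        self.hidden.discard(originator)

    def visible_members(self) -> List[str]:
        """Hidden originators never appear in the membership list."""
        return list(self.members)

    def send(self, message: str, include_hidden: bool = False) -> None:
        """Deliver to regular members; hidden originators only get flagged messages."""
        recipients = list(self.members)
        if include_hidden:
            recipients += sorted(self.hidden)
        for r in recipients:
            self.delivered[r].append(message)
```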
Abstract:
Members of a primary-backup group in a clustered computer system are organized into subgroups to manage the group's primary and backup resources. Group members are placed into subgroups based upon their access to particular resources, such that a primary subgroup may be defined that comprises members having access to a common primary resource, with one or more backup subgroups defined that comprise members having access to a common backup resource. A join protocol is used to determine to which of a plurality of resources managed by the primary-backup group a joining member has access, and to add the joining member to a subgroup for a resource to which the joining member has access.
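The join step can be illustrated with a small sketch. `PrimaryBackupGroup` and `join` are hypothetical names, and placing a member in the first listed resource it can reach is an assumption; the abstract only requires that the member be added to a subgroup for some resource it has access to.

```python
from typing import Dict, Iterable, List, Optional


class PrimaryBackupGroup:
    def __init__(self, resources: Iterable[str]):
        # Assumption: the first resource is the primary, the rest are backups.
        self.resources = list(resources)
        self.subgroups: Dict[str, List[str]] = {r: [] for r in self.resources}

    def join(self, member: str, accessible: Iterable[str]) -> Optional[str]:
        """Add `member` to the subgroup of the first resource it can access.

        Returns the chosen resource, or None when the member cannot
        reach any resource managed by the group.
        """
        reachable = set(accessible)
        for resource in self.resources:
            if resource in reachable:
                self.subgroups[resource].append(member)
                return resource
        return None
```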
Abstract:
An apparatus, program product and method utilize subgroup-specific leader members to exchange group data between group members during the handling of a request to organize members into a group in a clustered computer system, e.g., when handling a membership change operation such as a merge or join. Such subgroup leaders may be determined locally within individual subgroup members, and moreover, the subgroup members may locally track the transmission status of group data for the various subgroups. Each subgroup includes one or more members that are known to store group data that is coherent among all subgroup members.
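One way to picture locally determined subgroup leaders is the sketch below. The rule "the lowest member identifier leads" is purely an assumption chosen so that every member reaches the same answer without extra messages; the abstract does not specify the rule. `MemberView` and its methods are hypothetical names.

```python
from typing import Dict, Set


class MemberView:
    """One member's local view of the subgroups and of data-transfer progress."""

    def __init__(self, subgroups: Dict[str, Set[int]]):
        self.subgroups = subgroups
        # Locally tracked: has each subgroup's group data been transmitted yet?
        self.sent = {name: False for name in subgroups}

    def leaders(self) -> Dict[str, int]:
        """Determine each subgroup's leader locally (assumed rule: lowest ID)."""
        return {name: min(members) for name, members in self.subgroups.items()}

    def record_sent(self, subgroup: str) -> None:
        self.sent[subgroup] = True

    def exchange_complete(self) -> bool:
        return all(self.sent.values())
```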
Abstract:
A computing node functions as a member within a computing system group, such as a cluster, and has a status that allows it to receive group messages even though the node is not an active member of the cluster. The node is able to function as a primary member or as a backup member that controls redundant resources to be utilized in case of a failure. The computing node has one of two status values: “Active” or “Ineligible”. A member that is able to function as a primary member is assigned an “Active” status, and a member that is not configured or otherwise eligible to perform as a primary member is assigned an “Ineligible” status. Members with an “Ineligible” status still receive all group messages and are therefore able to become configured and eligible to become a primary member.
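The two status values and their effect on message delivery can be sketched as follows. The class and method names (`Status`, `Member`, `Group`, `primary_candidates`) are hypothetical; only the "Active" and "Ineligible" labels come from the abstract.

```python
from enum import Enum
from typing import List


class Status(Enum):
    ACTIVE = "Active"
    INELIGIBLE = "Ineligible"


class Member:
    def __init__(self, name: str, status: Status):
        self.name = name
        self.status = status
        self.inbox: List[str] = []

    def receive(self, message: str) -> None:
        self.inbox.append(message)


class Group:
    def __init__(self) -> None:
        self.members: List[Member] = []

    def add(self, member: Member) -> None:
        self.members.append(member)

    def broadcast(self, message: str) -> None:
        """Group messages reach every member, regardless of status."""
        for m in self.members:
            m.receive(message)

    def primary_candidates(self) -> List[Member]:
        """Only Active members may be selected as primary."""
        return [m for m in self.members if m.status is Status.ACTIVE]
```

Because an Ineligible member keeps receiving group messages, changing its `status` to `Status.ACTIVE` once it is configured is enough to make it a candidate in this sketch.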
Abstract:
An apparatus, program product and method utilize ordered messages in a clustered computer system to defer the execution of a merge protocol in a cluster group until all pending protocols in each partition of a group are handled, typically by ensuring either cancellation or completion of each pending protocol prior to execution of the merge protocol. From the perspective of each group member, the execution of the merge protocol is deferred by inhibiting processing of the merge request by such member until after processing of all earlier-received pending requests has been completed.
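From a single member's perspective, the deferral described above can be sketched like this. The `GroupMember` class, the string request names, and the `finish` callback are hypothetical; the sketch assumes requests arrive in a total order and that each one is eventually completed or cancelled.

```python
from typing import List, Optional, Tuple


class GroupMember:
    def __init__(self) -> None:
        self.in_progress: List[str] = []  # received before the merge, not yet finished
        self.later: List[str] = []        # received after the merge request
        self.deferred_merge: Optional[str] = None
        self.history: List[Tuple[str, str]] = []

    def receive(self, request: str) -> None:
        if request == "merge":
            self.deferred_merge = request
            self._try_merge()
        elif self.deferred_merge is not None:
            # Ordered delivery: anything after the merge waits behind it.
            self.later.append(request)
        else:
            self.in_progress.append(request)

    def finish(self, request: str, cancelled: bool = False) -> None:
        """Mark an earlier request as completed or cancelled."""
        self.in_progress.remove(request)
        self.history.append((request, "cancelled" if cancelled else "completed"))
        self._try_merge()

    def _try_merge(self) -> None:
        # The merge runs only once no earlier-received request is pending.
        if self.deferred_merge is not None and not self.in_progress:
            self.history.append((self.deferred_merge, "completed"))
            self.deferred_merge = None
            self.in_progress, self.later = self.later, []
```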
Abstract:
A clustered computer system includes multiple computer systems (or nodes) on a network that can become members of a group to work on a particular task. Each node includes group state data that represents the status of all members of the group. A group state data update mechanism in each node updates the group state data at acknowledge (ACK) rounds, so that all the group state data in all nodes are synchronized and identical if all members respond properly during the ACK round. Each node also includes a main thread and one or more work threads. The main thread receives messages from other computer systems in the group, and routes messages intended for the work thread to either a response queue or a work queue in the work thread, depending on the type of the message. If the message is a response to a currently-executing task, the message is placed in the response queue. Otherwise, the message is placed in the work queue for processing at a later time.
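The routing decision made by the main thread can be sketched as below. No real threads are started here; `Message`, `WorkThreadQueues`, and `route` are hypothetical names, and identifying a response by a task identifier is an assumption about how "a response to a currently-executing task" is recognized.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    task_id: int
    is_response: bool
    body: str


class WorkThreadQueues:
    """The two inbound queues owned by one work thread."""

    def __init__(self) -> None:
        self.response_queue: "queue.Queue[Message]" = queue.Queue()
        self.work_queue: "queue.Queue[Message]" = queue.Queue()
        self.current_task_id: Optional[int] = None  # task now executing, if any


def route(message: Message, worker: WorkThreadQueues) -> None:
    """Main-thread routing: responses to the running task go to the
    response queue; everything else is queued as work for later."""
    if message.is_response and message.task_id == worker.current_task_id:
        worker.response_queue.put(message)
    else:
        worker.work_queue.put(message)
```

Keeping responses on their own queue lets a work thread that is blocked waiting for acknowledgements read them without first draining unrelated work.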
Abstract:
A computing system group processing architecture facilitates asymmetric processing at different computing nodes within a group or cluster of nodes. Nodes within a group are assigned to subgroups. Each node in a subgroup performs similar processing, but nodes in different subgroups are able to perform different processing for the same group-level protocol. All nodes monitor processing completion votes that are cast by all nodes, and nodes in subgroups that finish processing before other subgroups synchronize to the processing of those other subgroups by casting dummy votes during the vote rounds of the subgroups that are still processing their subgroup protocol.
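The dummy-vote synchronization can be illustrated with a small simulation. `run_vote_rounds` is a hypothetical function, and describing each node's subgroup protocol by a fixed number of vote rounds is a simplifying assumption.

```python
from typing import Dict, List


def run_vote_rounds(rounds_needed: Dict[str, int]) -> List[Dict[str, str]]:
    """Simulate vote rounds for nodes whose subgroups need different numbers of rounds.

    rounds_needed maps each node to the number of rounds its subgroup's
    protocol requires (must be non-empty). A node whose subgroup has
    finished casts a "dummy" vote so every round still collects a vote
    from every node.
    """
    total_rounds = max(rounds_needed.values())
    history = []
    for round_no in range(1, total_rounds + 1):
        votes = {
            node: "real" if round_no <= needed else "dummy"
            for node, needed in rounds_needed.items()
        }
        history.append(votes)
    return history
```

With this scheme all nodes leave the group-level protocol after the same round, even though their subgroups did different amounts of work.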