System and method for preventing data corruption in computer system clusters
    Type: Granted invention patent
    Status: In force

    Publication number: US07631066B1

    Publication date: 2009-12-08

    Application number: US10105771

    Filing date: 2002-03-25

    IPC classification: G06F15/173

    Abstract: Systems, methods, apparatus and software can make use of coordinator resources and SCSI-3 persistent reservation commands to determine which nodes of a cluster should be ejected from the cluster, thereby preventing them from corrupting data on a shared data resource. Fencing software operating on the cluster nodes monitors the cluster for a cluster partition (split-brain) event. When such an event occurs, software on at least two of the nodes attempts to unregister other nodes from a majority of coordinator resources. The node that succeeds in gaining control of the majority of coordinator resources survives. Nodes failing to gain control of a majority of coordinator resources remove themselves from the cluster. The winning node can also proceed to unregister ejected nodes from shared data resources. These operations can be performed in parallel to decrease failover time. The software can continue to execute on all nodes to prevent additional problems should a node erroneously attempt to reenter the cluster.

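The race described in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `Resource`, `run_race`, `fence_data_disks`, and the `first_arrivals` argument are hypothetical names introduced here, and the code models registrations as Python sets rather than issuing real SCSI-3 persistent reservation commands. It is not taken from the patent's claims or from any fencing product.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy stand-in for a coordinator or data resource (hypothetical model)."""
    registered: set = field(default_factory=set)

    def eject(self, requester, victims):
        """Unregister victims, but only if the requester is still registered."""
        if requester not in self.registered:
            return False  # requester was already ejected by a rival
        self.registered -= set(victims)
        return True


def run_race(coordinators, first_arrivals, racers):
    """Simulate a split-brain race.

    first_arrivals[i] names the racer that reaches coordinator i first
    (an assumption used to make the interleaving explicit).
    Returns the racers that won a strict majority of coordinators.
    """
    wins = {node: 0 for node in racers}
    for coord, first in zip(coordinators, first_arrivals):
        # The first arrival ejects its rivals; later attempts then fail.
        order = [first] + [n for n in racers if n != first]
        for node in order:
            rivals = [n for n in racers if n != node]
            if coord.eject(node, rivals):
                wins[node] += 1
    majority = len(coordinators) // 2 + 1
    return [node for node, count in wins.items() if count >= majority]


def fence_data_disks(survivor, losers, data_disks):
    """The winner unregisters ejected nodes from the shared data resources."""
    for disk in data_disks:
        disk.eject(survivor, losers)


if __name__ == "__main__":
    coords = [Resource({"A", "B"}) for _ in range(3)]
    disks = [Resource({"A", "B"})]
    survivors = run_race(coords, ["A", "A", "B"], ["A", "B"])
    if survivors:
        losers = [n for n in ("A", "B") if n not in survivors]
        fence_data_disks(survivors[0], losers, disks)
    print(survivors, disks[0].registered)
```

In this sketch node A reaches two of the three coordinators first, so it holds a majority and survives, while B wins only one and would remove itself from the cluster. Using an odd number of coordinators is a design choice of the example that prevents two racers from both reaching a strict majority.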