Reducing application downtime in a cluster using user-defined rules for proactive failover
    1.
    Granted invention patent
    Reducing application downtime in a cluster using user-defined rules for proactive failover (in force)

    Publication number: US07321992B1

    Publication date: 2008-01-22

    Application number: US10401478

    Filing date: 2003-03-28

    IPC classification: G06F11/00

    Abstract: An embodiment of the invention is a method for proactive failover using user-defined rules. An event log of a first server node is monitored to check for user-specified application events. One of the user-specified application events corresponding to an impending failure in an application running on the first server node is detected. In automatic response to the detected impending failure, a proactive failover process is executed to transfer the application to a second server node for continued execution, the second server node being connected to the first server node in a cluster.

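The flow in the abstract above (monitor an event log, match user-specified events, fail over on the first match) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Event`, `Cluster`, and `monitor` names, the rule format (a set of `(source, event_id)` pairs), and the in-memory event list are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Event:
    """One event log entry (hypothetical shape)."""
    source: str    # application that logged the event
    event_id: int  # identifier of the logged event


@dataclass
class Cluster:
    """Toy cluster state: maps each application to the node running it."""
    active: Dict[str, str] = field(default_factory=dict)

    def failover(self, app: str, target: str) -> None:
        # A real cluster would move resources; here we only record the new owner.
        self.active[app] = target


def monitor(
    events: Iterable[Event],
    rules: Set[Tuple[str, int]],
    cluster: Cluster,
    standby: str,
) -> Optional[Event]:
    """Scan log entries; on the first entry matching a user rule, fail over.

    Returns the event that triggered the failover, or None if none matched.
    """
    for event in events:
        if (event.source, event.event_id) in rules:
            cluster.failover(event.source, standby)
            return event
    return None
```

For example, with the rule set `{("db", 1001)}` and a log containing an `Event("db", 1001)` entry, `monitor` would record the `db` application as running on the standby node and return that entry.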

    Proactive method for ensuring availability in a clustered system
    2.
    Granted invention patent
    Proactive method for ensuring availability in a clustered system (in force)

    Publication number: US06986076B1

    Publication date: 2006-01-10

    Application number: US10156486

    Filing date: 2002-05-28

    IPC classification: G06F11/00

    Abstract: The method of the present invention is useful in a computer system including at least two server nodes, each of which can execute clustered server software. The program executes a method for monitoring failure situations to reduce downtime. The method includes the step of detecting an event causing one of the failure situations, and then the method determines if the event affects one of the server nodes. If it is determined that the event does affect one of the server nodes, the method then determines if the event exceeds a threshold value. If it is determined that the event exceeds the threshold value, the method executes a proactive failover. If the event is not specific to a cluster node, but indicates an impending or actual failure of the cluster software, the method identifies and initiates an appropriate action to fix the condition or provide a workaround (if available) that will preempt an impending failure of the cluster system or enable a restart of failed cluster software.

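The decision steps in the abstract above (node-specific event, threshold check, proactive failover, otherwise a fix or workaround for the cluster software) can be sketched as a single function. This is a hedged illustration, not the method as claimed: the `FailureEvent` fields, the `thresholds` and `remedies` dictionaries, and the string return values are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FailureEvent:
    """A detected event (hypothetical shape)."""
    kind: str                       # e.g. "disk_errors"
    value: float                    # measured quantity for this event
    node: Optional[str] = None      # None: not specific to a cluster node
    cluster_software: bool = False  # True: concerns the cluster software itself


def decide(
    event: FailureEvent,
    thresholds: Dict[str, float],
    remedies: Dict[str, str],
) -> str:
    """Return the action to take for one event.

    - Node-specific event above its threshold -> "failover:<node>".
    - Cluster-software event -> the configured fix or workaround, if any.
    - Anything else -> "none".
    """
    if event.node is not None:
        limit = thresholds.get(event.kind)
        if limit is not None and event.value > limit:
            return f"failover:{event.node}"
        return "none"
    if event.cluster_software:
        # A workaround is applied only "if available", so fall back to "none".
        return remedies.get(event.kind, "none")
    return "none"
```

Returning an action name rather than performing the action keeps the decision logic separate from whatever cluster API would carry it out.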