Handling restart attempts for high availability managed resources
    1.
    Granted patent
    Handling restart attempts for high availability managed resources (status: in force)

    Publication No.: US07496789B2

    Publication date: 2009-02-24

    Application No.: US11146531

    Filing date: 2005-06-06

    IPC classification: G06F11/00

    CPC classification: G06F11/0793 G06F11/0709

    Abstract: Techniques are provided for managing a resource in a High Availability (HA) system. The techniques involve incrementing a count when a particular type of remedial action is performed on a resource, so that the count reflects how often that type of remedial action has been performed for the resource. When it is determined that the resource has been in stable operation, the count is automatically reduced. After a failure, the count is used to determine whether to attempt the particular type of remedial action on the resource. Examples of remedial actions include restarting the resource and relocating the resource to another node of a cluster. By using the count, the system ensures that a faulty resource does not get constantly “bounced”. Reducing the count once a resource has become stable makes it less likely that a failure of an otherwise stable resource will require manual intervention.
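The counting scheme in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names, the `max_restarts` limit, the `stable_after` window, and the choice to reduce the count by one (rather than reset it) are all assumptions made for illustration, and only the restart action is modelled, not relocation.

```python
from dataclasses import dataclass


@dataclass
class RestartPolicy:
    """Hypothetical policy knobs; the abstract names no parameters."""
    max_restarts: int = 3        # assumed limit on restart attempts
    stable_after: float = 600.0  # assumed seconds of uptime that count as "stable"


class ManagedResource:
    def __init__(self, name: str, policy: RestartPolicy) -> None:
        self.name = name
        self.policy = policy
        self.restart_count = 0    # how often the restart action has been performed
        self.window_start = 0.0   # start of the current stability window

    def check_stability(self, now: float) -> None:
        """Called periodically; reduces the count after a stable period."""
        stable = now - self.window_start >= self.policy.stable_after
        if self.restart_count > 0 and stable:
            self.restart_count -= 1   # assumption: reduce by one per stable window
            self.window_start = now   # begin a new stability window

    def on_failure(self, now: float) -> str:
        """Use the count to decide whether another restart is attempted."""
        if self.restart_count < self.policy.max_restarts:
            self.restart_count += 1
            self.window_start = now
            return "restart"
        return "manual_intervention"
```

With a limit of two restarts, a resource that fails three times in quick succession is handed over for manual intervention, while one that stays up for a full `stable_after` window earns back a restart attempt.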


    Handling restart attempts for high availability managed resources

    Publication No.: US20060277429A1

    Publication date: 2006-12-07

    Application No.: US11146531

    Filing date: 2005-06-06

    IPC classification: G06F11/00

    CPC classification: G06F11/0793 G06F11/0709

    Abstract: Techniques are provided for managing a resource in a High Availability (HA) system. The techniques involve incrementing a count when a particular type of remedial action is performed on a resource, so that the count reflects how often that type of remedial action has been performed for the resource. When it is determined that the resource has been in stable operation, the count is automatically reduced. After a failure, the count is used to determine whether to attempt the particular type of remedial action on the resource. Examples of remedial actions include restarting the resource and relocating the resource to another node of a cluster. By using the count, the system ensures that a faulty resource does not get constantly “bounced”. Reducing the count once a resource has become stable makes it less likely that a failure of an otherwise stable resource will require manual intervention.

    Method to avoid continuous application failovers in a cluster
    3.
    Patent application
    Method to avoid continuous application failovers in a cluster (status: in force)

    Publication No.: US20080244307A1

    Publication date: 2008-10-02

    Application No.: US11728663

    Filing date: 2007-03-26

    IPC classification: G06F11/00

    CPC classification: G06F11/1482 G06F11/2028

    Abstract: A method and mechanism for failing over applications in a clustered computing system are provided. In an embodiment, the methodology is implemented by a high-availability failover mechanism. Upon detecting a failure of an application that is currently designated to be executing on a particular node of the system, the mechanism may attempt to fail over the application onto a different node. The mechanism keeps track of the number of nodes on which a failover of the application has been attempted. Then, based on one or more factors including that number, the mechanism may cease attempting to fail over the application onto a node of the system.
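The bookkeeping described in this abstract might look roughly like the sketch below. The `FailoverManager` class, its `max_nodes_tried` threshold, and the rule of skipping nodes where the application has already failed are hypothetical choices for illustration. The abstract says only that the number of nodes tried is one of the factors in the decision, so a fixed threshold is an assumption.

```python
from typing import Dict, List, Optional, Set


class FailoverManager:
    def __init__(self, nodes: List[str], max_nodes_tried: int) -> None:
        self.nodes = list(nodes)
        self.max_nodes_tried = max_nodes_tried   # assumed cut-off, not from the source
        self.tried: Dict[str, Set[str]] = {}     # app -> nodes a failover was attempted on
        self.failed: Dict[str, Set[str]] = {}    # app -> nodes the app has failed on

    def on_failure(self, app: str, failed_node: str) -> Optional[str]:
        """Return the node to fail over to, or None to stop trying."""
        tried = self.tried.setdefault(app, set())
        failed = self.failed.setdefault(app, set())
        failed.add(failed_node)

        # Cease failover attempts once enough nodes have been tried.
        if len(tried) >= self.max_nodes_tried:
            return None

        # Pick the first node that has neither been tried nor seen a failure.
        for node in self.nodes:
            if node not in tried and node not in failed:
                tried.add(node)
                return node
        return None  # no eligible node is left
```

In a four-node cluster with `max_nodes_tried=2`, an application that keeps failing is moved twice and then left alone, instead of cycling through every node.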


    Method to avoid continuous application failovers in a cluster
    4.
    Granted patent
    Method to avoid continuous application failovers in a cluster (status: in force)

    Publication No.: US07802128B2

    Publication date: 2010-09-21

    Application No.: US11728663

    Filing date: 2007-03-26

    IPC classification: G06F11/00

    CPC classification: G06F11/1482 G06F11/2028

    Abstract: A method and mechanism for failing over applications in a clustered computing system are provided. In an embodiment, the methodology is implemented by a high-availability failover mechanism. Upon detecting a failure of an application that is currently designated to be executing on a particular node of the system, the mechanism may attempt to fail over the application onto a different node. The mechanism keeps track of the number of nodes on which a failover of the application has been attempted. Then, based on one or more factors including that number, the mechanism may cease attempting to fail over the application onto a node of the system.
