Tolerating failures using concurrency in a cluster
    1. Granted invention patent; legal status: in force

    Publication number: US09176833B2

    Publication date: 2015-11-03

    Application number: US13939928

    Filing date: 2013-07-11

    Abstract: A system and computer program product for tolerating failures using concurrency in a cluster are provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted.
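
The abstract's takeover sequence can be illustrated with a minimal Python sketch of what the second computing node might do. It is not the patented implementation, and several of its details are assumptions not stated in the abstract:

- Each transfer step is modeled as a hypothetical `Action` with `run` and `undo` callables.
- A hypothetical `safe_during_wait` flag decides which steps form the subset performed concurrently.
- The first node's "signal of activity" is a `threading.Event` named `heartbeat`.
- Steps outside the subset run only after the waiting period expires with no activity.
- "Aborting" means rolling back the subset steps already completed.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step of moving the application to the second node (hypothetical model)."""
    name: str
    run: Callable[[], None]
    undo: Callable[[], None]
    # Assumption: True if the step is safe while the first node is still serving.
    safe_during_wait: bool


def tolerate_failure(actions: List[Action], heartbeat: threading.Event,
                     waiting_period: float) -> str:
    """Called on the second node once a failure of the first node is detected."""
    deadline = time.monotonic() + waiting_period
    # Select the subset of actions that may run concurrently with the first node.
    subset = [a for a in actions if a.safe_during_wait]
    remaining = [a for a in actions if not a.safe_during_wait]

    completed: List[Action] = []
    for action in subset:
        if heartbeat.is_set():  # first node showed activity: stop early
            break
        action.run()
        completed.append(action)

    # Wait out the rest of the waiting period while watching for activity.
    # Event.wait returns True only if the event was set before the timeout.
    if heartbeat.wait(timeout=max(0.0, deadline - time.monotonic())):
        # Abort the concurrent work; rollback in reverse order is an assumption.
        for action in reversed(completed):
            action.undo()
        return "aborted"

    # No activity during the waiting period: complete the takeover.
    for action in remaining:
        action.run()
    return "took_over"
```

Checking `heartbeat` between steps lets the second node stop as soon as the first node recovers. The final timed wait covers whatever is left of the waiting period. Running only the "safe" steps early is one way to gain time from concurrency without letting both nodes serve the application at once.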

    Tolerating failures using concurrency in a cluster
    2. Granted invention patent; legal status: in force

    Publication number: US09176834B2

    Publication date: 2015-11-03

    Application number: US14031767

    Filing date: 2013-09-19

    Abstract: A method is provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted.
