Fault-Tolerance And Fault-Containment Models For Zoning Clustered Application Silos Into Continuous Availability And High Availability Zones In Clustered Systems During Recovery And Maintenance
    1.
    Patent application
    Legal status: Lapsed

    Publication number: US20120166866A1

    Publication date: 2012-06-28

    Application number: US13372209

    Filing date: 2012-02-13

    IPC classification: G06F11/20

    Abstract: A cluster recovery and maintenance technique for a server cluster having plural nodes implementing a server tier in a client-server computing architecture. A first group of N active nodes each run a software stack comprising a cluster management tier and a cluster application tier that actively provides services on behalf of one or more client applications running in a client application tier on the clients. A second group of M spare nodes each run a software stack comprising a cluster management tier and a cluster application tier that does not actively provide client application services. First and second zones in the cluster are determined in response to an active node membership change involving one or more active nodes departing from or being added to the first group as a result of an active node failing or becoming unreachable or as a result of a maintenance operation involving an active node.


    Storage system and cluster maintenance
    2.
    Granted patent
    Legal status: In force

    Publication number: US07197632B2

    Publication date: 2007-03-27

    Application number: US10426994

    Filing date: 2003-04-29

    IPC classification: G06F15/177

    Abstract: A method and system for maintaining a discovery record and a cluster bootstrap record is provided. The discovery record enables shared storage system discovery and the cluster bootstrap record enables cluster discovery and cooperative cluster startup. The cluster bootstrap record is updated in response to a change in the cluster membership. The update is performed by a cluster leader in the form of a transactionally consistent I/O update to the cluster bootstrap record on disk and a distributed cache update across the cluster (30, 50). The update is aborted (80) in the event of a failure in the cluster, leaving the cluster bootstrap record in a consistent state. In the event of a disastrous cluster and/or storage system failure, the discovery record may be recovered (228) from a restored storage system (214) and the cluster bootstrap record may be reset to install a new cluster in the old cluster's place (232).
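
The leader-driven update described in this abstract can be illustrated with a minimal sketch. The abstract does not say how transactional consistency is achieved, so the stage-then-atomic-rename approach below is an assumption swapped in for illustration, as are the JSON record layout, the function name `update_bootstrap_record`, the `fail_before_commit` test hook, and the plain dictionary standing in for the distributed cache.

```python
import json
import os
import tempfile


class UpdateAborted(Exception):
    """Raised when the membership update is abandoned part-way."""


def update_bootstrap_record(path, members, node_caches, fail_before_commit=False):
    """Leader-side update: stage the record, commit atomically, refresh caches.

    path               -- hypothetical location of the on-disk bootstrap record
    members            -- new cluster membership (iterable of node names)
    node_caches        -- dict: node name -> cached record (stand-in for the
                          distributed cache; an assumption of this sketch)
    fail_before_commit -- test hook simulating a cluster failure mid-update
    """
    record = {"members": sorted(members)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, staged = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(record, handle)
            handle.flush()
            os.fsync(handle.fileno())
        if fail_before_commit:
            raise UpdateAborted("cluster failure detected before commit")
        # Atomic commit: a reader sees the old record or the new one, never a mix.
        os.replace(staged, path)
    except BaseException:
        # Abort: discard the staged copy; the previous record is untouched.
        if os.path.exists(staged):
            os.remove(staged)
        raise
    # Only after the disk commit succeeds are the cached copies refreshed.
    for node in node_caches:
        node_caches[node] = record
    return record
```

Calling the function with `fail_before_commit=True` raises `UpdateAborted` and leaves both the file at `path` and the cached copies as they were, which mirrors the abort behaviour the abstract describes.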


    Fault-tolerance and fault-containment models for zoning clustered application silos into continuous availability and high availability zones in clustered systems during recovery and maintenance
    3.
    Granted patent
    Legal status: Lapsed

    Publication number: US08286026B2

    Publication date: 2012-10-09

    Application number: US13372209

    Filing date: 2012-02-13

    IPC classification: G06F11/00

    Abstract: A cluster recovery and maintenance technique for a server cluster having plural nodes implementing a server tier in a client-server computing architecture. A first group of N active nodes each run a software stack comprising a cluster management tier and a cluster application tier that actively provides services on behalf of one or more client applications running in a client application tier on the clients. A second group of M spare nodes each run a software stack comprising a cluster management tier and a cluster application tier that does not actively provide client application services. First and second zones in the cluster are determined in response to an active node membership change involving one or more active nodes departing from or being added to the first group as a result of an active node failing or becoming unreachable or as a result of a maintenance operation involving an active node.


    Reliable Fault Resolution In A Cluster
    4.
    Patent application
    Legal status: In force

    Publication number: US20100115338A1

    Publication date: 2010-05-06

    Application number: US11773707

    Filing date: 2007-07-05

    IPC classification: G06F11/07

    Abstract: A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node, and at least one gateway for each network interface. Heartbeat messages are sent between peer nodes and the gateway in predefined periodic intervals. In the event of loss of a heartbeat message by any node or gateway, an ICMP echo is issued to each node and gateway in the cluster for each network interface. If neither a node loss nor a network loss is validated in response to the ICMP echo, an application level ping is issued to determine if the fault associated with the absence of the heartbeat message is a transient error condition or an application software fault.
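
The escalation in this abstract (missed heartbeat, then ICMP echo on every interface, then an application-level ping) can be sketched as a small decision function. This is an illustration under stated assumptions, not the patented method: the abstract does not define how a node loss or network loss is validated, so the two rules in the comments are assumed, and `resolve_fault`, `icmp_echo`, and `app_ping` are hypothetical names. The probes are injected as callables so the sketch runs without raw-socket privileges.

```python
from enum import Enum


class Fault(Enum):
    NODE_LOSS = "node loss"
    NETWORK_LOSS = "network loss"
    TRANSIENT = "transient error"
    APPLICATION = "application software fault"


def resolve_fault(nodes, gateways, interfaces, icmp_echo, app_ping, suspect):
    """Classify a missed heartbeat from `suspect`.

    icmp_echo(target, interface) -> bool and app_ping(node) -> bool are
    caller-supplied probes (hypothetical stand-ins for real network calls).
    """
    targets = list(nodes) + list(gateways)
    # Step 1: echo every node and gateway on every network interface.
    reachable = {
        (target, iface): icmp_echo(target, iface)
        for target in targets
        for iface in interfaces
    }
    # Assumed rule: an interface on which every target is silent is a network loss.
    for iface in interfaces:
        if not any(reachable[(target, iface)] for target in targets):
            return Fault.NETWORK_LOSS
    # Assumed rule: a node that is silent on every interface is a node loss.
    for node in nodes:
        if not any(reachable[(node, iface)] for iface in interfaces):
            return Fault.NODE_LOSS
    # Step 2: neither loss validated, so probe the application on the suspect.
    return Fault.TRANSIENT if app_ping(suspect) else Fault.APPLICATION
```

With all echoes answered, the result depends only on the application-level ping: a reply is treated as a transient error, and silence as an application software fault.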


    Reliable fault resolution in a cluster
    5.
    Granted patent
    Legal status: In force

    Publication number: US07941690B2

    Publication date: 2011-05-10

    Application number: US11773707

    Filing date: 2007-07-05

    IPC classification: G06F11/00

    Abstract: A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node, and at least one gateway for each network interface. Heartbeat messages are sent between peer nodes and the gateway in predefined periodic intervals. In the event of loss of a heartbeat message by any node or gateway, an ICMP echo is issued to each node and gateway in the cluster for each network interface. If neither a node loss nor a network loss is validated in response to the ICMP echo, an application level ping is issued to determine if the fault associated with the absence of the heartbeat message is a transient error condition or an application software fault.


    Fault-tolerance and fault-containment models for zoning clustered application silos into continuous availability and high availability zones in clustered systems during recovery and maintenance
    6.
    Granted patent
    Legal status: Lapsed

    Publication number: US08195976B2

    Publication date: 2012-06-05

    Application number: US11170331

    Filing date: 2005-06-29

    IPC classification: G06F11/00

    Abstract: A cluster recovery and maintenance technique for use in a server cluster having plural nodes implementing a server tier in a client-server computing architecture. A first group of N active nodes each run a software stack comprising a cluster management tier and a cluster application tier that actively provides services on behalf of client applications running in a client application tier. A second group of M spare nodes each run a software stack comprising a cluster management tier and a cluster application tier that does not actively provide services on behalf of client applications. First and second zones in the cluster are determined in response to an active node membership change involving active nodes departing from or being added to the first group as a result of an active node failing or becoming unreachable or as a result of a maintenance operation involving an active node.


    Policy-based cluster quorum determination
    7.
    Granted patent
    Legal status: In force

    Publication number: US07870230B2

    Publication date: 2011-01-11

    Application number: US11182469

    Filing date: 2005-07-15

    Abstract: A system, method and computer program product for use in a server cluster having plural server nodes implementing a server tier in a client-server computing architecture in order to determine which of two or more partitioned server subgroups has a quorum. A determination is made of relative priorities of the subgroups and a quorum is awarded to the subgroup having a highest relative priority. The relative priorities are determined by policy rules that evaluate comparative server node application state information. The server node application state information may include one or more of client connectivity, application priority, resource connectivity, processing capability, memory availability, and input/output resource availability, etc. The policy rules evaluate the application state information for each subgroup and can assign different weights to different types of application state information. An interface may be provided for receiving policy rules specified by a cluster application.
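
The weighted comparison of subgroups described in this abstract can be sketched as follows. This is a simplified illustration, not the patented rule engine: it assumes a policy is just a table of numeric weights, that each node reports numeric state values, and that a subgroup's priority is the weighted sum over its nodes. The function name `award_quorum` and the attribute names in the usage note are hypothetical.

```python
def award_quorum(subgroups, weights):
    """Return (winning subgroup name, all scores) under a weighted-sum policy.

    subgroups -- dict: subgroup name -> list of per-node state dicts,
                 e.g. {"client_connections": 10, "free_memory_gb": 4}
    weights   -- dict: state attribute -> weight (the "policy rule" here);
                 attributes without a weight contribute nothing
    """
    def priority(node_states):
        # Weighted sum of every reported attribute across the subgroup's nodes.
        return sum(
            weights.get(attribute, 0) * value
            for state in node_states
            for attribute, value in state.items()
        )

    scores = {name: priority(states) for name, states in subgroups.items()}
    # Ties go to the first subgroup listed; the abstract does not specify
    # tie-breaking, so this is an arbitrary choice of the sketch.
    winner = max(scores, key=scores.get)
    return winner, scores
```

For example, with weights `{"client_connections": 2, "free_memory_gb": 1}`, a two-node subgroup reporting (10 connections, 4 GB) and (5 connections, 8 GB) scores 42, while a single node reporting (30 connections, 2 GB) scores 62, so the smaller subgroup is awarded the quorum. Changing the weights changes the outcome, which is the point of making the policy configurable.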


    Reliable fault resolution in a cluster
    8.
    Granted patent
    Legal status: In force

    Publication number: US07284147B2

    Publication date: 2007-10-16

    Application number: US10649269

    Filing date: 2003-08-27

    IPC classification: G06F11/00

    Abstract: A method and system for localizing and resolving a fault in a cluster environment. The cluster is configured with at least one multi-homed node, and at least one gateway for each network interface. Heartbeat messages are sent between peer nodes and the gateway in predefined periodic intervals. In the event of loss of a heartbeat message by any node or gateway, an ICMP echo is issued to each node and gateway in the cluster for each network interface. If neither a node loss nor a network loss is validated in response to the ICMP echo, an application level ping is issued to determine if the fault associated with the absence of the heartbeat message is a transient error condition or an application software fault.
