Method and apparatus for providing failure detection and recovery with predetermined replication style for distributed applications in a network
    1.
    Granted invention patent
    Status: Expired

    Publication No.: US06266781B1

    Publication date: 2001-07-24

    Application No.: US09119139

    Filing date: 1998-07-20

    IPC classification: G06F1100

    Abstract: An application module (A) running on a host computer in a computer network is failure-protected with one or more backup copies that are operative on other host computers in the network. In order to effect fault protection, the application module registers itself with a ReplicaManager daemon process (112) by sending a registration message, which message, in addition to identifying the registering application module and the host computer on which it is running, includes the particular replication strategy (cold backup, warm backup, or hot backup) and the degree of replication associated with that application module. The backup copies are then maintained in a fail-over state according to the registered replication strategy. A WatchDog daemon (113), running on the same host computer as the registered application, periodically monitors the registered application to detect failures. When a failure, such as a crash or hangup of the application module, is detected, the failure is reported to the ReplicaManager, which effects the requested fail-over actions. An additional backup copy is then made operative in accordance with the registered replication style and the registered degree of replication. A SuperWatchDog daemon process (115-1), running on the same host computer as the ReplicaManager, monitors each host computer in the computer network. When a host failure is detected, each application module running on that host computer is individually failure-protected in accordance with its registered replication style and degree of replication.
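
The registration and fail-over flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and field names, the `report_failure` signature, the list of candidate backup hosts, and the action names `start`, `activate`, and `promote` are all hypothetical. The comments on the three styles reflect the conventional reading of cold, warm, and hot backup, which the abstract itself does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationStyle(Enum):
    # Conventional meanings, assumed here; the abstract only names the styles.
    COLD = "cold"   # backup copy is not running until fail-over
    WARM = "warm"   # backup copy is running but idle
    HOT = "hot"     # backup copy is running alongside the primary


@dataclass(frozen=True)
class Registration:
    """Hypothetical registration message sent by an application module."""
    module: str              # identifies the registering application module
    host: str                # host computer on which the module is running
    style: ReplicationStyle  # registered replication style
    degree: int              # registered degree of replication


class ReplicaManager:
    """Toy stand-in for the ReplicaManager daemon."""

    # Hypothetical mapping from style to the fail-over action taken.
    ACTIONS = {
        ReplicationStyle.COLD: "start",
        ReplicationStyle.WARM: "activate",
        ReplicationStyle.HOT: "promote",
    }

    def __init__(self):
        self.registry = {}  # module name -> Registration
        self.primary = {}   # module name -> host running the primary copy

    def register(self, msg):
        self.registry[msg.module] = msg
        self.primary[msg.module] = msg.host

    def report_failure(self, module, backup_hosts):
        """Called by a WatchDog when it detects a crash or hang."""
        reg = self.registry[module]
        new_host = backup_hosts[0]       # pick the first available backup
        self.primary[module] = new_host  # fail over to it
        return self.ACTIONS[reg.style], new_host


manager = ReplicaManager()
manager.register(Registration("A", "H1", ReplicationStyle.WARM, degree=2))
action, host = manager.report_failure("A", backup_hosts=["H2", "H3"])
```

In this sketch the fail-over action depends only on the style recorded at registration time, which mirrors the abstract's point that recovery follows whatever the module registered.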


    Method and apparatus for providing failure detection and recovery with predetermined degree of replication for distributed applications in a network
    2.
    Granted invention patent
    Status: Expired

    Publication No.: US06195760B1

    Publication date: 2001-02-27

    Application No.: US09119140

    Filing date: 1998-07-20

    IPC classification: G06F1108

    CPC classification: G06F11/1438 G06F11/0757

    Abstract: An application module (A) running on a host computer in a computer network is failure-protected with one or more backup copies that are operative on other host computers in the network. In order to effect fault protection, the application module registers itself with a ReplicaManager daemon process (112) by sending a registration message, which message, in addition to identifying the registering application module and the host computer on which it is running, includes the particular replication strategy (cold backup, warm backup, or hot backup) and the degree of replication associated with that application module. The backup copies are then maintained in a fail-over state according to the registered replication strategy. A WatchDog daemon (113), running on the same host computer as the registered application, periodically monitors the registered application to detect failures. When a failure, such as a crash or hangup of the application module, is detected, the failure is reported to the ReplicaManager, which effects the requested fail-over actions. An additional backup copy is then made operative in accordance with the registered replication style and the registered degree of replication. A SuperWatchDog daemon process (115-1), running on the same host computer as the ReplicaManager, monitors each host computer in the computer network. When a host failure is detected, each application module running on that host computer is individually failure-protected in accordance with its registered replication style and degree of replication.
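
The host-failure case, in which every module on the failed host is individually restored to its registered degree of replication, might look something like the sketch below. It rests on several assumptions that the abstract does not state: that the degree counts the total number of copies of a module, that replacement copies go to the first live hosts not already holding one, and that placement is tracked as a plain dictionary. The function name `restore_degree` and its parameters are hypothetical.

```python
def restore_degree(placement, degree, failed_host, all_hosts):
    """Return a new placement after `failed_host` goes down.

    placement: module name -> list of hosts holding a copy of that module
    degree:    module name -> registered number of copies (assumed total)
    """
    live_hosts = [h for h in all_hosts if h != failed_host]
    result = {}
    for module, hosts in placement.items():
        # Copies on the failed host are lost.
        kept = [h for h in hosts if h != failed_host]
        # Live hosts that do not yet hold a copy of this module.
        spare = [h for h in live_hosts if h not in kept]
        # Add just enough copies to get back to the registered degree.
        missing = max(0, degree[module] - len(kept))
        result[module] = kept + spare[:missing]
    return result


placement = {"A": ["H1", "H2"], "B": ["H2", "H3"]}
degree = {"A": 2, "B": 2}
after = restore_degree(placement, degree, "H2", ["H1", "H2", "H3", "H4"])
# after == {"A": ["H1", "H3"], "B": ["H3", "H1"]}
```

Each module is handled on its own, so modules with different registered degrees on the same failed host are topped up independently.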

    摘要翻译: 在计算机网络中的主计算机上运行的应用模块(A)由在网络中的其他主机上运行的一个或多个备份副本进行故障保护。 为了实现故障保护,应用模块通过发送注册消息向ReplicaManager守护进程(112)注册自己,除了识别注册应用模块和运行它的主计算机之外,该消息还包括特定的 复制策略(冷备份,热备份或热备份)以及与该应用模块相关联的复制程度。 然后根据注册的复制策略将备份副本保持在故障切换状态。 与注册应用程序在同一主机上运行的WatchDog守护程序(113)定期监视注册的应用程序以检测故障。 当检测到故障(如应用程序模块的崩溃或挂起)时,会将故障报告给副本管理器,这会影响所请求的故障转移操作。 然后根据注册的复制风格和注册的复制程度使额外的备份副本生效。 与ReplicaManager在同一主机上运行的SuperWatchDog守护进程(115-1)监视计算机网络中的每台主机。 当检测到主机故障时,根据其注册的复制风格和复制程度,在该主机上运行的每个应用程序模块都单独进行故障保护。