Method and apparatus for providing process pair protection for complex applications
    Granted invention patent (expired)

    Publication No.: US06477663B1

    Publication date: 2002-11-05

    Application No.: US09287329

    Filing date: 1999-04-07

    IPC classification: G06F11/00

    Abstract: A method and apparatus for providing process-pair protection to complex applications is provided. The apparatus includes a process-pair manager (PPM). The PPM is replicated so that one PPM is deployed on each of two computer systems. Each computer system also hosts a watchdog process that monitors the PPM and restarts it if it fails. Each PPM communicates with a respective instance of an application; an application instance may include one or more processes along with associated resources. During normal operation, the primary application provides service and periodically checkpoints its state to the backup application, which runs in standby mode. The two PPMs communicate with each other and exchange messages as state changes occur. Each computer system also includes a node watcher that notifies the PPM of failures of the remote computer system. In this way, each PPM monitors the state of the other application instance and the health of the computer system on which that instance resides. If a failure of the primary application, or of the computer system where it runs, is detected, the PPM managing the backup application takes steps to make its instance of the application primary. The failover operation is faster (between 5 and 20 seconds) than corresponding operations provided by other existing methods (between 1 and 40 minutes, depending on the application initialization time) because the backup application does not need to be started and initialized to become primary. The failover is stateful because the backup application receives periodic updates of the state of the primary application.
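The failover behaviour summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`AppInstance`, `ProcessPairManager`), the method names (`checkpoint`, `on_remote_failure`), the node names, and the `orders_processed` state key are all hypothetical, and the watchdog, node watcher, and inter-PPM messaging are reduced to direct method calls within one process.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


@dataclass
class AppInstance:
    """Hypothetical stand-in for one already-initialized application instance."""
    role: Role
    state: dict = field(default_factory=dict)


@dataclass
class ProcessPairManager:
    """Hypothetical PPM that manages the application instance on its own node."""
    name: str
    app: AppInstance
    peer: Optional["ProcessPairManager"] = None

    def checkpoint(self) -> None:
        """Copy the primary's current state to the backup instance."""
        if self.app.role is Role.PRIMARY and self.peer is not None:
            self.peer.app.state = dict(self.app.state)

    def on_remote_failure(self) -> None:
        """Invoked when a remote failure is reported (assumed to come from the node watcher)."""
        if self.app.role is Role.BACKUP:
            # The standby instance is already running and holds the last
            # checkpointed state, so promotion needs no start-up step.
            self.app.role = Role.PRIMARY
        self.peer = None


def make_pair():
    """Build two PPMs, one per (simulated) computer system, and link them."""
    a = ProcessPairManager("node-a", AppInstance(Role.PRIMARY))
    b = ProcessPairManager("node-b", AppInstance(Role.BACKUP))
    a.peer, b.peer = b, a
    return a, b


primary, backup = make_pair()
primary.app.state["orders_processed"] = 41
primary.checkpoint()            # periodic checkpoint to the standby
backup.on_remote_failure()      # primary node is reported as failed
print(backup.app.role.value, backup.app.state)
```

In this sketch the promoted instance continues from the checkpointed state rather than from a fresh start, which is the sense in which the abstract calls the failover stateful.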
