Invention Grant
US06782489B2 System and method for detecting process and network failures in a distributed system having multiple independent networks
Status: In force
- Patent Title: System and method for detecting process and network failures in a distributed system having multiple independent networks
- Application No.: US09833771
- Application Date: 2001-04-13
- Publication No.: US06782489B2
- Publication Date: 2004-08-24
- Inventor: Roger A. Fleming
- Applicant: Roger A. Fleming
- Main IPC: G06F11/00
- IPC: G06F11/00

Abstract:
The present invention provides a system and method for detecting a process failure and a network failure in a distributed system. The distributed system includes at least two processes, each executing on a host, operable to transmit messages (i.e., heartbeats) to each other on a plurality of networks in the distributed system. A process in the system is operable to execute a network failure algorithm for detecting failure of a network in the system. The network failure algorithm includes calculating a difference between the period of time to receive a heartbeat on a first network from a process and the period of time to receive a heartbeat on a second network from the same process. If the difference exceeds a network failure threshold, the second network is suspected of failing. A process in the system is also operable to execute a process failure algorithm. The process failure algorithm includes detecting receipt of a heartbeat from a process on any one of a plurality of networks in the system within a network failure time limit. If a heartbeat is not received on any of the networks within that limit, the process is suspected of failing.
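The two checks described in the abstract can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the patented implementation: the class name `HeartbeatMonitor`, its method names, the parameter names, and the choice to measure "period of time to receive a heartbeat" as the time elapsed since the last heartbeat on each network are all assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Hypothetical monitor for heartbeats from ONE remote process.

    Assumption: the "period of time to receive a heartbeat" on a network
    is taken to be the time elapsed since the last heartbeat seen there.
    """

    def __init__(self, networks, network_failure_threshold, time_limit, start=0.0):
        self.network_failure_threshold = network_failure_threshold
        self.time_limit = time_limit
        # Until a heartbeat arrives, treat the start time as "last seen".
        self.last_seen = {name: start for name in networks}

    def record_heartbeat(self, network, now):
        """Note that a heartbeat arrived on `network` at time `now`."""
        self.last_seen[network] = now

    def suspected_networks(self, now):
        """Network failure check: compare each network's elapsed time with
        that of the most recently heard network; suspect a network when
        the difference exceeds the network failure threshold."""
        elapsed = {name: now - seen for name, seen in self.last_seen.items()}
        shortest = min(elapsed.values())
        return sorted(
            name
            for name, value in elapsed.items()
            if value - shortest > self.network_failure_threshold
        )

    def process_suspected(self, now):
        """Process failure check: suspect the process only when no
        heartbeat arrived on ANY network within the time limit."""
        return all(now - seen > self.time_limit for seen in self.last_seen.values())


# Usage: net_b goes silent after t=1 while net_a keeps delivering heartbeats.
monitor = HeartbeatMonitor(["net_a", "net_b"], network_failure_threshold=3.0, time_limit=5.0)
monitor.record_heartbeat("net_b", 1.0)
for t in (1.0, 2.0, 3.0, 4.0, 5.0):
    monitor.record_heartbeat("net_a", t)

print(monitor.suspected_networks(now=5.0))
print(monitor.process_suspected(now=5.0))
```

In this sketch a silent network is suspected while the process is still considered alive, because heartbeats continue to arrive on the other network. The process itself is suspected only once every network has been silent for longer than the time limit.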