-
Publication number: US11966292B2
Publication date: 2024-04-23
Application number: US17804392
Filing date: 2022-05-27
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Nicholas Hill, Peter J. Mendygral, Kent D. Lee, Benjamin James Keen
IPC: G06F11/00, G06F11/07, G06F11/14, H04L67/1029
CPC classification number: G06F11/1407, G06F11/0772, H04L67/1029
Abstract: In some examples, a distributed computer system includes a plurality of computer nodes, where the plurality of computer nodes include respective programs to cooperate to perform a workload. A first computer node includes a communication proxy between the program of the first computer node and a communication library that supports communications between the program of the first computer node and the programs of other computer nodes of the plurality of computer nodes, and a fault management service to monitor a health of the other computer nodes, and in response to a detection of a fault of a second computer node of the plurality of computer nodes, relaunch the communication proxy. The relaunched communication proxy selects, from a plurality of states, a common state to which the programs are to roll back.
-
Publication number: US20230385152A1
Publication date: 2023-11-30
Application number: US17804392
Filing date: 2022-05-27
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Nicholas Hill, Peter J. Mendygral, Kent D. Lee, Benjamin James Keen
IPC: G06F11/14, G06F11/07, H04L67/1029, G06N20/00
CPC classification number: G06F11/1407, G06F11/0772, H04L67/1029, G06N20/00
Abstract: In some examples, a distributed computer system includes a plurality of computer nodes, where the plurality of computer nodes include respective programs to cooperate to perform a workload. A first computer node includes a communication proxy between the program of the first computer node and a communication library that supports communications between the program of the first computer node and the programs of other computer nodes of the plurality of computer nodes, and a fault management service to monitor a health of the other computer nodes, and in response to a detection of a fault of a second computer node of the plurality of computer nodes, relaunch the communication proxy. The relaunched communication proxy selects, from a plurality of states, a common state to which the programs are to roll back.
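Illustrative note: the abstract says the relaunched communication proxy selects, from a plurality of states, a common state to which the programs roll back, but it does not say how that selection is made. The sketch below shows one plausible rule, assumed here purely for illustration: pick the newest state that every node has saved. The function name `select_common_state`, the node names, and the integer state ids are all hypothetical and do not come from the patent.

```python
from typing import Dict, Iterable, Optional, Set


def select_common_state(saved_states: Dict[str, Iterable[int]]) -> Optional[int]:
    """Return the newest state id that every node has saved, or None.

    Assumption (not from the patent): states are identified by integers
    that increase over time, so the largest shared id is the most recent
    state all programs can roll back to.
    """
    common: Optional[Set[int]] = None
    for states in saved_states.values():
        ids = set(states)
        # Intersect each node's saved ids with the running common set.
        common = ids if common is None else common & ids
    if not common:
        # No nodes reported, or no state is shared by all of them.
        return None
    return max(common)


# Hypothetical example: node names and state ids are made up.
saved = {
    "node0": [1, 2, 3],
    "node1": [1, 2],
    "node2": [2, 3],
}
rollback_to = select_common_state(saved)
print(rollback_to)
```

Here only state 2 is held by all three nodes, so it is the state selected for rollback. A real system could use a different rule, for example one based on timestamps or application-level consistency markers.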
-