Invention Grant
- Patent Title: Distributed computing fault management
- Patent Title (中): 分布式计算故障管理
- Application No.: US13961720
- Application Date: 2013-08-07
- Publication No.: US09274902B1
- Publication Date: 2016-03-01
- Inventor: Adam Douglas Morley; Barry Bailey Hunter, Jr.; Yijun Lu; Timothy Andrew Rath; Kiran-Kumar Muniswamy-Reddy; Xianglong Huang; Jiandan Zheng
- Applicant: Amazon Technologies, Inc.
- Applicant Address: US NV Reno
- Assignee: Amazon Technologies, Inc.
- Current Assignee: Amazon Technologies, Inc.
- Current Assignee Address: US NV Reno
- Agency: Baker & Hostetler LLP
- Main IPC: G06F11/00
- IPC: G06F11/00; G06F11/20; G06F11/07

Abstract:
An automated system may be employed to detect, analyze, and recover from faults occurring in a distributed computing system. Faults may be recorded in a metadata store for verification and analysis by an automated fault management process. Diagnostic procedures may confirm detected faults. The automated fault management process may perform recovery workflows involving operations such as rebooting faulting devices and excommunicating unrecoverable computing nodes from affected clusters.
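
The flow described in the abstract (record a fault, confirm it with a diagnostic, then attempt recovery and excommunicate the node if recovery fails) can be illustrated with a minimal Python sketch. This is not the patented implementation. Every name here (`Cluster`, `FaultRecord`, `NodeState`, `manage_faults`, and the `diagnose` and `reboot` callbacks) is a hypothetical stand-in, and the in-memory list standing in for the metadata store is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    HEALTHY = "healthy"
    REBOOTED = "rebooted"
    EXCOMMUNICATED = "excommunicated"


@dataclass
class FaultRecord:
    # Hypothetical record shape; the abstract does not specify one.
    node_id: str
    description: str
    confirmed: bool = False


@dataclass
class Cluster:
    # In-memory stand-ins (assumptions) for cluster membership
    # and the metadata store that holds reported faults.
    nodes: dict = field(default_factory=dict)           # node_id -> NodeState
    metadata_store: list = field(default_factory=list)  # FaultRecord entries

    def record_fault(self, node_id, description):
        record = FaultRecord(node_id, description)
        self.metadata_store.append(record)
        return record


def manage_faults(cluster, diagnose, reboot):
    """Verify each recorded fault, then run a simple recovery workflow.

    diagnose(node_id) -> True if the fault is confirmed (hypothetical callback).
    reboot(node_id)   -> True if the node recovered after reboot (hypothetical).
    """
    for record in cluster.metadata_store:
        # Step 1: a diagnostic procedure confirms or rejects the reported fault.
        if not diagnose(record.node_id):
            continue  # unconfirmed report; leave the node untouched
        record.confirmed = True

        # Step 2: try the least drastic recovery operation first.
        if reboot(record.node_id):
            cluster.nodes[record.node_id] = NodeState.REBOOTED
        else:
            # Step 3: the node is unrecoverable, so mark it as removed
            # from the affected cluster.
            cluster.nodes[record.node_id] = NodeState.EXCOMMUNICATED
```

A short usage example under the same assumptions: node `a` recovers after a reboot, node `b` does not, and the report against node `c` is not confirmed by the diagnostic.

```python
cluster = Cluster(nodes={n: NodeState.HEALTHY for n in ("a", "b", "c")})
for n in ("a", "b", "c"):
    cluster.record_fault(n, "heartbeat timeout")

manage_faults(
    cluster,
    diagnose=lambda node_id: node_id != "c",
    reboot=lambda node_id: node_id == "a",
)
```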