Publication No.: US11573881B1
Publication Date: 2023-02-07
Application No.: US16914114
Filing Date: 2020-06-26
Applicant: Amazon Technologies, Inc.
Inventors: Daniel Massaguer, Ryan Hicke, Katy Humble, Dhiraj Chaudhary
Abstract: Methods, systems, and computer-readable media for role-based failure response training for distributed systems are disclosed. A failure response training system determines a failure mode associated with an architecture for a distributed system comprising a plurality of components. The training system generates a scenario based at least in part on the failure mode. The scenario comprises an initial state of the distributed system which is associated with one or more initial metrics indicative of a failure. The training system provides, to a plurality of users, data describing the initial state. The training system solicits user input representing modification of a configuration of the components. The training system determines a modified state of the distributed system based at least in part on the input. The performance of the distributed system in the modified state is indicated by one or more modified metrics differing from the one or more initial metrics.
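
The flow described in the abstract (failure mode, scenario with an initial state, user input, modified state with new metrics) can be illustrated with a toy sketch. This is not the patented implementation: the `capacity_exhaustion` failure mode, the `web_fleet` component, the per-host capacity, and every function name below are hypothetical choices made only for illustration, and the role-based, multi-user aspect is omitted.

```python
from dataclasses import dataclass

# Hypothetical toy model: all names and numbers are illustrative assumptions.
CAPACITY_PER_HOST = 100  # requests/second that one host can serve


@dataclass(frozen=True)
class State:
    """Snapshot of the simulated system: configuration plus derived metrics."""
    config: dict   # component name -> host count
    metrics: dict  # metric name -> value


def compute_metrics(config: dict, load: int) -> dict:
    """Derive metrics from a configuration under a fixed request load."""
    capacity = config["web_fleet"] * CAPACITY_PER_HOST
    dropped = max(0, load - capacity)
    return {"error_rate": dropped / load}


def generate_scenario(failure_mode: str) -> tuple[State, int]:
    """Build an initial state whose metrics indicate the given failure."""
    if failure_mode != "capacity_exhaustion":
        raise ValueError(f"unsupported failure mode: {failure_mode}")
    load = 500                  # requests/second arriving at the fleet
    config = {"web_fleet": 3}   # too few hosts for the load
    return State(config, compute_metrics(config, load)), load


def apply_user_input(state: State, load: int, changes: dict) -> State:
    """Apply a trainee's configuration change and recompute the metrics."""
    new_config = {**state.config, **changes}
    return State(new_config, compute_metrics(new_config, load))


initial, load = generate_scenario("capacity_exhaustion")
modified = apply_user_input(initial, load, {"web_fleet": 5})
print("initial:", initial.metrics)
print("modified:", modified.metrics)
```

In this sketch the trainee's input scales the fleet from 3 to 5 hosts, so the recomputed error rate differs from the initial one, mirroring the abstract's "modified metrics differing from the initial metrics".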