-
Publication number: US09069737B1
Publication date: 2015-06-30
Application number: US13942625
Filing date: 2013-07-15
Applicant: Amazon Technologies, Inc.
Inventor: Laban Mwangi Kimotho, Jean-Paul Bauer
CPC classification number: G06F11/1484, G06F11/0709, G06F11/079, G06F11/0793
Abstract: Computer systems, such as network computing resources systems, are subject to hardware and software errors. To improve error handling and troubleshooting, information relating to errors is collected from a multitude of computer systems and analyzed. As a result of this analysis, troubleshooting errors in computer systems is improved and errors are remediated automatically.
-
Publication number: US09652326B1
Publication date: 2017-05-16
Application number: US14163906
Filing date: 2014-01-24
Applicant: Amazon Technologies, Inc.
Inventor: Jean-Paul Bauer, Marc Nevin Daya, Jaco Hermanus Gabriel Le Roux, Kevin Robert Scaife, Laban Mwangi Kimotho, Brian Modra, Alan Roy Powell
CPC classification number: G06F11/2025, G06F11/203, G06F11/2035, G06F11/2048, G06F11/3055
Abstract: Methods and apparatus for instance migration to support rapid recovery from correlated failures are described. A failure event affecting one or more compute instances of a provider network, including a particular compute instance hosted at a first instance host, is detected based on an analysis of health status information. A determination is made as to whether a particular compute instance meets an acceptance criterion for a failure-induced migration. The acceptance criterion may be based on storage-related requests from the particular compute instance. If the particular compute instance meets the acceptance criterion, one or more configuration operations are initiated to re-launch the particular compute instance at a different instance host.
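Illustrative sketch (not taken from the patent): the flow in the abstract above, which detects failed instances, checks an acceptance criterion, and re-launches qualifying instances on a different host, might be outlined as follows. The names Instance, meets_acceptance_criterion, and plan_migrations are hypothetical, and the rule used here (an instance qualifies only if it does not depend on host-local storage) is an assumed stand-in for the abstract's "storage-related requests" criterion, which the abstract does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    host: str
    healthy: bool              # outcome of the health-status analysis
    uses_local_storage: bool   # hypothetical proxy for storage-related requests


def meets_acceptance_criterion(inst: Instance) -> bool:
    # Assumed rule, for illustration only: an instance that relies on
    # host-local storage cannot simply be re-launched elsewhere.
    return not inst.uses_local_storage


def plan_migrations(instances, spare_hosts):
    """Map each qualifying failed instance to a different host."""
    available = list(spare_hosts)
    plan = {}
    for inst in instances:
        if inst.healthy or not meets_acceptance_criterion(inst):
            continue
        # Pick the first spare host that is not the instance's current host.
        target = next((h for h in available if h != inst.host), None)
        if target is None:
            continue  # no suitable host left for this instance
        available.remove(target)
        plan[inst.instance_id] = target
    return plan


fleet = [
    Instance("i-1", "h1", healthy=False, uses_local_storage=False),
    Instance("i-2", "h1", healthy=False, uses_local_storage=True),
    Instance("i-3", "h2", healthy=True, uses_local_storage=False),
]
print(plan_migrations(fleet, ["h3", "h4"]))
```

In this sketch only i-1 is scheduled for re-launch: i-2 fails the assumed criterion and i-3 is healthy.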
-