Abstract:
A deployment approval system receives, from a deployment tool, a deployment request for performing a deployment to a particular resource. The deployment approval system can identify at least one rule for approving or rejecting the deployment request based on one or more criteria. The deployment approval system can determine whether the deployment request satisfies the one or more criteria in the at least one rule to approve or reject the request. If the deployment request is approved, the deployment approval system sends an approval to the deployment tool to perform the deployment. The deployment tool can then perform the deployment and, once the deployment is complete, the deployment approval system can receive a confirmation of the deployment. The deployment approval system can then store data describing the deployment in an audit repository.
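A minimal sketch of the approve/confirm/audit flow, assuming each rule is a list of predicate criteria and the audit repository is an in-memory list. All class, field, and method names are hypothetical. The sketch also assumes a deny-by-default policy: a request is rejected when no rule is identified for the resource.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical request shape; the abstract does not specify its fields.
@dataclass
class DeploymentRequest:
    resource: str
    change_id: str
    requester: str


@dataclass
class Rule:
    name: str
    # Each criterion returns True when the request satisfies it.
    criteria: list[Callable[[DeploymentRequest], bool]]

    def satisfied_by(self, request: DeploymentRequest) -> bool:
        return all(criterion(request) for criterion in self.criteria)


@dataclass
class DeploymentApprovalSystem:
    rules: dict[str, list[Rule]]  # resource -> rules identified for it
    audit_repository: list[dict] = field(default_factory=list)

    def review(self, request: DeploymentRequest) -> bool:
        """Approve only if at least one rule applies and every applicable rule is satisfied."""
        applicable = self.rules.get(request.resource, [])
        return bool(applicable) and all(r.satisfied_by(request) for r in applicable)

    def confirm(self, request: DeploymentRequest, outcome: str) -> None:
        """Store data describing the deployment once the tool reports completion."""
        self.audit_repository.append(
            {"resource": request.resource, "change_id": request.change_id,
             "requester": request.requester, "outcome": outcome}
        )


# Usage: approve, let the (external) deployment tool run, then record the result.
approver = DeploymentApprovalSystem(rules={
    "orders-db": [Rule("trusted-requester", [lambda r: r.requester in {"alice", "bob"}])],
})
request = DeploymentRequest("orders-db", "chg-42", "alice")
if approver.review(request):
    # The deployment tool would perform the deployment here.
    approver.confirm(request, "succeeded")
```

Deny-by-default is one possible design choice. A real system might instead route requests without a matching rule to a human approver.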
Abstract:
Computer systems, such as network computing resource systems, are subject to hardware and software errors. To improve error handling and troubleshooting, information relating to errors is collected from a multitude of computer systems and analyzed. This analysis improves the troubleshooting of errors in computer systems and enables errors to be remediated automatically.
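The abstract does not say how the collected error information is analyzed. One simple approach is to group reports from many systems by an error signature and trigger an automated remediation once a known signature appears on enough distinct hosts. The report fields, the `REMEDIATIONS` table, and the `min_hosts` threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    host: str
    component: str  # e.g. "disk", "nic" (hypothetical labels)
    code: str       # error code reported by the host


def aggregate(reports):
    """Group reports by (component, code) signature and count distinct affected hosts."""
    hosts_by_signature = defaultdict(set)
    for report in reports:
        hosts_by_signature[(report.component, report.code)].add(report.host)
    return {sig: len(hosts) for sig, hosts in hosts_by_signature.items()}


# Hypothetical mapping from a known error signature to an automated remediation.
REMEDIATIONS = {("disk", "E_IO_TIMEOUT"): "drain-and-replace-disk"}


def plan_remediations(reports, min_hosts=2):
    """Return remediations for known signatures seen on at least `min_hosts` hosts."""
    counts = aggregate(reports)
    return {sig: REMEDIATIONS[sig] for sig, n in counts.items()
            if n >= min_hosts and sig in REMEDIATIONS}
```

Counting distinct hosts rather than raw reports keeps one noisy machine from looking like a fleet-wide problem.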
Abstract:
Techniques are described which enable users of a service provider network to manage, as a service, local storage devices connected to computer systems of the service provider network. A service provider network provides an application programming interface (API) that enables users to manage local storage devices in association with compute instances that users create using a hardware virtualization service of the service provider network. The API can be used to attach local storage devices to compute instances (that is, make local storage devices available for use as block storage devices) and to detach local storage devices from compute instances (that is, make local storage devices unavailable for use by a compute instance and possibly available for use by other compute instances on the same computer system), among other possible operations.
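A minimal in-memory model of the attach and detach semantics on a single computer system, assuming each local device can be attached to at most one compute instance at a time. The class and method names are hypothetical and are not the provider's actual API.

```python
class LocalStorageError(Exception):
    pass


class LocalStorageManager:
    """Tracks which compute instance, if any, each local device on one host is attached to."""

    def __init__(self, devices):
        # device id -> instance id it is attached to, or None if free
        self._attachments = {device: None for device in devices}

    def attach(self, device, instance):
        """Make the device available to `instance` as a block storage device."""
        if device not in self._attachments:
            raise LocalStorageError(f"unknown device {device}")
        holder = self._attachments[device]
        if holder is not None and holder != instance:
            raise LocalStorageError(f"{device} is already attached to {holder}")
        self._attachments[device] = instance

    def detach(self, device, instance):
        """Remove the device from `instance`, freeing it for other instances on this host."""
        if self._attachments.get(device) != instance:
            raise LocalStorageError(f"{device} is not attached to {instance}")
        self._attachments[device] = None

    def available(self):
        """Devices on this host not attached to any instance."""
        return sorted(d for d, holder in self._attachments.items() if holder is None)
```

Re-attaching a device to the instance that already holds it is treated as a no-op, which keeps retries from the API caller harmless.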
Abstract:
Remote computing resource service providers allow customers to execute virtual computer systems in a virtual environment on hardware provided by the computing resource service provider. The hardware may be distributed between various geographic locations connected by a network. The distributed environment may increase the latency of various operations of the virtual computer systems executed by the customer. To reduce this latency, predictive modeling is used to predict the occurrence of various operations and initiate the operations before they occur, thereby reducing the latency perceived by the customer.
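The abstract does not name a predictive model. A first-order Markov model over observed operation sequences is one simple choice: it estimates the probability of the next operation given the current one, and starts that operation early when the probability clears a threshold. The operation names, the `threshold` value, and the `start` callback are hypothetical.

```python
from collections import Counter, defaultdict


class OperationPredictor:
    """First-order Markov model: estimates P(next op | current op) from history."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self._transitions = defaultdict(Counter)

    def observe(self, ops):
        """Count each consecutive (current, next) pair in an observed sequence."""
        for current, nxt in zip(ops, ops[1:]):
            self._transitions[current][nxt] += 1

    def predict(self, current):
        """Return (op, probability) for the most likely next op, or None if unseen."""
        counts = self._transitions.get(current)
        if not counts:
            return None
        op, n = counts.most_common(1)[0]
        return op, n / sum(counts.values())

    def maybe_prefetch(self, current, start):
        """Call start(op) ahead of time if the predicted op is likely enough."""
        prediction = self.predict(current)
        if prediction and prediction[1] >= self.threshold:
            start(prediction[0])
            return prediction[0]
        return None
```

The threshold trades wasted speculative work (too low) against missed latency savings (too high).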
Abstract:
Methods and apparatus for instance migration to support rapid recovery from correlated failures are described. A failure event affecting one or more compute instances of a provider network, including a particular compute instance hosted at a first instance host, is detected based on an analysis of health status information. A determination is made as to whether the particular compute instance meets an acceptance criterion for a failure-induced migration. The acceptance criterion may be based on storage-related requests from the particular compute instance. If the particular compute instance meets the acceptance criterion, one or more configuration operations are initiated to re-launch the particular compute instance at a different instance host.
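A sketch of the detect/accept/re-launch sequence under stated assumptions. A correlated failure is flagged when at least a fraction of hosts report unhealthy. The acceptance criterion is modeled as "the instance's storage requests go (almost) entirely to network-attached storage", so its state survives losing the host. The counters, thresholds, and spare-host pool are hypothetical; the abstract only says the criterion may be based on storage-related requests.

```python
from dataclasses import dataclass


@dataclass
class StorageStats:
    # Hypothetical counters collected for a compute instance.
    local_requests: int   # I/O requests to storage on the instance host itself
    remote_requests: int  # I/O requests to network-attached storage


def detect_failure(health, unhealthy_fraction=0.5):
    """Flag a correlated failure when enough hosts report unhealthy."""
    if not health:
        return False
    unhealthy = sum(1 for ok in health.values() if not ok)
    return unhealthy / len(health) >= unhealthy_fraction


def accepts_migration(stats, max_local_fraction=0.0):
    """Accept only if the instance's storage requests rarely touch host-local storage."""
    total = stats.local_requests + stats.remote_requests
    if total == 0:
        return True
    return stats.local_requests / total <= max_local_fraction


def plan_relaunch(instances, health, spare_hosts):
    """Map each accepted instance on a failed host to a spare (assumed healthy) host."""
    if not detect_failure(health):
        return {}
    plan = {}
    spares = iter(spare_hosts)
    for instance_id, (host, stats) in instances.items():
        if not health.get(host, True) and accepts_migration(stats):
            target = next(spares, None)
            if target is None:
                break  # no spare capacity left
            plan[instance_id] = target
    return plan
```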
Abstract:
A provider network may implement compute instance migrations across availability zones. Compute instances may be located in a particular availability zone of a provider network that is implemented across multiple availability zones. A request may be received, from a client of the provider network or another component of the provider network, to migrate a compute instance that is currently operating for a client and located in one availability zone to another availability zone. A destination compute instance may be provisioned in the other availability zone based on a configuration of the currently operating compute instance. In some embodiments, other computing resources utilized by the currently operating compute instance, such as data storage resources, may be moved to the other availability zone. Migration may be completed such that the destination compute instance is currently operating for the client and the original compute instance is not.
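A minimal sketch of the three migration steps as pure functions over an immutable instance record. The record fields and the `move_storage` flag are hypothetical. Moving storage is modeled only as changing each volume's zone label; a real migration would copy the data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    instance_id: str
    zone: str
    instance_type: str
    volumes: tuple  # (volume_id, zone) pairs for attached data storage
    running: bool


def migrate(source, target_zone, new_id, move_storage=True):
    """Return (destination, source_after) for a migration to another availability zone."""
    if source.zone == target_zone:
        raise ValueError("target zone must differ from the source zone")
    if not source.running:
        raise ValueError("only a currently operating instance is migrated")
    # 1. Provision the destination from the source's configuration.
    destination = replace(source, instance_id=new_id, zone=target_zone, running=False)
    # 2. Optionally move the source's data storage resources to the target zone.
    if move_storage:
        destination = replace(
            destination,
            volumes=tuple((vol_id, target_zone) for vol_id, _ in source.volumes),
        )
    # 3. Complete the cutover: the destination operates for the client, the source does not.
    return replace(destination, running=True), replace(source, running=False)
```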
Abstract:
Techniques for dynamic quarantine of impaired servers are described. A host monitor can obtain first monitoring data associated with a host computing device and match the first monitoring data to at least one fingerprint. A host score associated with the host computing device can be updated based at least on the at least one fingerprint, the score determining a probability of the host computing device being used for a new job. Second monitoring data associated with the host computing device can be obtained following a reduction of load on the host computing device, and the score can then be increased based on at least one remediation action.
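A sketch of score-based quarantine, assuming fingerprints are predicates over a monitoring sample and that a host's chance of receiving a new job is proportional to its score. A score of 0 effectively quarantines the host. The fingerprint names, metric keys, and penalty/recovery amounts are hypothetical.

```python
import random

# Hypothetical fingerprints: name -> predicate over a monitoring sample.
FINGERPRINTS = {
    "disk_latency": lambda m: m.get("disk_latency_ms", 0) > 200,
    "ecc_errors": lambda m: m.get("ecc_errors", 0) > 10,
}


def match_fingerprints(sample):
    """Names of all fingerprints the monitoring sample matches."""
    return [name for name, test in FINGERPRINTS.items() if test(sample)]


class HostMonitor:
    def __init__(self, hosts, penalty=0.5, recovery=0.25):
        self.scores = {h: 1.0 for h in hosts}  # 1.0 = fully trusted
        self.penalty = penalty
        self.recovery = recovery

    def record(self, host, sample):
        """First monitoring data: lower the score once per matched fingerprint."""
        for _ in match_fingerprints(sample):
            self.scores[host] = max(0.0, self.scores[host] - self.penalty)

    def reassess(self, host, sample):
        """Second monitoring data, after load reduction and remediation."""
        if not match_fingerprints(sample):
            self.scores[host] = min(1.0, self.scores[host] + self.recovery)

    def pick_host(self, rng=random):
        """Choose a host for a new job with probability proportional to its score."""
        hosts = [h for h, s in self.scores.items() if s > 0]
        if not hosts:
            return None
        return rng.choices(hosts, weights=[self.scores[h] for h in hosts], k=1)[0]
```

Recovering in small steps lets a remediated host regain traffic gradually instead of returning to full load immediately.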
Abstract:
Deployment feedback may be provided for updates to resources implemented in a private network. Feedback codes may be generated and included in deployments sent to a private network for deployment at resources implemented in the private network. One or more of the included feedback codes may be selected based on the performance of the deployment and provided via a feedback communication channel that is disconnected and distinct from the private network. Once received, a current status of the deployment may be determined based on the one or more feedback codes provided via the feedback communication channel.
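A minimal sketch of the feedback-code round trip, assuming one random opaque code per possible outcome. Because only the sender holds the code-to-outcome mapping, the separate channel carries no deployment details. The outcome names and function names are hypothetical.

```python
import secrets


def generate_feedback_codes(outcomes=("started", "succeeded", "failed")):
    """One random, opaque code per possible deployment outcome."""
    return {outcome: secrets.token_hex(8) for outcome in outcomes}


def build_deployment(update, codes):
    # The codes travel with the deployment into the private network.
    return {"update": update, "feedback_codes": codes}


def select_codes(deployment, succeeded):
    """Runs inside the private network: pick the codes that describe the deployment."""
    codes = deployment["feedback_codes"]
    return [codes["started"], codes["succeeded" if succeeded else "failed"]]


def current_status(codes, received):
    """Runs outside the private network: map codes from the feedback channel to a status."""
    by_code = {code: outcome for outcome, code in codes.items()}
    outcomes = [by_code[c] for c in received if c in by_code]
    for final in ("failed", "succeeded"):
        if final in outcomes:
            return final
    return "in_progress" if "started" in outcomes else "unknown"
```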
Abstract:
A ranking service can retrieve metrics from a metrics data store and use the metrics to determine a priority order in which to power down resources in a data center. Metrics from the data store can include a number of instances running on a host, a length of time that an instance has been operational, a type of instance, an amount of CPU use on a host, etc. The ranking service can also obtain other parameters from other sources. The parameters can include whether redundant or failover instances exist, the importance of the instances, whether the customer itself is considered important, other generic parameters from the customer account, a customer-provided ranking of instances, etc.
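The abstract lists the inputs to the ranking but not how they are combined. A weighted "keep score" is one simple option: instances with the lowest score are powered down first. The fields, the `WEIGHTS` values, and the uptime tie-breaker are hypothetical, and only a subset of the listed metrics is used.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    instance_id: str
    uptime_hours: float
    host_cpu_percent: float
    has_failover: bool
    important_instance: bool
    important_customer: bool
    customer_rank: int = 0  # higher = customer wants it kept running longer


# Hypothetical weights; a real service would tune these.
WEIGHTS = {"failover": -50, "instance": 100, "customer": 80, "cpu": 0.2, "rank": 10}


def keep_score(c):
    """Higher score = more important to keep running."""
    score = 0.0
    score += WEIGHTS["failover"] if c.has_failover else 0  # a failover copy exists elsewhere
    score += WEIGHTS["instance"] if c.important_instance else 0
    score += WEIGHTS["customer"] if c.important_customer else 0
    score += WEIGHTS["cpu"] * c.host_cpu_percent
    score += WEIGHTS["rank"] * c.customer_rank
    return score


def power_down_order(candidates):
    """Lowest keep-score first; ties broken by shorter uptime."""
    ordered = sorted(candidates, key=lambda c: (keep_score(c), c.uptime_hours))
    return [c.instance_id for c in ordered]
```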