Abstract:
Disclosed are aspects of proactive high availability that identify and predict hardware failure scenarios and migrate virtual resources to healthy hardware resources. In some aspects, a mapping that maps virtual resources to hardware resources is identified. A plurality of hardware events are identified in association with a hardware resource. A hardware failure scenario is predicted based on a health score of the hardware resource. The health score is determined based on the hardware events and a fault model that indicates a level of severity of the hardware events. A particular virtual resource is migrated from the hardware resource to another hardware resource that has a greater health score.
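The scoring and migration steps described above can be illustrated with a minimal sketch. The event names, the severity weights in `FAULT_MODEL`, the subtraction-based score, and the `threshold` parameter are all assumptions made for illustration; the abstract does not specify how the health score is computed.

```python
from dataclasses import dataclass, field

# Hypothetical fault model: hardware event type -> severity weight.
FAULT_MODEL = {"fan_warning": 0.1, "ecc_error": 0.3, "disk_smart_failure": 0.6}


@dataclass
class Host:
    name: str
    events: list = field(default_factory=list)


def health_score(host, fault_model=FAULT_MODEL):
    """Return a score in [0, 1]; 1.0 means no recorded hardware events."""
    penalty = sum(fault_model.get(event, 0.0) for event in host.events)
    return max(0.0, 1.0 - penalty)


def plan_migrations(mapping, hosts, threshold=0.5):
    """For each virtual resource on a host scoring below the threshold,
    pick the host with the greatest health score as the migration target."""
    scores = {h.name: health_score(h) for h in hosts}
    moves = {}
    for vm, host_name in mapping.items():
        if scores[host_name] < threshold:
            target = max(scores, key=scores.get)
            # Only migrate when the target is actually healthier.
            if scores[target] > scores[host_name]:
                moves[vm] = target
    return moves


hosts = [Host("host-a", ["ecc_error", "disk_smart_failure"]), Host("host-b")]
mapping = {"vm-1": "host-a", "vm-2": "host-b"}  # virtual -> hardware resource
moves = plan_migrations(mapping, hosts)
```

Here `moves` lists only the virtual resources whose current host falls below the assumed threshold, each paired with a healthier target.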
Abstract:
A number of hosts in a logical cluster is adjusted up or down in an elastic manner by tracking membership of hosts in the cluster using a first data structure and tracking membership of hosts in a spare pool using a second data structure, and upon determining that a triggering condition for adding another host is met and that all hosts in the cluster are being used, selecting a host from the spare pool, and programmatically adding an identifier of the selected host to the first data structure and programmatically deleting the identifier of the selected host from the second data structure.
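As a rough sketch of the membership bookkeeping described above, the following uses a set for the cluster and a list for the spare pool. The class name, the `in_use` set, and the choice of container types are illustrative assumptions, not details taken from the abstract.

```python
class ElasticCluster:
    """Tracks cluster membership and spare-pool membership separately."""

    def __init__(self, cluster_hosts, spare_hosts):
        self.cluster = set(cluster_hosts)  # first data structure
        self.spare = list(spare_hosts)     # second data structure
        self.in_use = set()                # hosts currently being used (assumed)

    def scale_up(self, trigger_met):
        """Move one host from the spare pool into the cluster.

        Returns the added host identifier, or None when the triggering
        condition is not met, some cluster host is idle, or no spare exists.
        """
        all_used = self.in_use >= self.cluster
        if not (trigger_met and all_used and self.spare):
            return None
        host = self.spare.pop(0)  # delete identifier from the second structure
        self.cluster.add(host)    # add identifier to the first structure
        return host


cluster = ElasticCluster(["h1", "h2"], ["s1"])
cluster.in_use = {"h1"}
first_attempt = cluster.scale_up(trigger_met=True)   # h2 is still idle
cluster.in_use = {"h1", "h2"}
second_attempt = cluster.scale_up(trigger_met=True)  # all hosts are in use
```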
Abstract:
The subject matter described herein is generally directed towards detection and remediation of virtual computing instance (VCI) failure on host devices. Monitoring is performed to detect suspected failures of different guest operating systems, identify failure information, and perform remediation to provide high availability for the VCI.
Abstract:
A system for monitoring a virtual machine executed on a host. The system includes a processor that receives an indication that a failure caused a storage device to be inaccessible to the virtual machine, the inaccessible storage device impacting an ability of the virtual machine to provide service, and applies a remedy to restore access to the storage device based on a type of the failure.
Abstract:
Processes for managing computing processes within a plurality of data centers configured to provide a cloud computing environment are described. An exemplary process includes executing a process on a first host of a plurality of hosts. When the process is executing on the first host, a first network identifier associated with the plurality of hosts is not a network identifier of a pool of network identifiers associated with the cloud computing environment, and first and second route tables, respectively corresponding to first and second data centers of the plurality of data centers, associate the first network identifier with the first host. The exemplary process further includes detecting an event associated with the process. In response to detecting the event, the first and second route tables are respectively updated to associate the first network identifier with a second host of the plurality of hosts.
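The route-table update described above might look like the following sketch, in which each route table is modeled as a plain dictionary. The addresses, host names, and the `CLOUD_POOL` set are hypothetical values chosen for illustration.

```python
def fail_over(route_tables, network_id, new_host):
    """Associate network_id with new_host in every data center's route table."""
    for table in route_tables:
        table[network_id] = new_host


# Hypothetical pool of network identifiers managed by the cloud environment.
CLOUD_POOL = {"10.0.0.5", "10.0.0.6"}

# The identifier used by the process is deliberately outside that pool.
network_id = "192.168.50.10"

# One route table per data center; both initially point at the first host.
route_table_dc1 = {network_id: "host-1"}
route_table_dc2 = {network_id: "host-1"}

# An event (for example, a host failure) is detected, so both tables
# are updated to point at the second host.
fail_over([route_table_dc1, route_table_dc2], network_id, "host-2")
```

Because the identifier is not drawn from the cloud-managed pool in this sketch, moving it only requires rewriting the route-table entries.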
Abstract:
Methods and devices for providing reserved failover capacity across a plurality of data centers are described herein. An exemplary method includes determining whether a management process is executing at a first data center corresponding to a first physical location. In accordance with a determination that the management process is not executing at the first data center, a host is initiated at a second data center corresponding to a second physical location, and the management process is executed on the initiated host at the second data center.
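A minimal sketch of this check-then-failover flow follows. The dictionary layout, the `management_running` flag, and the host name `failover-host` are assumptions introduced for illustration only.

```python
def ensure_management_process(data_centers, primary, secondary):
    """Return the name of the data center running the management process.

    If the process is not executing at the primary data center, a host is
    initiated at the secondary one and the process is started there.
    """
    if data_centers[primary]["management_running"]:
        return primary
    target = data_centers[secondary]
    target["hosts"].append("failover-host")  # initiate a host (hypothetical name)
    target["management_running"] = True      # execute the process on that host
    return secondary


data_centers = {
    "dc-east": {"hosts": ["e1"], "management_running": False},
    "dc-west": {"hosts": [], "management_running": False},
}
active = ensure_management_process(data_centers, "dc-east", "dc-west")
```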
Abstract:
Techniques are disclosed for reallocating host resources in a virtualized computing environment when certain criteria have been met. In some embodiments, a system identifies a host disabling event. In view of the disabling event, the system identifies a resource for reallocation from a first host to a second host. Based on the identification, the system disassociates the identified resource's virtual identifier from the first host and associates the virtual identifier with the second host. Thus, the techniques disclosed can significantly reduce a system's planned and unplanned downtime.
Abstract:
The subject matter described herein provides virtual computing instance (VCI) component protection against networking failures in a datacenter cluster. Networking routes at the host level, VCI level, and application level are monitored for connectivity. Failures are communicated to a primary host or to a datacenter virtualization infrastructure that initiates policy-based remediation, such as moving affected VCIs to another host in the cluster that has all the necessary networking routes functional.
Abstract:
A system for proactive resource reservation for protecting virtual machines. The system includes a cluster of hosts, wherein the cluster of hosts includes a master host, a first slave host, and one or more other slave hosts, and wherein the first slave host executes one or more virtual machines thereon. The first slave host is configured to identify a failure that impacts an ability of the one or more virtual machines to provide service, and calculate a list of impacted virtual machines. The master host is configured to receive a request to reserve resources on another host in the cluster of hosts to enable the impacted one or more virtual machines to failover, calculate a resource capacity among the cluster of hosts, determine whether the calculated resource capacity is sufficient to reserve the resources, and send an indication as to whether the resources are reserved.
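The capacity check performed by the master host could be sketched as below. The single-number capacity model, the largest-demand-first placement order, and all names are assumptions; the abstract does not say how capacity is calculated or how targets are chosen.

```python
def reserve_failover(hosts, failed_host, impacted_vms):
    """Try to reserve capacity for impacted VMs on the remaining hosts.

    hosts: host name -> free capacity (arbitrary units, assumed).
    impacted_vms: VM name -> required capacity.
    Returns (reserved, placement); placement is empty when reservation fails.
    """
    # Work on a copy so a failed attempt leaves the input untouched.
    free = {h: cap for h, cap in hosts.items() if h != failed_host}
    placement = {}
    # Place the most demanding VMs first on the host with the most free capacity.
    for vm, demand in sorted(impacted_vms.items(), key=lambda kv: -kv[1]):
        target = max(free, key=free.get, default=None)
        if target is None or free[target] < demand:
            return False, {}
        free[target] -= demand
        placement[vm] = target
    return True, placement


hosts = {"h1": 0, "h2": 8, "h3": 4}
ok, placement = reserve_failover(hosts, "h1", {"vm-a": 6, "vm-b": 4})
too_big, _ = reserve_failover(hosts, "h1", {"vm-c": 10})
```

The boolean in the returned pair corresponds to the indication of whether the resources are reserved.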