Abstract:
A method for adjusting the configuration of host computers in a cluster on which virtual machines are running in response to a failed change in state is disclosed. The method involves receiving at least one reason a change in state failed a present check or a future check, associating the at least one reason with at least one remediation action, wherein the remediation action would allow the change in state to pass both the present check and the future check, assigning the at least one remediation action a cost, and determining a set of remediation actions to perform based on the cost assigned to each remediation action. In an embodiment, the steps of this method may be implemented in a non-transitory computer-readable storage medium having instructions that, when executed in a computing device, cause the computing device to carry out the steps.
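The cost-based selection step can be illustrated with a minimal sketch. The reason names, action names, cost values, and the "cheapest action per reason" rule below are all assumptions made for illustration; the abstract does not specify how costs are assigned or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemediationAction:
    name: str
    cost: float  # hypothetical unit-less cost; lower is preferred


def choose_remediations(reasons, candidates):
    """For each failure reason, pick the cheapest associated action.

    An action that resolves several reasons is only listed once.
    """
    chosen = {}
    for reason in reasons:
        actions = candidates.get(reason, [])
        if not actions:
            raise ValueError(f"no remediation known for reason: {reason}")
        best = min(actions, key=lambda a: a.cost)
        chosen[best.name] = best
    return sorted(chosen.values(), key=lambda a: a.cost)


# Hypothetical mapping from failure reasons to candidate actions.
CANDIDATES = {
    "insufficient_memory": [
        RemediationAction("migrate_vm", 3.0),
        RemediationAction("power_on_standby_host", 8.0),
    ],
    "insufficient_cpu": [
        RemediationAction("migrate_vm", 3.0),
        RemediationAction("add_host", 10.0),
    ],
}

plan = choose_remediations(["insufficient_memory", "insufficient_cpu"], CANDIDATES)
total_cost = sum(a.cost for a in plan)
```

In this example both reasons share their cheapest action, so the resulting plan contains a single remediation.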
Abstract:
A system for proactive resource reservation for protecting virtual machines is disclosed. The system includes a cluster of hosts, wherein the cluster of hosts includes a master host, a first slave host, and one or more other slave hosts, and wherein the first slave host executes one or more virtual machines thereon. The first slave host is configured to identify a failure that impacts an ability of the one or more virtual machines to provide service, and calculate a list of impacted virtual machines. The master host is configured to receive a request to reserve resources on another host in the cluster of hosts to enable the impacted one or more virtual machines to failover, calculate a resource capacity among the cluster of hosts, determine whether the calculated resource capacity is sufficient to reserve the resources, and send an indication as to whether the resources are reserved.
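The master host's capacity check might look roughly like the following sketch. It assumes memory is the only resource, that capacity is tracked as free megabytes per host, and that a greedy "most free host first" heuristic is acceptable; none of these details come from the abstract.

```python
def try_reserve(free_mb, failed_host, impacted_vms):
    """Return (reserved, placement) for the impacted VMs.

    free_mb: dict of host name -> free memory in MB (assumed metric).
    impacted_vms: dict of VM name -> memory required in MB.
    The failed host is excluded from the candidate targets.
    """
    free = {h: mb for h, mb in free_mb.items() if h != failed_host}
    placement = {}
    # Place the largest VMs first, each on the host with the most free memory.
    for vm, need in sorted(impacted_vms.items(), key=lambda kv: kv[1], reverse=True):
        target = max(free, key=free.get, default=None)
        if target is None or free[target] < need:
            return False, {}  # capacity is insufficient; nothing is reserved
        free[target] -= need
        placement[vm] = target
    return True, placement


cluster = {"slave-1": 0, "slave-2": 8192, "slave-3": 4096}
ok, placement = try_reserve(cluster, "slave-1", {"vm-a": 4096, "vm-b": 2048})
too_big, _ = try_reserve(cluster, "slave-1", {"vm-huge": 16384})
```

The boolean result corresponds to the indication the master host sends back. Because the heuristic is greedy, it can report a failure in cases where a different packing would have fit.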
Abstract:
Exemplary methods, apparatuses, and systems determine a list of virtual machines to be subject to a corrective action. When one or more of the listed virtual machines have dependencies upon other virtual machines, network connections, or storage devices, the determination of the list includes determining that the dependencies of the one or more virtual machines have been met. An attempt to restart or take another corrective action for the first virtual machine within the list is made. A second virtual machine that is currently deployed and running or powered off or paused in response to the corrective action for the first virtual machine is determined to be dependent upon the first virtual machine. In response to the second virtual machine's dependencies having been met by the attempt to restart or take corrective action for the first virtual machine, the second virtual machine is added to the list of virtual machines.
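The way the list grows as dependencies become satisfied can be sketched as a simple worklist. The sketch assumes that every dependency (virtual machine, network connection, or storage device) is represented by a name, and that each corrective action succeeds; both are simplifications for illustration.

```python
from collections import deque


def corrective_action_order(dependencies, already_available=frozenset()):
    """Return the order in which VMs are added to the corrective-action list.

    dependencies: dict of VM name -> set of names it depends on.
    already_available: names whose service is already up.
    """
    satisfied = set(already_available)
    queue = deque(
        vm for vm, deps in dependencies.items()
        if vm not in satisfied and deps <= satisfied
    )
    queued = set(queue)
    order = []
    while queue:
        vm = queue.popleft()
        order.append(vm)
        satisfied.add(vm)  # assumption: the restart attempt succeeds
        # Any VM whose dependencies are now met joins the list.
        for other, deps in dependencies.items():
            if other not in satisfied and other not in queued and deps <= satisfied:
                queue.append(other)
                queued.add(other)
    return order


deps = {"db": set(), "app": {"db"}, "web": {"app"}}
full_order = corrective_action_order(deps)
partial_order = corrective_action_order(deps, already_available={"db"})
```

A VM with an unmet dependency never enters the list, which mirrors the requirement that dependencies be met before a corrective action is attempted.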
Abstract:
Techniques are disclosed for orchestrating high availability (HA) failover for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for one or more of the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing an HA module determines a VM to be restarted on an active host system in the host cluster. The host system further determines if the VM's persistent data is stored in the object store. If so, the host system adds the VM to a list of VMs to be immediately restarted. Otherwise, the host system checks whether the VM is accessible to the host system by querying a storage layer of the host system configured to manage the object store.
Abstract:
Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes an HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide an object store, where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store, and where a failure causes the plurality of storage modules to observe a network partition in the host cluster that the plurality of HA modules do not. In one embodiment, a host system in the host cluster executing a first HA module invokes an API exposed by the plurality of storage modules for persisting metadata for a VM to the object store. If the API is not processed successfully, the host system: (1) identifies a subset of second HA modules in the plurality of HA modules; (2) issues an accessibility query for the VM to the subset of second HA modules in parallel, the accessibility query being configured to determine whether the VM is accessible to the respective host systems of the subset of second HA modules; and (3) if at least one second HA module in the subset indicates that the VM is accessible to its respective host system, transmits a command to the at least one second HA module to invoke the API on its respective host system.
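Steps (1) through (3) can be sketched as a local attempt followed by a parallel fan-out. The `PeerHA` class, its method names, and the choice to stop at the first peer that succeeds are hypothetical stand-ins; the abstract does not describe the actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor


class PeerHA:
    """Hypothetical stand-in for a second HA module on another host."""

    def __init__(self, name, accessible_vms):
        self.name = name
        self.accessible_vms = set(accessible_vms)
        self.persisted = []

    def is_accessible(self, vm):
        return vm in self.accessible_vms

    def persist_metadata(self, vm):
        self.persisted.append(vm)
        return True


def persist_vm_metadata(vm, local_persist, peers):
    """Try the local API call first; on failure, delegate to a peer."""
    if local_persist(vm):
        return "local"
    if not peers:
        return None
    # (2) Query every peer in the subset in parallel.
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        answers = list(pool.map(lambda p: p.is_accessible(vm), peers))
    # (3) Ask a peer that can reach the VM to invoke the API itself.
    for peer, accessible in zip(peers, answers):
        if accessible and peer.persist_metadata(vm):
            return peer.name
    return None


peers = [PeerHA("host-b", []), PeerHA("host-c", ["vm-1"])]
handled_by = persist_vm_metadata("vm-1", lambda vm: False, peers)
```

Here the local call is simulated as always failing, so the request is handed to the only peer that reports the VM as accessible.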
Abstract:
Exemplary methods, apparatuses, and systems include a hypervisor receiving an error message from an agent within a first virtual machine run by the hypervisor. In response to the error message, the hypervisor determines and initiates a corrective action for the hypervisor to take in response to the error message. An exemplary corrective action includes initiating a reset of the first virtual machine or a reset of a second virtual machine.
Abstract:
A method for supporting a change in state within a cluster of host computers that run virtual machines is disclosed. The method involves identifying a change in state within a cluster of host computers that run virtual machines, determining if predefined criteria for available resources within the cluster of host computers can be met by resources available in the cluster of host computers, and determining if predefined criteria for available resources within the cluster of host computers can be maintained after at least one different predefined change in state. In an embodiment, the steps of this method may be implemented in a non-transitory computer-readable storage medium having instructions that, when executed in a computing device, cause the computing device to carry out the steps.
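The two determinations can be read as a check against current resources and a check against resources after a further change. The sketch below assumes a single abstract resource unit, a "minimum spare capacity" criterion, and "loss of any one host" as the different predefined change in state; these are illustrative choices, not details taken from the abstract.

```python
def spare_capacity(hosts, extra_demand, lost_host=None):
    """Spare capacity left after a change that adds extra_demand.

    hosts: dict of host name -> (capacity, used), in arbitrary units.
    If lost_host is given, its capacity disappears but its load remains
    and must be absorbed by the other hosts.
    """
    capacity = sum(cap for name, (cap, _) in hosts.items() if name != lost_host)
    used = sum(used for _, used in hosts.values())
    return capacity - used - extra_demand


def passes_present_check(hosts, extra_demand, required_spare):
    return spare_capacity(hosts, extra_demand) >= required_spare


def passes_future_check(hosts, extra_demand, required_spare):
    # Assumed future change: any single host may fail afterwards.
    return all(
        spare_capacity(hosts, extra_demand, lost_host=h) >= required_spare
        for h in hosts
    )


hosts = {"h1": (100, 40), "h2": (100, 40), "h3": (100, 40)}
small_ok = passes_present_check(hosts, 30, 20) and passes_future_check(hosts, 30, 20)
large_now = passes_present_check(hosts, 120, 20)
large_later = passes_future_check(hosts, 120, 20)
```

With these numbers, the larger change satisfies the criterion today but not after a subsequent host failure, which is the situation the second determination is meant to catch.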
Abstract:
In one embodiment, a method for placing virtual machines in a collection is provided. A plurality of equivalence sets of hosts is determined prior to placing virtual machines in the collection. The hosts in an equivalence set of hosts are considered similar. An equivalence set of hosts in the plurality of equivalence sets is selected to place the virtual machines in the collection. The method then places at least a portion of the virtual machines in the collection on one or more hosts in the selected equivalence set of hosts.
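A minimal sketch of the placement flow follows. It assumes hosts are "similar" when they share a CPU family and the same set of datastores, and that the set with the most free slots is selected; the abstract defines neither the similarity test nor the selection rule, and the field names are hypothetical.

```python
from collections import defaultdict


def equivalence_sets(hosts):
    """Group hosts whose (assumed) similarity key is identical."""
    groups = defaultdict(list)
    for name, attrs in hosts.items():
        key = (attrs["cpu_family"], frozenset(attrs["datastores"]))
        groups[key].append(name)
    return list(groups.values())


def place_collection(vms, hosts):
    """Place as many VMs as fit on hosts of one selected equivalence set."""
    groups = equivalence_sets(hosts)  # computed before any placement
    best = max(groups, key=lambda g: sum(hosts[h]["free_slots"] for h in g))
    slots = {h: hosts[h]["free_slots"] for h in best}
    placement = {}
    for vm in vms:
        target = max(slots, key=slots.get)
        if slots[target] == 0:
            break  # only a portion of the collection fits in this set
        placement[vm] = target
        slots[target] -= 1
    return placement


hosts = {
    "h1": {"cpu_family": "A", "datastores": ["ds1"], "free_slots": 2},
    "h2": {"cpu_family": "A", "datastores": ["ds1"], "free_slots": 1},
    "h3": {"cpu_family": "B", "datastores": ["ds2"], "free_slots": 2},
}
placement = place_collection(["v1", "v2", "v3", "v4"], hosts)
```

Computing the sets once, up front, means the per-VM loop only considers hosts within the chosen set rather than re-evaluating every host in the cluster.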
Abstract:
A system for monitoring virtual machines includes a master host and a slave host. The slave host includes a primary virtual machine and a secondary virtual machine. The slave host is configured to identify a failure that impacts an ability of at least one of the primary virtual machine and the secondary virtual machine to provide service. If the failure is a Permanent Device Loss failure, the slave host is configured to terminate each impacted virtual machine. If the failure is an All Paths Down failure, the master host is configured to apply one of the following: a first remedy if the primary virtual machine is impacted and the secondary virtual machine is not impacted; a second remedy if the secondary virtual machine is impacted and the primary virtual machine is not impacted; or a third remedy if both the primary virtual machine and the secondary virtual machine are impacted.
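The decision logic reads as a small dispatch on failure type and on which virtual machines are impacted. In the sketch below the remedies are left as opaque labels because the abstract does not say what each remedy does; the string constants and the tuple return shape are assumptions.

```python
PDL = "Permanent Device Loss"
APD = "All Paths Down"


def select_response(failure, primary_impacted, secondary_impacted):
    """Return (acting host, action label) for a storage failure."""
    if not (primary_impacted or secondary_impacted):
        return (None, "no action")
    if failure == PDL:
        # The slave host terminates every impacted VM itself.
        return ("slave host", "terminate impacted VMs")
    if failure == APD:
        # The master host picks a remedy based on which VM is impacted.
        if primary_impacted and not secondary_impacted:
            return ("master host", "first remedy")
        if secondary_impacted and not primary_impacted:
            return ("master host", "second remedy")
        return ("master host", "third remedy")
    return (None, "no action")


apd_primary_only = select_response(APD, True, False)
apd_both = select_response(APD, True, True)
pdl_secondary = select_response(PDL, False, True)
```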