Abstract:
A system and method for semi-automatic workload domain deployment in a computing environment uses a user host selection of at least one host computer for a workload domain to automatically recommend candidate host computers for the workload domain from available host computers using relative and absolute selection criteria. The relative selection criteria include criteria that are based on properties of any manually selected host computers, while the absolute selection criteria include criteria that are not based on properties of any manually selected host computers. Another user selection of at least one of the candidate host computers can then be made for the workload domain. The workload domain is deployed using the user host selections of the at least one host computer and the at least one of the candidate host computers.
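The split between relative and absolute selection criteria can be illustrated with a minimal sketch. The `Host` fields, the memory threshold, and the rule that candidates must match the CPU model of a manually selected host are all assumptions made for illustration; the abstract does not name specific criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical host properties; the abstract does not list any.
    name: str
    cpu_model: str
    memory_gb: int
    healthy: bool


def recommend_candidates(selected, available, min_memory_gb=64):
    """Return available hosts that pass both kinds of criteria."""
    selected_names = {h.name for h in selected}
    selected_cpu_models = {h.cpu_model for h in selected}
    candidates = []
    for host in available:
        if host.name in selected_names:
            continue  # already chosen by the user
        # Absolute criteria: independent of the manual selection.
        if not host.healthy or host.memory_gb < min_memory_gb:
            continue
        # Relative criteria: compared against the manually selected hosts.
        if host.cpu_model not in selected_cpu_models:
            continue
        candidates.append(host)
    return candidates
```

The user would then pick from the returned candidates, and the deployment would use the union of both selections.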
Abstract:
The present disclosure is related to systems and methods for scheduling and managing series of snapshots. An example method can include estimating a transfer time to transfer a first snapshot of a virtual computing instance (VCI) to a first snapshot series, and estimating a transfer time to transfer a second snapshot of the VCI to a second snapshot series. The method can further include determining a first schedule time to start a transfer of a first series of snapshots and determining a second schedule time to start a transfer of a second series of snapshots, wherein the first schedule time and the second schedule time are based at least in part on a respective recovery point objective (RPO). In some embodiments, the method can further include scheduling a point in time to record a next snapshot based at least in part on the shorter schedule time of the first schedule time and the second schedule time.
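One way to read the scheduling step is sketched below. It assumes that transfer time is snapshot size divided by bandwidth, that a schedule time is the latest moment a transfer can start and still finish within the RPO, and that "shorter" means the earlier of the two schedule times. None of these formulas appear in the abstract, and all function and key names are hypothetical.

```python
def estimate_transfer_seconds(snapshot_bytes, bandwidth_bytes_per_s):
    """Estimate how long one snapshot takes to reach its series."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return snapshot_bytes / bandwidth_bytes_per_s


def transfer_start_deadline(last_snapshot_at, rpo_seconds, transfer_seconds):
    """Latest start time at which the transfer still completes within the RPO."""
    return last_snapshot_at + rpo_seconds - transfer_seconds


def next_snapshot_time(series_list):
    """Pick the earliest deadline across all snapshot series."""
    deadlines = []
    for series in series_list:
        transfer = estimate_transfer_seconds(
            series["snapshot_bytes"], series["bandwidth_bytes_per_s"]
        )
        deadlines.append(
            transfer_start_deadline(
                series["last_snapshot_at"], series["rpo_seconds"], transfer
            )
        )
    return min(deadlines)
```

Times here are plain seconds on a shared clock; a real scheduler would also have to handle deadlines that have already passed.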
Abstract:
Exemplary methods, apparatuses, and systems include a hypervisor receiving an error message from an agent within a first virtual machine run by the hypervisor. In response to the error message, the hypervisor determines and initiates a corrective action. An exemplary corrective action includes initiating a reset of the first virtual machine or a reset of a second virtual machine.
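The mapping from an error message to a corrective action might look like the sketch below. The message keys and the error codes `application_hung` and `dependency_unreachable` are hypothetical; the abstract only states that the action may reset the reporting virtual machine or a second one.

```python
def choose_corrective_action(message):
    """Map an agent's error message to an (action, target VM) pair.

    `message` is a dict with hypothetical keys: 'vm' (the reporting VM),
    'error' (an error code), and optionally 'depends_on' (another VM).
    """
    error = message["error"]
    if error == "application_hung":
        # Reset the VM that reported the problem.
        return ("reset_vm", message["vm"])
    if error == "dependency_unreachable" and "depends_on" in message:
        # Reset a second VM that the reporting VM relies on.
        return ("reset_vm", message["depends_on"])
    # Unknown errors: take no disruptive action in this sketch.
    return ("log_only", message["vm"])
```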
Abstract:
In one embodiment, a method for placing virtual machines in a collection is provided. A plurality of equivalence sets of hosts is determined prior to placing virtual machines in the collection. The hosts in an equivalence set of hosts are considered similar. An equivalence set of hosts in the plurality of equivalence sets is selected to place the virtual machines in the collection. The method then places at least a portion of the virtual machines in the collection on one or more hosts in the selected equivalence set of hosts.
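A rough sketch of placement by equivalence set follows. It assumes hosts are "similar" when they share a CPU type and network, that the set with the most free slots is selected, and that each virtual machine occupies one slot. These rules and the dictionary keys are illustrative assumptions, not details from the abstract.

```python
from collections import defaultdict


def build_equivalence_sets(hosts):
    """Group host names by a similarity key (here: CPU type and network)."""
    sets = defaultdict(list)
    for name, props in hosts.items():
        sets[(props["cpu"], props["network"])].append(name)
    return dict(sets)


def place_collection(vms, hosts):
    """Place as many VMs as fit on hosts of one selected equivalence set."""
    sets = build_equivalence_sets(hosts)

    def capacity(members):
        return sum(hosts[m]["free_slots"] for m in members)

    # Select the set with the most free slots before placing anything.
    best = max(sets.values(), key=capacity)
    remaining = {m: hosts[m]["free_slots"] for m in best}
    placement = {}
    for vm in vms:
        target = max(remaining, key=remaining.get)
        if remaining[target] == 0:
            break  # set is full; only a portion of the collection is placed
        placement[vm] = target
        remaining[target] -= 1
    return placement
```

Computing the sets once up front means each placement decision only considers hosts in the selected set, rather than re-evaluating every host in the inventory.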
Abstract:
Techniques are disclosed for persisting high availability (HA) protection state for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing an HA module determines an identity of a VM that has been powered on in the host cluster. The host system then persists HA protection state for the VM in a storage object of the VM, where the HA protection state indicates that the VM should be restarted on an active host system in the case of a failure in the host cluster.
Abstract:
A system for proactive resource reservation for protecting virtual machines. The system includes a cluster of hosts, wherein the cluster of hosts includes a master host, a first slave host, and one or more other slave hosts, and wherein the first slave host executes one or more virtual machines thereon. The first slave host is configured to identify a failure that impacts an ability of the one or more virtual machines to provide service, and calculate a list of impacted virtual machines. The master host is configured to receive a request to reserve resources on another host in the cluster of hosts to enable the impacted one or more virtual machines to failover, calculate a resource capacity among the cluster of hosts, determine whether the calculated resource capacity is sufficient to reserve the resources, and send an indication as to whether the resources are reserved.
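The master host's side of the exchange could be sketched as follows. The sketch assumes memory in megabytes is the only resource and compares total demand against total free capacity on the other hosts, which ignores fragmentation across hosts. All names are hypothetical.

```python
def handle_reservation_request(free_capacity_mb, source_host, impacted_vms):
    """Decide whether failover resources can be reserved.

    free_capacity_mb: dict of host name -> free memory in MB.
    source_host: the slave host that reported the failure.
    impacted_vms: dict of VM name -> memory the VM needs in MB.
    """
    needed = sum(impacted_vms.values())
    # Capacity on the failing host itself does not count toward failover.
    available = sum(
        mb for host, mb in free_capacity_mb.items() if host != source_host
    )
    return {
        "reserved": available >= needed,
        "needed_mb": needed,
        "available_mb": available,
    }
```

The returned dictionary stands in for the indication sent back to the requesting host.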
Abstract:
Recovery of virtual machines when one or more hosts fail includes identifying virtual machines running on the remaining functioning hosts. Some of the identified powered on virtual machines are suspended in favor of restarting some of the failed virtual machines from the failed host(s). A subsequent round of identifying virtual machines for suspension and virtual machines for restarting is performed. Virtual machines for suspension and restarting may be identified based on their associated “recovery time objective” (RTO) values or their “maximum number of RTO violations” value.
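A single round of the suspend-and-restart selection might look like the sketch below. It assumes every virtual machine occupies one slot, that failed machines with the smallest RTO are restarted first, and that a running machine is suspended only when its RTO is larger than that of the failed machine taking its place. The "maximum number of RTO violations" value is not modeled.

```python
def plan_recovery_round(running, failed, free_slots):
    """Return (VMs to suspend, VMs to restart) for one round.

    running, failed: lists of (name, rto_seconds) tuples.
    free_slots: unused VM slots on the surviving hosts.
    """
    to_suspend = []
    to_restart = []
    # Failed VMs that tolerate the least downtime go first.
    pending = sorted(failed, key=lambda vm: vm[1])
    # Running VMs that tolerate the most downtime are suspended first.
    victims = sorted(running, key=lambda vm: vm[1], reverse=True)
    for name, rto in pending:
        if free_slots > 0:
            free_slots -= 1
            to_restart.append(name)
        elif victims and victims[0][1] > rto:
            to_suspend.append(victims.pop(0)[0])
            to_restart.append(name)
        # Otherwise the VM waits for a subsequent round.
    return to_suspend, to_restart
```

Calling the function again with the updated lists corresponds to the subsequent round described above.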
Abstract:
A method for adjusting the configuration of host computers in a cluster on which virtual machines are running in response to a failed change in state is disclosed. The method involves receiving at least one reason a change in state failed a present check or a future check, associating the at least one reason with at least one remediation action, wherein the remediation action would allow the change in state to pass both the present check and the future check, assigning the at least one remediation action a cost, and determining a set of remediation actions to perform based on the cost assigned to each remediation action. In an embodiment, the steps of this method may be implemented in a non-transitory computer-readable storage medium having instructions that, when executed in a computing device, cause the computing device to carry out the steps.
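The association of failure reasons with costed remediation actions can be sketched as a lookup followed by a cheapest-option choice. The reason names, action names, and costs in the catalog below are invented for illustration, and picking the cheapest action per reason is only one possible selection rule.

```python
# Hypothetical catalog: failure reason -> list of (remediation action, cost).
REMEDIATIONS = {
    "insufficient_memory": [("migrate_vm", 3), ("power_on_standby_host", 10)],
    "reservation_violation": [("lower_reservation", 2)],
}


def choose_remediations(reasons, catalog):
    """Pick the cheapest known action for each reason, cheapest first."""
    chosen = {}
    for reason in reasons:
        options = catalog.get(reason, [])
        if not options:
            continue  # no known remediation for this reason
        action, cost = min(options, key=lambda option: option[1])
        chosen[action] = cost  # the same action is only listed once
    return sorted(chosen.items(), key=lambda item: item[1])
```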
Abstract:
Exemplary methods, apparatuses, and systems determine a list of virtual machines to be subject to a corrective action. When one or more of the listed virtual machines have dependencies upon other virtual machines, network connections, or storage devices, the determination of the list includes determining that the dependencies of the one or more virtual machines have been met. An attempt to restart or take another corrective action for the first virtual machine within the list is made. A second virtual machine that is currently deployed and running or powered off or paused in response to the corrective action for the first virtual machine is determined to be dependent upon the first virtual machine. In response to the second virtual machine's dependencies having been met by the attempt to restart or take corrective action for the first virtual machine, the second virtual machine is added to the list of virtual machines.
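The dependency handling can be sketched as a loop that acts on a virtual machine only once everything it depends on is satisfied, and that reconsiders waiting machines after each successful action. The `restart` callback and the data shapes are assumptions for illustration.

```python
def run_corrective_actions(candidates, dependencies, satisfied, restart):
    """Restart VMs in an order that respects their dependencies.

    candidates: VM names that need a corrective action.
    dependencies: dict of VM name -> set of VMs, networks, or storage it needs.
    satisfied: names of dependencies that are already available.
    restart: callable taking a VM name, returning True on success.
    Returns the VMs restarted, in order.
    """
    satisfied = set(satisfied)
    pending = list(candidates)
    order = []
    progress = True
    while pending and progress:
        progress = False
        for vm in list(pending):
            if dependencies.get(vm, set()) <= satisfied and restart(vm):
                # A successful restart may unblock VMs that depend on this one.
                satisfied.add(vm)
                pending.remove(vm)
                order.append(vm)
                progress = True
    return order
```

The loop stops when a full pass makes no progress, so machines with unmet dependencies or failed restarts are left pending rather than retried forever.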
Abstract:
Disclosed are aspects of proactive high availability that proactively identify and predict hardware failure scenarios and migrate virtual resources to healthy hardware resources. In some aspects, a mapping that maps virtual resources to hardware resources is identified. A plurality of hardware events are identified in association with a hardware resource. A hardware failure scenario is predicted based on a health score of the hardware resource. The health score is determined based on the hardware events and a fault model that indicates a level of severity of the hardware events. A particular virtual resource is migrated from the hardware resource to another hardware resource that has a greater health score.
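A minimal sketch of the health-score idea follows. It assumes the fault model is a table of severity penalties per event type, that the score starts at 100 and drops by the sum of penalties, and that a score below a threshold counts as a predicted failure. The event names, penalties, and threshold are all hypothetical.

```python
# Hypothetical fault model: event type -> severity penalty.
FAULT_MODEL = {"fan_failure": 20, "ecc_error": 10, "disk_smart_warning": 30}


def health_score(events, fault_model):
    """Score a hardware resource from 0 (worst) to 100 (no known faults)."""
    penalty = sum(fault_model.get(event, 0) for event in events)
    return max(0, 100 - penalty)


def plan_migrations(mapping, events_by_host, fault_model, threshold=60):
    """Plan moves for virtual resources on hosts predicted to fail.

    mapping: dict of virtual resource -> current host.
    events_by_host: dict of host -> list of observed event types;
    every host in `mapping` must appear here.
    """
    scores = {
        host: health_score(events, fault_model)
        for host, events in events_by_host.items()
    }
    plan = {}
    for resource, host in mapping.items():
        if scores[host] >= threshold:
            continue  # no failure predicted for this host
        target = max(scores, key=scores.get)
        if scores[target] > scores[host]:
            plan[resource] = target
    return plan
```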