Abstract:
A method of managing virtual resources executing on a hardware platform that employs sensors to monitor the health of hardware resources of the hardware platform includes filtering sensor data from the hardware platform and combining the sensor data with a fault model for the hardware platform to generate a health score, receiving an inventory that maps the virtual resources to the hardware resources of the hardware platform, receiving resource usage data describing use of the hardware resources of the hardware platform by the virtual resources, and generating resource utilization metrics from the resource usage data. The method includes receiving policy data specifying rules applicable to the inventory, determining a set of recommendations for changes to the inventory based on the health score, the resource usage data, and the policy data, and executing at least one recommendation to implement the changes to the inventory.
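The flow described above (filter sensor data, apply a fault model, combine the result with usage and policy to produce recommendations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the median filter, the linear temperature-based fault model, the `min_health` and `max_utilization` policy keys, and every function and class name are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Recommendation:
    vm: str
    source_host: str
    target_host: str


def filter_readings(readings, window=3):
    """Median-filter raw sensor readings to suppress isolated spikes (assumed filter)."""
    half = window // 2
    return [
        median(readings[max(0, i - half): i + half + 1])
        for i in range(len(readings))
    ]


def health_score(readings, warn_temp, fail_temp):
    """Hypothetical fault model: 1.0 at or below warn_temp, falling linearly to 0.0 at fail_temp."""
    latest = filter_readings(readings)[-1]
    if latest <= warn_temp:
        return 1.0
    if latest >= fail_temp:
        return 0.0
    return (fail_temp - latest) / (fail_temp - warn_temp)


def recommend(inventory, scores, utilization, policy):
    """Propose moving VMs off unhealthy hosts onto the least-utilized healthy host.

    inventory:   dict mapping VM name -> host name
    scores:      dict mapping host name -> health score in [0, 1]
    utilization: dict mapping host name -> utilization fraction in [0, 1]
    policy:      dict with assumed keys "min_health" and "max_utilization"
    """
    healthy = [h for h, s in scores.items() if s >= policy["min_health"]]
    candidates = [h for h in healthy if utilization[h] < policy["max_utilization"]]
    recommendations = []
    for vm, host in sorted(inventory.items()):
        if scores[host] >= policy["min_health"]:
            continue  # the current host is healthy enough; leave the VM in place
        if candidates:
            target = min(candidates, key=lambda h: utilization[h])
            recommendations.append(Recommendation(vm, host, target))
    return recommendations
```

The sketch does not update utilization after each recommendation, so a fuller version would need to account for the load each proposed move adds to its target host.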
Abstract:
A system for monitoring a virtual machine executed on a host is disclosed. The system includes a processor that receives an indication that a failure caused a storage device to be inaccessible to the virtual machine, the inaccessible storage device impacting an ability of the virtual machine to provide service, and applies a remedy to restore access to the storage device based on a type of the failure.
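Selecting a remedy based on the type of failure could be sketched as a simple dispatch. The abstract does not enumerate failure types or remedies, so the two failure types and the remedy names below are assumptions made purely for illustration.

```python
from enum import Enum, auto


class FailureType(Enum):
    # Hypothetical failure categories; the abstract does not name any.
    TRANSIENT_PATH_LOSS = auto()
    PERMANENT_DEVICE_LOSS = auto()


def choose_remedy(failure_type, hosts_with_access):
    """Return a (remedy, target_host) pair for a VM that lost access to storage.

    hosts_with_access: list of other hosts that can still reach the storage device.
    """
    if failure_type is FailureType.TRANSIENT_PATH_LOSS:
        # Assumed remedy: wait for the path to return, then reset the VM where it is.
        return ("wait_then_reset_in_place", None)
    if failure_type is FailureType.PERMANENT_DEVICE_LOSS:
        # Assumed remedy: restart the VM on a host that still has access, if any.
        if hosts_with_access:
            return ("restart_on_host", hosts_with_access[0])
        return ("no_remedy_available", None)
    raise ValueError(f"unknown failure type: {failure_type!r}")
```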
Abstract:
Exemplary methods, apparatuses, and systems determine a list of virtual machines to be subject to a corrective action. When one or more of the listed virtual machines have dependencies upon other virtual machines, network connections, or storage devices, the determination of the list includes determining that the dependencies of the one or more virtual machines have been met. An attempt to restart or take another corrective action for the first virtual machine within the list is made. A second virtual machine that is currently deployed and running or powered off or paused in response to the corrective action for the first virtual machine is determined to be dependent upon the first virtual machine. In response to the second virtual machine's dependencies having been met by the attempt to restart or take corrective action for the first virtual machine, the second virtual machine is added to the list of virtual machines.
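One way to picture the dependency-driven list building is the loop below: a virtual machine joins the list once everything it depends on is available, and each attempted corrective action may in turn satisfy the dependencies of other virtual machines. The function name, the data shapes, and the simplification that an attempted restart immediately counts as satisfying dependents are assumptions for illustration, not details from the abstract.

```python
def build_restart_list(failed_vms, depends_on, available):
    """Order VMs for corrective action so that dependencies are met first.

    failed_vms: list of VM names needing corrective action
    depends_on: dict mapping VM name -> set of names (VMs, networks, datastores) it needs
    available:  set of names that are currently available
    Returns (ordered_list, still_blocked).
    """
    available = set(available)
    pending = list(failed_vms)
    ordered = []
    progress = True
    while pending and progress:
        progress = False
        for vm in list(pending):  # iterate over a copy so pending can be modified
            if depends_on.get(vm, set()) <= available:
                ordered.append(vm)
                # Assumption: an attempted restart is treated as meeting the
                # dependency for VMs that depend on this one.
                available.add(vm)
                pending.remove(vm)
                progress = True
    return ordered, pending
```

For example, with `web` depending on `app`, and `app` depending on `db` and a network, `db` is handled first, then `app`, then `web`. Any virtual machine whose dependencies are never met stays in the second returned list.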
Abstract:
Techniques for enabling virtual machine (VM) recovery on non-shared storage in a single virtual infrastructure management server (VIMS) instance are provided. In one set of embodiments, a VIMS instance can receive an indication that a VM in a first cluster of the VIMS instance has failed, and can determine whether the VM's files were being replicated to a storage component of the VIMS instance at the time of the VM's failure. If the VM's files were being replicated at the time of the failure, the VIMS instance can search for and identify a cluster of the VIMS instance and a host system within the cluster that (1) are compatible with the VM, and (2) have access to the storage component. The VIMS instance can then cause the VM to be restarted on the identified host system of the identified cluster.
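The search for a restart target can be sketched as a scan over clusters and hosts that applies both conditions, compatibility with the VM and access to the storage component holding the replica. The data classes, the use of free memory as the compatibility check, and all names here are hypothetical stand-ins; the abstract does not say how compatibility is determined.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_memory_mb: int
    datastores: set  # names of storage components this host can access


@dataclass
class Cluster:
    name: str
    hosts: list


def find_restart_target(vm_memory_mb, replica_datastore, was_replicating, clusters):
    """Return (cluster_name, host_name) for restarting a failed VM, or None.

    A target is returned only if the VM's files were being replicated at the
    time of failure and some host is both compatible and able to reach the replica.
    """
    if not was_replicating:
        return None
    for cluster in clusters:
        for host in cluster.hosts:
            # Assumed compatibility test: enough free memory for the VM.
            compatible = host.free_memory_mb >= vm_memory_mb
            if compatible and replica_datastore in host.datastores:
                return cluster.name, host.name
    return None
```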
Abstract:
The present disclosure is related to systems and methods for protecting virtual computing instances. An example system can include a first virtual computing instance (VCI) deployed on a hypervisor and provisioned with a pool of physical computing resources. The hypervisor and the first VCI can operate according to a first configuration profile. The system can include a fault domain manager (FDM) running on a second VCI that is deployed on the hypervisor and provisioned by the pool of physical computing resources. The FDM can be configured to provide high availability support for the first VCI, and the FDM can operate according to a second configuration profile. The system can further include a hypervisor manager running on the second VCI. The hypervisor manager can be configured to facilitate interaction between the FDM and the hypervisor by translating between the first configuration profile and the second configuration profile.
Abstract:
Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes a HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide an object store, where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store, and where a failure causes the plurality of storage modules to observe a network partition in the host cluster that the plurality of HA modules do not. In one embodiment, a host system in the host cluster executing a first HA module invokes an API exposed by the plurality of storage modules for persisting metadata for a VM to the object store. If the API is not processed successfully, the host system: (1) identifies a subset of second HA modules in the plurality of HA modules; (2) issues an accessibility query for the VM to the subset of second HA modules in parallel, the accessibility query being configured to determine whether the VM is accessible to the respective host systems of the subset of second HA modules; and (3) if at least one second HA module in the subset indicates that the VM is accessible to its respective host system, transmits a command to the at least one second HA module to invoke the API on its respective host system.
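The three-step fallback (pick a subset of peers, query them in parallel, delegate the API call to one that can reach the VM) might look roughly like the sketch below. The peer interface (`can_access`, `persist`), the use of a thread pool for the parallel query, and the choice to stop at the first peer that succeeds are all assumptions for illustration; `StubPeer` exists only so the example can be run.

```python
from concurrent.futures import ThreadPoolExecutor


class StubPeer:
    """Hypothetical stand-in for an HA module on another host."""

    def __init__(self, accessible):
        self.accessible = accessible
        self.persisted = []

    def can_access(self, vm):
        return self.accessible

    def persist(self, vm):
        self.persisted.append(vm)
        return True


def persist_with_fallback(vm, local_persist, peers):
    """Persist VM metadata locally, or delegate to a peer that can access the VM.

    local_persist: callable(vm) -> bool, the local API invocation
    peers:         dict mapping host name -> peer object (the chosen subset)
    Returns "local", the name of the peer that persisted the metadata, or None.
    """
    if local_persist(vm):
        return "local"
    if not peers:
        return None
    names = list(peers)
    # Issue the accessibility query to all peers in the subset in parallel.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        answers = list(pool.map(lambda n: peers[n].can_access(vm), names))
    # Ask the first peer that reports access to invoke the API on its own host.
    for name, accessible in zip(names, answers):
        if accessible and peers[name].persist(vm):
            return name
    return None
```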
Abstract:
Techniques are disclosed for persisting high availability (HA) protection state for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing a HA module determines an identity of a VM that has been powered-on in the host cluster. The host system then persists HA protection state for the VM in a storage object of the VM, where the HA protection state indicates that the VM should be restarted on an active host system in the case of a failure in the host cluster.
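Storing the protection state inside the VM's own storage object means any surviving host can later read it and decide which VMs to restart. A minimal sketch of that idea follows; the in-memory `ObjectStore`, the `ha_protection_state` key, and the JSON layout are hypothetical and stand in for whatever the real object store provides.

```python
import json


class ObjectStore:
    """Hypothetical in-memory stand-in for the cluster's object store."""

    def __init__(self):
        self._objects = {}

    def write(self, object_id, key, value):
        self._objects.setdefault(object_id, {})[key] = value

    def read(self, object_id, key):
        return self._objects.get(object_id, {}).get(key)


def on_power_on(store, vm_id):
    """Record in the VM's own storage object that the VM is HA-protected."""
    state = {"protected": True, "restart_on_failure": True}
    store.write(vm_id, "ha_protection_state", json.dumps(state))


def vms_to_restart(store, vm_ids):
    """After a failure, list the VMs whose persisted state asks for a restart."""
    restart = []
    for vm_id in vm_ids:
        raw = store.read(vm_id, "ha_protection_state")
        if raw is not None and json.loads(raw).get("restart_on_failure"):
            restart.append(vm_id)
    return restart
```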