Abstract:
Techniques for decoupling compute and storage resources in a hyper-converged infrastructure (HCI) are provided. In one set of embodiments, a control plane of the HCI deployment can provision a host from a host platform of an infrastructure on which the HCI deployment is implemented and can provision one or more storage volumes from a storage platform of the infrastructure, where the storage platform runs on physical server resources in the infrastructure that are separate from the host platform. The control plane can then cause the one or more storage volumes to be network-attached to the host in a manner that enables a hypervisor of the host to make the one or more storage volumes available, as part of a virtual storage pool, to one or more virtual machines in the HCI deployment for data storage.
Abstract:
Examples disclosed herein relate to propagating changes made on a file system volume of a primary cluster of nodes to the same file system volume also being managed by a secondary cluster of nodes. An application is executed on both clusters, and data changes on the primary cluster are mirrored to the secondary cluster using an exo-clone file. The exo-clone file includes the differences between two or more snapshots of the volume on the primary cluster, along with identifiers of the changed blocks and (optionally) state information thereof. Only these changes, identifiers, and state information are packaged in the exo-clone file and then exported to the secondary cluster, which in turn applies the changes to its version of the volume. Exporting only the changes to the data blocks and the corresponding block identifiers drastically reduces the information that must be exchanged and processed to keep the two volumes consistent.
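The snapshot-difference idea can be illustrated with a minimal sketch. It assumes each snapshot is modelled as a dictionary from block identifier to block contents. The names `ExoClone`, `build_exo_clone`, and `apply_exo_clone` are hypothetical and are not taken from the abstract, which does not specify a file layout.

```python
from dataclasses import dataclass, field


@dataclass
class ExoClone:
    """Hypothetical container for the changes between two snapshots."""
    base_id: str
    target_id: str
    changed: dict = field(default_factory=dict)  # block id -> new contents
    removed: set = field(default_factory=set)    # block ids absent in target


def build_exo_clone(base_id, base, target_id, target):
    """Diff two snapshots, each modelled as {block_id: bytes}."""
    changed = {bid: data for bid, data in target.items() if base.get(bid) != data}
    removed = set(base) - set(target)
    return ExoClone(base_id, target_id, changed, removed)


def apply_exo_clone(volume, clone):
    """Apply the packaged changes to the secondary cluster's copy of the volume."""
    result = dict(volume)
    result.update(clone.changed)
    for bid in clone.removed:
        result.pop(bid, None)
    return result


# Primary cluster: two snapshots of the same volume.
snap1 = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
snap2 = {0: b"aaaa", 1: b"BBBB", 3: b"dddd"}
clone = build_exo_clone("snap1", snap1, "snap2", snap2)

# Secondary cluster: starts from snap1 and receives only the differences.
secondary = apply_exo_clone(snap1, clone)
```

Only blocks 1 and 3 (changed or new) and the identifier of block 2 (removed) travel to the secondary cluster; the unchanged block 0 is never exported.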
Abstract:
Embodiments of the disclosure provide techniques for updating a distributed transaction log on a previously offline resource object component using distributed transaction logs from active host computer nodes from separate RAID mirror configurations. Each component object maintains a journal (log) where distributed transactions are recorded. If a component object goes offline and subsequently returns (e.g., if the node hosting the component object reboots), the component object is marked as stale. To return the component object to an active state, a distributed resources module retrieves the journals from other resource component objects from other RAID configurations where the data is mirrored. The module filters those journals for the data that is missing from the journal of the previously offline component object and merges the filtered data into that journal.
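The filter-and-merge step can be sketched as follows. This is an illustration only: it assumes each journal is a list of `(sequence_number, payload)` tuples and that sequence numbers are globally consistent across mirrors, neither of which is stated in the abstract. The function name `catch_up` is hypothetical.

```python
def catch_up(stale_journal, mirror_journals):
    """Return the stale journal extended with the entries it missed.

    Each journal is a list of (sequence_number, payload) tuples.
    """
    seen = {seq for seq, _ in stale_journal}

    # Filter: keep only entries that the stale journal does not contain.
    missing = {}
    for journal in mirror_journals:
        for seq, payload in journal:
            if seq not in seen:
                missing.setdefault(seq, payload)

    # Merge: combine and restore sequence order.
    return sorted(list(stale_journal) + list(missing.items()))


stale = [(1, "create"), (2, "write A")]
mirrors = [
    [(1, "create"), (2, "write A"), (3, "write B")],
    [(2, "write A"), (3, "write B"), (4, "write C")],
]
recovered = catch_up(stale, mirrors)
```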
Abstract:
A method and system are disclosed for storing client data objects in a deduplicated storage system. Deduplicated data may be stored in a plurality of physical data blocks. A content map layer can provide a mapping between the physical data blocks and logical map objects associated with the client data objects. The deduplicated data may be mapped to logical data blocks that comprise the client data objects.
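A toy model of the three layers (physical blocks, content map, logical map) might look like the sketch below. It assumes content is identified by a SHA-256 digest of each fixed-size chunk, which is a common deduplication approach but is not stated in the abstract. The class `DedupStore` and its attributes are hypothetical.

```python
import hashlib


class DedupStore:
    """Hypothetical deduplicating store with fixed-size blocks."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.physical = []     # physical data blocks, stored once each
        self.content_map = {}  # content digest -> index into self.physical
        self.logical = {}      # object name -> list of physical block indices

    def put(self, name, data):
        indices = []
        for off in range(0, len(data), self.block_size):
            chunk = data[off:off + self.block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.content_map:
                # First time this content is seen: store a new physical block.
                self.content_map[digest] = len(self.physical)
                self.physical.append(chunk)
            indices.append(self.content_map[digest])
        self.logical[name] = indices

    def get(self, name):
        return b"".join(self.physical[i] for i in self.logical[name])


store = DedupStore()
store.put("a", b"AAAABBBBAAAA")
store.put("b", b"BBBBCCCC")
```

After both writes, five logical blocks are backed by three physical blocks, because the `AAAA` and `BBBB` chunks are each stored once.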
Abstract:
In a storage cluster having nodes, blocks of a logical storage space of a storage object are allocated flexibly by a parent node to component nodes that are backed by physical storage. The method includes maintaining a first allocation map for the parent node, and second and third allocation maps for the first and second component nodes, respectively, executing a first write operation on the first component node and updating the second allocation map to indicate that the first block is a written block, selecting the second component node for executing a second write operation, and executing the second write operation on the second component node. Upon execution of the second write operation, the third allocation map is updated to indicate that the second block is a written block and the first allocation map is updated to indicate that the second block is allocated to the second component node.
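The interplay of the three allocation maps can be sketched as below. The selection policy (choose the component with the fewest written blocks) is an assumption made for illustration; the abstract does not say how the second component node is selected. The class and attribute names are hypothetical.

```python
class ComponentNode:
    """Component node backed by physical storage."""

    def __init__(self, name):
        self.name = name
        self.written = set()  # this node's allocation map: written blocks
        self.data = {}

    def write(self, block, payload):
        self.data[block] = payload
        self.written.add(block)


class ParentNode:
    """Parent node that allocates logical blocks to component nodes on write."""

    def __init__(self, components):
        self.components = components
        self.owner = {}  # parent's allocation map: block -> component name

    def write(self, block, payload):
        if block in self.owner:
            # Block already allocated: route to the component that owns it.
            target = next(c for c in self.components if c.name == self.owner[block])
        else:
            # Assumed policy: pick the component with the fewest written blocks.
            target = min(self.components, key=lambda c: len(c.written))
        target.write(block, payload)
        self.owner[block] = target.name
        return target.name


c1, c2 = ComponentNode("c1"), ComponentNode("c2")
parent = ParentNode([c1, c2])
first = parent.write(0, b"x")    # first write lands on c1
second = parent.write(1, b"y")   # second write is steered to c2
rewrite = parent.write(0, b"z")  # overwrite stays with the owning component
```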
Abstract:
Embodiments of the disclosure provide techniques for measuring congestion and controlling quality of service to a shared resource. A module that interfaces with the shared resource monitors the usage of the shared resource by accessing clients. Upon detecting that the rate of usage of the shared resource has exceeded a maximum rate supported by the shared resource, the module determines and transmits a congestion metric to clients that are currently attempting to access the shared resource. Clients, in turn, determine a delay period based on the congestion metric prior to attempting another access of the shared resource.
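One possible shape for the metric and the client-side delay is sketched below. Both formulas are assumptions chosen for illustration: the abstract does not define how the congestion metric is computed or how a client converts it into a delay. The function names and default parameters are hypothetical.

```python
def congestion_metric(observed_rate, max_rate):
    """Fractional overshoot of the maximum supported rate (0.0 if not exceeded)."""
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    return max(0.0, (observed_rate - max_rate) / max_rate)


def client_delay(metric, base_delay=0.01, max_delay=1.0):
    """Delay in seconds a client waits before its next access attempt."""
    if metric <= 0:
        return 0.0
    # Assumed policy: scale the base delay by the overshoot, capped at max_delay.
    return min(max_delay, base_delay * (1 + metric))


# Resource supports 200 ops/s; the module observes 300 ops/s.
metric = congestion_metric(300, 200)
delay = client_delay(metric, base_delay=0.5)
```

With these assumed formulas, clients back off more as the overshoot grows, and do not delay at all while usage stays at or below the supported rate.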
Abstract:
Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes a HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide an object store, where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store, and where a failure causes the plurality of storage modules to observe a network partition in the host cluster that the plurality of HA modules do not. In one embodiment, a host system in the host cluster executing a first HA module invokes an API exposed by the plurality of storage modules for persisting metadata for a VM to the object store. If the API is not processed successfully, the host system: (1) identifies a subset of second HA modules in the plurality of HA modules; (2) issues an accessibility query for the VM to the subset of second HA modules in parallel, the accessibility query being configured to determine whether the VM is accessible to the respective host systems of the subset of second HA modules; and (3) if at least one second HA module in the subset indicates that the VM is accessible to its respective host system, transmits a command to the at least one second HA module to invoke the API on its respective host system.
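The three-step fallback in this embodiment can be sketched as follows. The `Peer` class stands in for a second HA module and `local_persist` stands in for the storage-module API call; both are hypothetical stand-ins, since the abstract does not name the API. Threads are used here only to illustrate issuing the accessibility queries in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class Peer:
    """Hypothetical stand-in for a second HA module on another host system."""

    def __init__(self, name, accessible_vms):
        self.name = name
        self.accessible_vms = set(accessible_vms)
        self.persisted = []

    def is_accessible(self, vm):
        return vm in self.accessible_vms

    def persist(self, vm):
        self.persisted.append(vm)
        return True


def persist_metadata(vm, local_persist, peers):
    """Return the name of the host that persisted the metadata, or None."""
    if local_persist(vm):
        return "local"
    if not peers:
        return None
    # Steps (1) and (2): query the chosen subset of peers in parallel.
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        answers = list(pool.map(lambda p: p.is_accessible(vm), peers))
    # Step (3): ask a peer that can reach the VM to invoke the API instead.
    for peer, ok in zip(peers, answers):
        if ok and peer.persist(vm):
            return peer.name
    return None


peers = [Peer("h2", []), Peer("h3", ["vm1"])]
handled_by = persist_metadata("vm1", lambda vm: False, peers)
```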
Abstract:
Techniques are disclosed for persisting high availability (HA) protection state for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing a HA module determines an identity of a VM that has been powered-on in the host cluster. The host system then persists HA protection state for the VM in a storage object of the VM, where the HA protection state indicates that the VM should be restarted on an active host system in the case of a failure in the host cluster.
Abstract:
Techniques are described for storing a virtual disk in an object store comprising a plurality of physical storage devices housed in a plurality of host computers. A profile is received for creation of the virtual disk wherein the profile specifies storage properties desired for an intended use of the virtual disk. A virtual disk blueprint is generated based on the profile such that the virtual disk blueprint describes a storage organization for the virtual disk that addresses redundancy or performance requirements corresponding to the profile. A set of the physical storage devices that can store components of the virtual disk in a manner that satisfies the storage organization is then determined.
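The profile-to-blueprint-to-placement flow can be sketched under strong simplifying assumptions. The profile keys `failures_to_tolerate` and `stripe_width` are hypothetical, the blueprint is reduced to a mirror count and a stripe count, and each mirror copy is placed entirely on one host so that no two copies share a host. None of these details come from the abstract.

```python
def make_blueprint(profile):
    """Translate a storage profile into a simple mirror/stripe organization."""
    return {
        "mirrors": profile.get("failures_to_tolerate", 0) + 1,  # redundancy
        "stripes": profile.get("stripe_width", 1),              # performance
    }


def place(blueprint, devices, size_gb):
    """Pick devices satisfying the blueprint, or return None if impossible.

    devices is a list of (host, device, free_gb) tuples.
    """
    per_component = size_gb / blueprint["stripes"]

    # Group the devices with enough free space by host.
    by_host = {}
    for host, dev, free in devices:
        if free >= per_component:
            by_host.setdefault(host, []).append(dev)

    # Assumed policy: one mirror copy per host, striped across that host's devices.
    placement = []
    for host, devs in by_host.items():
        if len(placement) == blueprint["mirrors"]:
            break
        if len(devs) >= blueprint["stripes"]:
            placement.append((host, devs[:blueprint["stripes"]]))
    return placement if len(placement) == blueprint["mirrors"] else None


devices = [
    ("h1", "d1", 100), ("h1", "d2", 100),
    ("h2", "d1", 100), ("h2", "d2", 10),
    ("h3", "d1", 100), ("h3", "d2", 100),
]
blueprint = make_blueprint({"failures_to_tolerate": 1, "stripe_width": 2})
placement = place(blueprint, devices, size_gb=100)
```

Host `h2` is skipped because only one of its devices has room for a 50 GB stripe component, so the two mirror copies land on `h1` and `h3`.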
Abstract:
Embodiments presented herein provide techniques for balancing a multidimensional set of resources of different types within a distributed resources system. Each host computer providing the resources publishes a status on current resource usage by guest clients. Upon identifying a local imbalance, the host computer determines a source workload to migrate to or from the resources container to minimize the variance in resource usage. Additionally, when placing a new resource workload, the host computer selects a resources container that minimizes the variance to further balance resource usage.
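Variance-minimizing placement of a new workload can be sketched as below. The sketch assumes that usage is published as one percentage per resource dimension and that the overall imbalance is the sum of the per-dimension population variances across hosts. The abstract does not specify how the variance is aggregated, so this scoring and the function names are hypothetical.

```python
from statistics import pvariance


def usage_variance(hosts):
    """Sum over resource dimensions of the usage variance across hosts."""
    dims = len(next(iter(hosts.values())))
    return sum(pvariance([usage[d] for usage in hosts.values()]) for d in range(dims))


def place_workload(hosts, demand):
    """Return the host whose selection yields the lowest resulting variance."""
    best_name, best_score = None, None
    for name in hosts:
        trial = dict(hosts)
        trial[name] = tuple(a + b for a, b in zip(hosts[name], demand))
        score = usage_variance(trial)
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    return best_name


# Published usage per host as (capacity %, throughput %).
hosts = {"h1": (80, 20), "h2": (20, 30), "h3": (50, 50)}
chosen = place_workload(hosts, demand=(20, 10))
```

The same scoring function could be reused to evaluate candidate migrations: try moving each workload, recompute the variance, and keep the move that lowers it most.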