Abstract:
Embodiments presented herein provide techniques for balancing a multidimensional set of resources of different types within a distributed resources system. Each host computer providing the resources publishes a status on current resource usage by guest clients. Upon identifying a local imbalance, the host computer determines a source workload to migrate to or from the resources container to minimize the variance in resource usage. Additionally, when placing a new resource workload, the host computer selects a resources container that minimizes the variance to further balance resource usage.
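The placement step can be illustrated with a minimal sketch. It assumes that "variance in resource usage" means the sum, over resource types, of the population variance of utilization across hosts; the abstract does not define the metric. The host names, resource types, and the functions `imbalance` and `place` are hypothetical.

```python
from statistics import pvariance

def imbalance(hosts):
    """Sum, over resource types, of the variance of utilization across hosts.

    Assumption: this is one plausible reading of "variance in resource usage".
    """
    resource_types = next(iter(hosts.values())).keys()
    return sum(
        pvariance([usage[r] for usage in hosts.values()])
        for r in resource_types
    )

def place(hosts, demand):
    """Return the host whose selection yields the lowest resulting imbalance."""
    best_name, best_score = None, float("inf")
    for name in hosts:
        # Simulate placing the workload on this host without mutating the input.
        trial = {h: dict(u) for h, u in hosts.items()}
        for r, amount in demand.items():
            trial[name][r] += amount
        score = imbalance(trial)
        if score < best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical published usage, as fractions of each host's capacity.
hosts = {
    "host-a": {"cpu": 0.70, "capacity": 0.40},
    "host-b": {"cpu": 0.30, "capacity": 0.35},
    "host-c": {"cpu": 0.50, "capacity": 0.80},
}
new_workload = {"cpu": 0.20, "capacity": 0.10}
chosen = place(hosts, new_workload)
print(chosen)
```

The same scoring function could be reused to evaluate candidate migrations by simulating the removal of a workload from one host and its addition to another.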
Abstract:
Embodiments of the disclosure provide techniques for measuring congestion and controlling quality of service to a shared resource. A module that interfaces with the shared resource monitors the usage of the shared resource by accessing clients. Upon detecting that the rate of usage of the shared resource has exceeded a maximum rate supported by the shared resource, the module determines and transmits a congestion metric to clients that are currently attempting to access the shared resource. Clients, in turn, determine a delay period based on the congestion metric prior to attempting another access of the shared resource.
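A minimal sketch of the feedback loop follows. The abstract does not say how the congestion metric or the delay is computed, so both formulas here are assumptions: the metric is the fractional overload above the maximum rate, and the client delay scales linearly with the metric plus random jitter. The names `congestion_metric` and `client_delay` are hypothetical.

```python
import random

def congestion_metric(observed_rate, max_rate):
    """Return 0.0 when usage is within capacity, else the fractional overload.

    Assumption: the metric is (observed - max) / max.
    """
    if observed_rate <= max_rate:
        return 0.0
    return (observed_rate - max_rate) / max_rate

def client_delay(metric, base_delay_s=0.01, rng=random):
    """Derive a delay (in seconds) from the metric received by a client.

    Assumption: delay grows linearly with the metric; jitter in [0.5, 1.5)
    spreads out retries so clients do not all return at the same instant.
    """
    if metric <= 0.0:
        return 0.0
    return base_delay_s * metric * rng.uniform(0.5, 1.5)

metric = congestion_metric(observed_rate=1500, max_rate=1000)
delay = client_delay(metric)
print(metric, delay)
```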
Abstract:
Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes a HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide a logical data store, and where persistent data for the VMs is stored across the locally-attached storage resources comprising the logical data store.
Abstract:
Exemplary methods, apparatuses, and systems maintain hole boundary information by calculating a block attribute parity value. For example, a request is received to write to a first block of a stripe of data. A block attribute of a second block is determined. The block attribute of the second block indicates whether the second block includes written data or is a hole. A block attribute parity value is calculated based upon both the block attribute of the first block and the block attribute of the second block. The block attribute of the first block indicates the first block includes written data based upon the received request. The block attribute parity value and a data parity value are stored on one of the physical storage devices in response to the received write request. As a result, if a disk is lost, holes can be recovered using the block attribute parity value.
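The recovery property can be shown with a small sketch. It assumes that each block attribute is a single bit (1 for written data, 0 for a hole) and that the parity is the XOR of those bits, by analogy with RAID data parity; the abstract does not name the parity function. The names below are hypothetical.

```python
from functools import reduce
from operator import xor

WRITTEN, HOLE = 1, 0

def attribute_parity(attrs):
    """XOR of the per-block attribute bits in one stripe (assumed parity function)."""
    return reduce(xor, attrs, 0)

def recover_attribute(surviving_attrs, parity):
    """Rebuild the attribute bit of the single lost block from the survivors."""
    return reduce(xor, surviving_attrs, parity)

# A three-block stripe: block 2 already holds data, blocks 0 and 1 are holes.
stripe_attrs = [HOLE, HOLE, WRITTEN]

# A write request arrives for block 0, so its attribute becomes WRITTEN
# and the attribute parity is recalculated and stored alongside the data parity.
stripe_attrs[0] = WRITTEN
parity = attribute_parity(stripe_attrs)

# The disk holding block 1 is lost; its attribute is recovered from the rest.
lost = 1
surviving = [a for i, a in enumerate(stripe_attrs) if i != lost]
recovered = recover_attribute(surviving, parity)
print("hole" if recovered == HOLE else "written")
```

Without the attribute parity, a rebuild from data parity alone would yield a block of bytes but could not tell whether that block had been a hole.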
Abstract:
Techniques are disclosed for managing a cluster of computing nodes following a division of the cluster into at least a first and second partition, where the cluster aggregates local storage resources of the nodes to provide an object store, and objects stored in the object store are divided into data components stored across the nodes. In accordance with one method, it is determined that a majority of data components comprising a first object are stored within nodes in the first partition. It is determined that a majority of data components comprising a second object are stored within nodes in the second partition. Configuration operations are permitted to be performed on the first object in the first partition while denying access to the first object from the second partition, and on the second object in the second partition while denying access to the second object from the first partition.
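The per-object majority rule can be sketched as follows. The sketch assumes a strict majority (more than half of an object's components) and treats each component as having equal weight; the node names, partition names, and the functions `owning_partition` and `access_allowed` are hypothetical.

```python
from collections import Counter

def owning_partition(component_nodes, partitions):
    """Return the partition holding a strict majority of an object's components.

    component_nodes: one node name per data component of the object.
    partitions: mapping of partition name to the set of nodes it contains.
    Returns None when no partition holds more than half of the components.
    """
    node_to_partition = {n: p for p, nodes in partitions.items() for n in nodes}
    counts = Counter(
        node_to_partition[n] for n in component_nodes if n in node_to_partition
    )
    total = len(component_nodes)
    for partition, count in counts.items():
        if 2 * count > total:
            return partition
    return None

def access_allowed(component_nodes, partitions, requesting_partition):
    """Permit operations only from the partition that owns the object."""
    return owning_partition(component_nodes, partitions) == requesting_partition

partitions = {"P1": {"n1", "n2", "n3"}, "P2": {"n4", "n5"}}
first_object = ["n1", "n2", "n4"]   # two of three components in P1
second_object = ["n3", "n4", "n5"]  # two of three components in P2

print(owning_partition(first_object, partitions))
print(owning_partition(second_object, partitions))
```

Because ownership is decided per object, both partitions can keep serving a disjoint subset of objects rather than one partition being shut out entirely.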
Abstract:
Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes a HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide an object store, where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store, and where a failure causes the plurality of storage modules to observe a network partition in the host cluster that the plurality of HA modules do not. In one embodiment, a host system in the host cluster executing a first HA module invokes an API exposed by the plurality of storage modules for persisting metadata for a VM to the object store. If the API is not processed successfully, the host system: (1) identifies a subset of second HA modules in the plurality of HA modules; (2) issues an accessibility query for the VM to the subset of second HA modules in parallel, the accessibility query being configured to determine whether the VM is accessible to the respective host systems of the subset of second HA modules; and (3) if at least one second HA module in the subset indicates that the VM is accessible to its respective host system, transmits a command to the at least one second HA module to invoke the API on its respective host system.
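The three-step fallback in the embodiment above can be sketched as follows. Everything here is a stand-in: `HAModule`, `persist_metadata`, and `is_accessible` are hypothetical names that simulate the storage API and the accessibility query in memory, and the subset of peers is simply passed in by the caller.

```python
from concurrent.futures import ThreadPoolExecutor

class HAModule:
    """Hypothetical stand-in for an HA module on one host system."""

    def __init__(self, name, accessible_vms, api_ok):
        self.name = name
        self.accessible_vms = set(accessible_vms)
        self.api_ok = api_ok  # simulates whether the storage API call succeeds

    def is_accessible(self, vm):
        """Answer an accessibility query for a VM."""
        return vm in self.accessible_vms

    def persist_metadata(self, vm, metadata):
        """Simulate invoking the storage modules' API; True means success."""
        return self.api_ok

def persist_with_fallback(local, peers, vm, metadata):
    """Try locally; on failure, query peers in parallel and delegate to one."""
    if local.persist_metadata(vm, metadata):
        return local.name
    # Steps (1) and (2): issue the accessibility query to the peer subset in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(peers))) as pool:
        answers = list(pool.map(lambda peer: peer.is_accessible(vm), peers))
    # Step (3): ask a peer that can reach the VM to invoke the API on its host.
    for peer, accessible in zip(peers, answers):
        if accessible and peer.persist_metadata(vm, metadata):
            return peer.name
    return None

local = HAModule("host-a", accessible_vms=[], api_ok=False)
peers = [
    HAModule("host-b", accessible_vms=[], api_ok=True),
    HAModule("host-c", accessible_vms=["vm-1"], api_ok=True),
]
handled_by = persist_with_fallback(local, peers, "vm-1", {"state": "protected"})
print(handled_by)
```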
Abstract:
A deduplication storage system with snapshot and clone capability includes storing logical pointer objects and organizing a first set of the logical pointer objects into a hierarchical structure. A second set of the logical pointer objects may be associated with corresponding logical data blocks of a client data object. The second set of the logical pointer objects may point to physical data blocks having deduplicated data that comprise data of the corresponding logical data blocks. Some of the logical pointer objects in the first set may point to the logical pointer objects in the second set, so that the hierarchical structure represents the client data object. A root of the hierarchical structure may be associated with the client data object. A snapshot or clone may be created by making a copy of the root and associating the copied root with the snapshot or clone.
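A minimal sketch of the root-copy idea follows. It assumes a two-level hierarchy (one root over leaf pointers) and content addressing by SHA-256 for deduplication; a real hierarchy would have more levels and would copy interior nodes on write. The names `Pointer`, `BlockStore`, `build_object`, `snapshot`, and `read` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Pointer:
    """A logical pointer object: interior nodes have children, leaves have a block key."""
    children: List["Pointer"] = field(default_factory=list)
    block_key: Optional[str] = None

class BlockStore:
    """Physical blocks keyed by content hash, so identical data is stored once."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(key, data)
        return key

def build_object(store, logical_blocks):
    """Create one leaf pointer per logical block and a root that points to them."""
    leaves = [Pointer(block_key=store.put(b)) for b in logical_blocks]
    return Pointer(children=leaves)

def snapshot(root):
    """Copy only the root; the leaf pointers and physical blocks are shared."""
    return Pointer(children=list(root.children), block_key=root.block_key)

def read(store, root):
    return [store.blocks[leaf.block_key] for leaf in root.children]

store = BlockStore()
live = build_object(store, [b"aaaa", b"bbbb", b"aaaa"])
snap = snapshot(live)

# Overwrite logical block 1 in the live object by swapping in a new leaf pointer.
live.children[1] = Pointer(block_key=store.put(b"cccc"))

print(read(store, snap))
print(read(store, live))
```

Creating the snapshot copies no data blocks, which is why the cost is independent of the size of the client data object in this sketch.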
Abstract:
The present disclosure provides techniques for deduplicating files. The techniques include creating a cache or subset of a large data structure. The large data structure organizes information by random hash values. The random hash values result in a random organization of information within the data structure, with the information spanning a large number of storage blocks within a storage system. The cache, however, is within memory and is small relative to the data structure. The cache is created so as to contain information that is likely to be needed during deduplication of a file. Having needed information within memory rather than in storage results in faster read and write operations to that information, improving the performance of a computing system.
Abstract:
Embodiments of the disclosure provide techniques for partitioning a resource object into multiple resource components of a cluster of host computer nodes in a distributed resources system. The distributed resources system translates high-level policy requirements into a resource configuration that the system accommodates. The system determines an allocation based on the policy requirements and identifies resource configurations that are available. Upon selecting a resource configuration, the distributed resources system assigns the allocation and associated values to the selected configuration and publishes the new configuration to other host computer nodes in the cluster.
Abstract:
A method includes obtaining a plurality of representations corresponding respectively to a plurality of blocks of data stored on a source node. A plurality of data pairs are sent to a destination node, where each data pair includes a logical address associated with a block of data from the plurality of blocks of data and the corresponding representation of the block of data. A determination is made whether the blocks of data associated with the respective logical addresses are duplicates of data stored on the destination node. In accordance with an affirmative determination, a reference to a physical address of the block of data stored on the destination node is stored. In accordance with a negative determination, an indication that the data corresponding to the respective logical address is not a duplicate is stored. The data indicated as not being a duplicate is written to the destination node.
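The exchange between source and destination can be sketched as follows. The sketch assumes the "representation" of a block is its SHA-256 digest and models physical addresses as list indices; the class `Destination` and the functions `fingerprint` and `replicate` are hypothetical.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Representation of a block (assumed here to be a SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()

class Destination:
    """In-memory stand-in for the destination node."""

    def __init__(self):
        self.physical = []  # physical address is the index into this list
        self.by_hash = {}   # fingerprint -> physical address
        self.logical = {}   # logical address -> physical address

    def check(self, pairs):
        """Record references for duplicates; return addresses that still need data."""
        needed = []
        for logical_addr, fp in pairs:
            if fp in self.by_hash:
                self.logical[logical_addr] = self.by_hash[fp]
            else:
                needed.append(logical_addr)
        return needed

    def write(self, logical_addr, data):
        """Store a block, reusing an existing physical block if one has appeared."""
        fp = fingerprint(data)
        if fp not in self.by_hash:
            self.physical.append(data)
            self.by_hash[fp] = len(self.physical) - 1
        self.logical[logical_addr] = self.by_hash[fp]

def replicate(source_blocks, dest):
    """Send (logical address, fingerprint) pairs first, then only the missing data."""
    pairs = [(addr, fingerprint(data)) for addr, data in source_blocks.items()]
    needed = dest.check(pairs)
    for addr in needed:
        dest.write(addr, source_blocks[addr])
    return needed

dest = Destination()
dest.write(100, b"alpha")  # data already present on the destination

source_blocks = {0: b"alpha", 1: b"beta", 2: b"beta"}
sent = replicate(source_blocks, dest)
print(sent, len(dest.physical))
```

Only the logical addresses returned by `check` cause block data to cross the network, which is the bandwidth saving the method is after.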