Abstract:
A system that implements detection and reconciliation of system resource metadata for a distributed storage system is described. A node may obtain resource metadata specific to the node from another node that maintains system resource metadata for the distributed storage system. Based on the resource metadata specific to the node, a determination may be made that the node is not reconciled with the system resource metadata. A corrective operation may be performed to reconcile the node with the system resource metadata. A corrective operation may include terminating a resource, making a resource unavailable, modifying resource attributes, or sending a resource metadata update so that the system resource metadata can be corrected.
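The detect-then-correct flow described above can be illustrated with a minimal sketch. The function name `plan_corrections`, the dictionary layout, and the rule that maps each mismatch to a particular corrective operation are assumptions made for illustration only; the abstract does not specify when each operation applies.

```python
def plan_corrections(local, authoritative):
    """Compare a node's local resources with its node-specific metadata.

    local:         resource id -> attributes as observed on the node
    authoritative: resource id -> attributes from the system resource metadata
    Returns a list of (operation, resource_id) pairs. An empty list means
    the node is already reconciled.
    """
    actions = []
    for rid, attrs in sorted(local.items()):
        if rid not in authoritative:
            # Assumed policy: a resource unknown to the system is terminated.
            actions.append(("terminate", rid))
        elif attrs != authoritative[rid]:
            # Assumed policy: mismatched attributes are modified locally.
            actions.append(("modify_attributes", rid))
    for rid in sorted(authoritative):
        if rid not in local:
            # Assumed policy: the system lists a resource the node does not
            # host, so an update is sent to correct the system metadata.
            actions.append(("send_metadata_update", rid))
    return actions


local = {"vol-1": {"size": 10}, "vol-2": {"size": 5}}
authoritative = {"vol-1": {"size": 20}, "vol-3": {"size": 1}}
print(plan_corrections(local, authoritative))
```

In this sketch a node with no differences yields an empty plan, which corresponds to the node being reconciled with the system resource metadata.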
Abstract:
A distributed database system may implement dynamic quorum group membership changes. In various embodiments, a quorum set may maintain a replica of a data object among group members according to a protection group policy for the data object. A group member may be identified for replacement. In response, a new quorum set may be created from the remaining group members and a new group member. The protection group policy may be updated to include the new group member such that subsequently received updates are maintained at both the previous quorum set and the new quorum set. Previously received updates may be replicated on the new group member. Upon completion of replicating the previously received updates, the protection group policy for the data object may be revised such that subsequently received updates are maintained at the new quorum set.
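The membership change described above can be sketched as a policy object that temporarily holds two quorum sets. The class and method names are hypothetical, and the majority-of-members quorum rule is an assumption; the abstract does not state the quorum size.

```python
class ProtectionGroupPolicy:
    """Tracks the quorum sets an update must satisfy (illustrative only)."""

    def __init__(self, members):
        self.quorum_sets = [frozenset(members)]

    def begin_replacement(self, old_member, new_member):
        # Create the new quorum set from the remaining members plus the
        # new member, and keep the previous set active alongside it.
        current = self.quorum_sets[-1]
        if old_member not in current:
            raise ValueError(f"{old_member!r} is not a group member")
        self.quorum_sets.append(frozenset((current - {old_member}) | {new_member}))

    def finish_replacement(self):
        # Called once previously received updates are replicated on the
        # new member: only the new quorum set remains.
        self.quorum_sets = [self.quorum_sets[-1]]

    def is_durable(self, acks):
        # Assumed rule: an update needs a majority in every active set.
        acks = set(acks)
        return all(len(acks & qs) >= len(qs) // 2 + 1 for qs in self.quorum_sets)


policy = ProtectionGroupPolicy({"a", "b", "c"})
policy.begin_replacement("c", "d")
print(policy.is_durable({"a", "d"}))  # only one ack from the previous set
policy.finish_replacement()
print(policy.is_durable({"a", "d"}))  # majority of the new set
```

Requiring acknowledgement from both sets during the transition is what lets the previous set be dropped safely once the new member has caught up.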
Abstract:
Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status metadata, a number of available replicas for at least one replica group may be determined to be noncompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
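The detection and ordering steps above can be sketched as follows. The function name `schedule_healing`, the single numeric budget standing in for the resource constraints, and the ordering rule (largest replica deficit first, then lowest cost) are assumptions for illustration; the abstract does not prescribe a specific ordering.

```python
def schedule_healing(status, healthy_count, cost, budget):
    """Pick healing operations for under-replicated groups.

    status:        group id -> number of available replicas
    healthy_count: group id -> replicas required by the healthy state definition
    cost:          group id -> resource requirement of healing that group
    budget:        total resources available for healing (the constraint)
    Returns the group ids to heal, in scheduling order.
    """
    # Detect groups that do not comply with their healthy state definition.
    deficits = {
        g: healthy_count[g] - n for g, n in status.items() if n < healthy_count[g]
    }
    # Assumed ordering: largest deficit first, then cheapest, then by id.
    ordered = sorted(deficits, key=lambda g: (-deficits[g], cost[g], g))
    scheduled, remaining = [], budget
    for g in ordered:
        if cost[g] <= remaining:
            scheduled.append(g)
            remaining -= cost[g]
    return scheduled


status = {"g1": 3, "g2": 1, "g3": 2}
healthy = {"g1": 3, "g2": 3, "g3": 3}
cost = {"g1": 5, "g2": 4, "g3": 2}
print(schedule_healing(status, healthy, cost, budget=5))
```

Groups that do not fit within the remaining budget are left for a later scheduling pass in this sketch, which is one simple way to respect a resource constraint.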