Abstract:
Embodiments of the disclosure provide techniques for partitioning a resource object into multiple resource components of a cluster of host computer nodes in a distributed resources system. The distributed resources system translates high-level policy requirements into a resource configuration that the system accommodates. The system determines an allocation based on the policy requirements and identifies resource configurations that are available. Upon selecting a resource configuration, the distributed resources system assigns the allocation and associated values to the selected configuration and publishes the new configuration to other host computer nodes in the cluster.
Abstract:
Example methods and systems for accessing data in a log-structured file system having a plurality of snapshots of storage objects backed by a first-level copy-on-write (COW) B+ tree data structure and a plurality of second-level B+ tree data structures have been disclosed. One example method includes obtaining a first first-level mapping associated with a first snapshot from the plurality of snapshots based on a first logical block address, wherein each of the plurality of snapshots corresponds to each of the plurality of second-level B+ tree data structures, identifying a first second-level B+ tree data structure corresponding to one of the plurality of snapshots based on the first first-level mapping, obtaining a first second-level mapping based on the first logical block address in the first second-level B+ tree data structure, obtaining a first physical block address based on the first second-level mapping, and accessing data at the first physical block address.
Abstract:
The disclosure herein describes performing resynchronization (“resync”) jobs in a distributed storage system based on a parallelism policy. A resync job is obtained from a queue and input/output (I/O) resources that will be used during execution of the resync job are identified. Available bandwidth slots of each I/O resource of the identified I/O resources are determined. The parallelism policy is applied to the identified I/O resources and the available bandwidth slots. Based on the application of the parallelism policy, a bottleneck resource of the I/O resources is determined and a parallel I/O value is calculated based on the available bandwidth slots of the bottleneck resource, wherein the parallel I/O value indicates a quantity of I/O tasks that can be performed in parallel. The resync job is executed using the I/O resources, the execution of the resync job including performance of I/O tasks in parallel based on the parallel I/O value.
Abstract:
A method for modifying key-value pairs of a B+ tree is provided. The method receives a request to modify a particular key-value pair. Each node of the tree has a modification number. The method traverses a path on the tree from the root node toward the particular node. The traversing includes upon reaching a parent node of the path, acquiring a shared lock on both the parent node and a child node one level below the parent node. Upon determining that the child node is the particular node, the method stores the modification number of the particular node, releases the shared lock on the particular node, compares a current modification number of the node with its stored number, and acquires an exclusive lock on the node if the numbers are the same. The method increments the current modification number of the node and modifies it while in the exclusive lock.
Abstract:
Hybrid synchronization using a shadow component includes detecting a first component of a plurality of mirrored components of a distributed data object becoming unavailable. The mirrored components include a delta component (a special shadow component) and a regular mirror (shadow) component. The delta component indicates a shorter history of changes to data blocks of a log-structured file system (LFS) than is indicated by the regular mirror component. During the unavailability of the first component, at least one write I/O is committed by the delta component. The commit is tracked by the delta component in a first tracking bitmap associated with the delta component. Based at least on detecting the first component becoming available, the first component is synchronized with data from the delta component, based at least on changed data blocks indicated in the first tracking bitmap.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for resynchronizing data in a storage system. One of the methods includes receiving, by a first storage subsystem, a plurality of write requests corresponding to respective meta data blocks, wherein the first storage subsystem comprises a meta object; storing, by the first storage subsystem and for each write request, in each disk of the meta object, a version of the corresponding meta data block; determining that a particular disk of the meta object has failed; determining whether one or more valid versions of the meta data block are stored in respective other disks of the meta object; and in response to determining that one or more valid versions of the meta data block are stored in respective other disks of the meta object, resynchronizing the meta data block in the particular disk.
Abstract:
The disclosure herein describes performing resynchronization (“resync”) jobs in a distributed storage system based on a parallelism policy. A resync job is obtained from a queue and input/output (I/O) resources that will be used during execution of the resync job are identified. Available bandwidth slots of each I/O resource of the identified I/O resources are determined. The parallelism policy is applied to the identified I/O resources and the available bandwidth slots. Based on the application of the parallelism policy, a bottleneck resource of the I/O resources is determined and a parallel I/O value is calculated based on the available bandwidth slots of the bottleneck resource, wherein the parallel I/O value indicates a quantity of I/O tasks that can be performed in parallel. The resync job is executed using the I/O resources, the execution of the resync job including performance of I/O tasks in parallel based on the parallel I/O value.
Abstract:
In a storage cluster having nodes, blocks of a logical storage space of a storage object are allocated flexibly by a parent node to component nodes that are backed by physical storage. The method includes maintaining a first allocation map for the parent node, and second and third allocation maps for the first and second component nodes, respectively, executing a first write operation on the first component node and updating the second allocation map to indicate that the first block is a written block, selecting the second component node for executing a second write operation, and executing the second write operation on the second component node. Upon execution of the second write operation, the third allocation map is updated to indicate that the second block is a written block and the first allocation map is updated to indicate that the second block is allocated to the second component node.
Abstract:
Component objects of a virtual disk are backed by first storage nodes, which are at a primary site, and second storage nodes, which are at a secondary site. The method of resynchronizing the component objects of the virtual disk includes, at a coordinating node at the primary site, responsive to a second storage node coming back online, identifying an out-of-sync block of the second storage node, locating the out-of-sync block in an address space maintained for blocks of the virtual disk, and transmitting a resync command to a replication module of a coordinating node at the secondary site, the resync command identifying the out-of-sync block within the address space.
Abstract:
Embodiments of the disclosure provide techniques for partitioning a resource object into multiple resource components of a cluster of host computer nodes in a distributed resources system. The distributed resources system translates high-level policy requirements into a resource configuration that the system accommodates. The system determines an allocation based on the policy requirements and identifies resource configurations that are available. Upon selecting a resource configuration, the distributed resources system assigns the allocation and associated values to the selected configuration and publishes the new configuration to other host computer nodes in the cluster.