Abstract:
A system can serialize moves and mounts across namespaces based on Lamport clocks. In some examples, the system obtains a request to move a content item from a source namespace to a destination namespace. The system processes an incoming move at the destination and an outgoing move at the source. The system processes for the content item a delete at the source and an add at the destination. The system assigns a first clock to the incoming move and a second clock to the outgoing move, the first clock being lower than the second clock. The system assigns a third clock to the delete and a fourth clock to the add, the third clock being higher than the second clock and lower than the fourth clock. The system serializes the incoming and outgoing moves, the delete and the add based on the first, second, third and fourth clocks.
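The clock ordering described in this abstract can be illustrated with a minimal sketch. It assumes one integer Lamport counter per namespace; the names `Namespace`, `record`, `move_item` and `serialize` are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Hypothetical namespace holding one Lamport counter and an operation log."""
    name: str
    clock: int = 0
    log: list = field(default_factory=list)

    def record(self, op, at_least=0):
        # Lamport rule: advance past the local clock and any clock already observed.
        self.clock = max(self.clock, at_least) + 1
        self.log.append((self.clock, self.name, op))
        return self.clock


def move_item(src, dst):
    """Assign four clocks so that incoming < outgoing < delete < add."""
    c1 = dst.record("incoming_move")
    c2 = src.record("outgoing_move", at_least=c1)
    c3 = src.record("delete", at_least=c2)
    c4 = dst.record("add", at_least=c3)
    return c1, c2, c3, c4


def serialize(*namespaces):
    """Merge the logs of several namespaces and order operations by clock."""
    events = [event for ns in namespaces for event in ns.log]
    return [op for _, _, op in sorted(events)]
```

Because each clock is derived from the previous one, sorting the merged logs by clock value reproduces the required order regardless of where each namespace's counter started.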
Abstract:
Reconstructing in-memory data block indices in a distributed data storage system where data blocks are stored in extents and the extents are replicated across storage devices. In one aspect, based on a reboot of a storage device and a copy of an extent stored in the storage device being in an open state, appends for data blocks in the copy of the extent stored in the storage device are replayed to reconstruct an in-memory data block index for the copy of the extent. In another aspect, based on a reboot of a storage device and a copy of an extent being in a closed state, a data block index for the copy of the extent is retrieved from non-volatile storage of the storage device and the retrieved data block index is stored in memory at the storage device.
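The two recovery paths can be sketched as follows. The on-disk record layout (a 16-byte block hash and a 4-byte length before each payload) and the function names are assumptions made for illustration; the abstract does not specify a format.

```python
import io
import struct

# Hypothetical record header: 16-byte block hash followed by the payload length.
HEADER = struct.Struct(">16sI")


def append_block(extent, block_hash, payload):
    """Append one record to the end of an open extent."""
    extent.seek(0, io.SEEK_END)
    extent.write(HEADER.pack(block_hash, len(payload)))
    extent.write(payload)


def replay_appends(extent):
    """Rebuild {block_hash: (offset, length)} by scanning an open extent."""
    index = {}
    extent.seek(0)
    while True:
        header = extent.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of extent, or a torn header from an interrupted append
        block_hash, length = HEADER.unpack(header)
        offset = extent.tell()
        payload = extent.read(length)
        if len(payload) < length:
            break  # torn final append: ignore the incomplete record
        index[block_hash] = (offset, length)
    return index


def load_index(state, extent, stored_index):
    """Open extents are replayed; closed extents reuse the persisted index."""
    if state == "open":
        return replay_appends(extent)
    return dict(stored_index)
```

Stopping at the first incomplete record is one possible policy for a crash during an append; a real system might instead verify a per-record checksum.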
Abstract:
An append-only data storage system stores sets of data blocks in extents that are located in storage devices. When an extent becomes full, the system changes the extent from an open state, wherein data can be appended to the extent, to a closed state, wherein data cannot be appended to the extent. This change involves performing a synchronization operation by: obtaining a list of data blocks in the extent from each storage device that has a copy of the extent; forming a union of the lists; looking up data blocks from the union in a database that maps data blocks to storage devices and extents to determine which data blocks belong in the extent; and if a copy of the extent is missing data blocks that belong in the extent, performing a remedial action before changing the extent from the open state to the closed state.
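The synchronization steps can be sketched as a small set computation. This is a simplified illustration: the database is modeled as a plain dictionary from block hash to extent identifier, and `synchronize` is a hypothetical name.

```python
def synchronize(extent_id, copies, block_db):
    """Find which blocks belong in an extent and which copies lack them.

    copies:   {device name: set of block hashes reported by that device}
    block_db: {block hash: extent id}, a stand-in for the block database
    """
    # Union of the per-device block lists.
    union = set().union(*copies.values())
    # Keep only the blocks the database assigns to this extent.
    belongs = {block for block in union if block_db.get(block) == extent_id}
    # Devices whose copy is missing blocks need a remedial action before closing.
    missing = {}
    for device, blocks in copies.items():
        lacking = belongs - blocks
        if lacking:
            missing[device] = lacking
    return belongs, missing
```

An empty `missing` result would indicate that every copy is complete and the extent can move to the closed state; otherwise the caller would repair the listed copies first.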
Abstract:
The disclosed embodiments relate to a system that uses colocation hints to facilitate storing data blocks in a distributed data storage system, which includes a plurality of data centers. During operation, the system receives a write request from a client to write a data block to the distributed data storage system, wherein the write request includes a colocation hint that identifies a colocation group associated with the data block. In response to the write request, the system uses the colocation hint to identify one or more data centers associated with the colocation group. Next, the system writes copies of the data block to the one or more identified data centers. In this way, the system situates copies of data blocks associated with the colocation group in the same data centers when possible.
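One way to picture the placement decision is the sketch below. The mapping from colocation groups to data centers, the replica count and the fallback order are all assumptions introduced here; the abstract only states that the hint identifies the data centers and that colocation happens when possible.

```python
def choose_data_centers(colocation_hint, group_map, all_centers, copies=2):
    """Pick data centers for a block, preferring those of its colocation group.

    group_map:   {colocation group: list of preferred data centers} (hypothetical)
    all_centers: every data center, used as a fallback when the group's
                 centers cannot hold all copies
    """
    preferred = group_map.get(colocation_hint, [])
    chosen = list(preferred[:copies])
    for center in all_centers:
        if len(chosen) >= copies:
            break
        if center not in chosen:
            chosen.append(center)
    return chosen
```

Blocks written with the same hint resolve to the same data centers, which is what keeps a colocation group together.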
Abstract:
A system that stores sets of data blocks in extents located in storage devices is described. During operation, a receiving device receives, through an RPC framework, a first call asking to transfer an extent from a sending device to the receiving device. In response, the receiving device opens a port for a data connection that operates outside the RPC framework. The receiving device makes a second call, to the sending device through the RPC framework, asking to stream the extent to the port. The receiving device subsequently receives the extent from the sending device through the port and computes a checksum for the extent. The receiving device also receives a return from the second call, including a checksum for the extent computed by the sending device. If the computed checksum matches the received checksum, the receiving device returns the first call to indicate the transfer operation completed successfully.
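The checksum comparison at the end of the transfer can be sketched without any networking. Here the data connection is modeled as a list of byte chunks, and SHA-256 stands in for the checksum, which the abstract does not name; `send_extent` and `receive_extent` are hypothetical.

```python
import hashlib


def send_extent(data, chunk_size=4):
    """Sender side: split the extent into chunks and compute its checksum."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return chunks, hashlib.sha256(data).hexdigest()


def receive_extent(chunks, sender_checksum):
    """Receiver side: checksum the chunks as they arrive, then compare."""
    digest = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        digest.update(chunk)
        received.extend(chunk)
    if digest.hexdigest() != sender_checksum:
        raise ValueError("checksum mismatch: transfer failed")
    return bytes(received)
```

In the described system the chunks would arrive over the separate data connection and the sender's checksum would arrive in the return of the second RPC call; the receiver reports success on the first call only after the two values match.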
Abstract:
The disclosed embodiments relate to the design of an append-only data storage system that stores sets of data blocks in extents that are located in storage devices in the system. During operation of the system, when an extent becomes full, the system changes the extent from an open state, wherein data can be appended to the extent, to a closed state, wherein data cannot be appended to the extent. Changing the extent from the open state to the closed state includes performing the following operations at one or more storage devices that contain copies of the extent: constructing an index to facilitate accessing data blocks in a copy of the extent contained in the storage device; and appending the index to the copy of the extent in non-volatile storage in the storage device.
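Appending an index to an extent copy at close time can be sketched as follows. The JSON encoding and the 8-byte length footer are assumptions chosen for brevity; the abstract does not describe the index layout.

```python
import io
import json
import struct

# Hypothetical footer: length of the serialized index in bytes.
FOOTER = struct.Struct(">Q")


def close_extent(extent, index):
    """Append the serialized index and a fixed-size footer to the extent copy."""
    blob = json.dumps(index, sort_keys=True).encode("utf-8")
    extent.seek(0, io.SEEK_END)
    extent.write(blob)
    extent.write(FOOTER.pack(len(blob)))


def read_index(extent):
    """Locate the index through the footer and load it back into memory."""
    extent.seek(-FOOTER.size, io.SEEK_END)
    (length,) = FOOTER.unpack(extent.read(FOOTER.size))
    extent.seek(-(FOOTER.size + length), io.SEEK_END)
    return json.loads(extent.read(length).decode("utf-8"))
```

Placing the length in a trailing footer lets a reader find the index from the end of the file without scanning the data blocks that precede it.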
Abstract:
The disclosed embodiments relate to the design of an append-only data storage system that stores sets of data blocks in extents that are located in storage devices in the system. During operation of the system, when an extent is in an open state, the system allows data blocks to be appended to the extent, and disallows operations on the extent that are incompatible with data being concurrently appended to the extent. When the extent becomes full, the system changes the extent from the open state to a closed state. Then, while the extent is in the closed state, the system does not allow data blocks to be appended to the extent, and allows operations on the extent that are incompatible with data being concurrently appended to the extent.
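The open and closed states can be sketched as a small state machine. The `relocate` method is a hypothetical stand-in for any operation that is incompatible with concurrent appends; the abstract does not list specific operations.

```python
class Extent:
    """Hypothetical extent that gates operations on its open/closed state."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.state = "open"
        self.blocks = []
        self.device = "device-a"

    def append(self, block):
        if self.state != "open":
            raise RuntimeError("appends are not allowed on a closed extent")
        self.blocks.append(block)
        if len(self.blocks) >= self.capacity:
            self.state = "closed"  # the extent is full

    def relocate(self, device):
        # Stand-in for an operation that must not race with appends.
        if self.state != "closed":
            raise RuntimeError("extent must be closed before it is relocated")
        self.device = device
```

Checking the state at the start of every operation means no operation has to coordinate with in-flight appends once the extent is closed.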