Abstract:
The disclosed embodiments relate to the design of an append-only data storage system that stores sets of data blocks in extents that are located in storage devices in the system. During operation of the system, when an extent becomes full, the system changes the extent from an open state, wherein data can be appended to the extent, to a closed state, wherein data cannot be appended to the extent. Changing the extent from the open state to the closed state includes performing the following operations at one or more storage devices that contain copies of the extent: constructing an index to facilitate accessing data blocks in a copy of the extent contained in the storage device; and appending the index to the copy of the extent in non-volatile storage in the storage device.
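The close-time indexing described above can be sketched as follows. This is a minimal in-memory illustration, not the disclosed implementation: the `Extent` class, the JSON index encoding, and the 4-byte length trailer used to locate the index are all assumptions made for the example, and a `bytearray` stands in for non-volatile storage.

```python
import json
import struct


class Extent:
    """Hypothetical stand-in for one copy of an extent on a storage device."""

    def __init__(self, capacity):
        self.capacity = capacity   # size in bytes at which the extent counts as full
        self.data = bytearray()    # stands in for non-volatile storage
        self.offsets = {}          # block key -> (offset, length), kept while open
        self.is_open = True

    def append(self, key, block):
        if not self.is_open:
            raise ValueError("extent is closed; data cannot be appended")
        self.offsets[key] = (len(self.data), len(block))
        self.data += block
        if len(self.data) >= self.capacity:
            self.close()

    def close(self):
        # Construct the index, append it to the extent itself, then append
        # a fixed-size trailer (an assumption) so the index can be found again.
        index = json.dumps(self.offsets).encode()
        self.data += index
        self.data += struct.pack(">I", len(index))
        self.is_open = False

    def read(self, key):
        # Closed-extent read path: recover the index from the end of the extent.
        if self.is_open:
            offset, length = self.offsets[key]
        else:
            (index_len,) = struct.unpack(">I", self.data[-4:])
            index = json.loads(self.data[-4 - index_len:-4])
            offset, length = index[key]
        return bytes(self.data[offset:offset + length])
```

Because the index travels with the extent data, each storage device holding a copy can rebuild its lookup structure from the copy alone.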
Abstract:
The disclosed embodiments relate to the design of an append-only data storage system that stores sets of data blocks in extents that are located in storage devices in the system. During operation of the system, when an extent is in an open state, the system allows data blocks to be appended to the extent, and disallows operations on the extent that are incompatible with data being concurrently appended to the extent. When the extent becomes full, the system changes the extent from the open state to a closed state. Then, while the extent is in the closed state, the system disallows appending data blocks to the extent, and allows operations on the extent that are incompatible with data being concurrently appended to the extent.
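The open/closed gating described above amounts to a two-state machine. The sketch below is illustrative only: the abstract does not name the operations that are incompatible with concurrent appends, so the entries in `CLOSED_ONLY_OPS` are hypothetical, as are the class name and the block-count notion of "full".

```python
from enum import Enum


class State(Enum):
    OPEN = "open"
    CLOSED = "closed"


# Hypothetical operation names; the abstract does not list which
# operations are incompatible with concurrent appends.
CLOSED_ONLY_OPS = {"copy", "erasure_code"}


class ExtentState:
    def __init__(self, max_blocks):
        self.max_blocks = max_blocks   # assumed definition of "full"
        self.blocks = []
        self.state = State.OPEN

    def is_allowed(self, op):
        if op == "append":
            return self.state is State.OPEN
        if op in CLOSED_ONLY_OPS:
            return self.state is State.CLOSED
        return True  # other operations (e.g. reads) assumed valid in both states

    def append(self, block):
        if not self.is_allowed("append"):
            raise PermissionError("append rejected: extent is closed")
        self.blocks.append(block)
        if len(self.blocks) >= self.max_blocks:
            # The extent is full: switch from open to closed.
            self.state = State.CLOSED
```

With `max_blocks=2`, `is_allowed("copy")` is false until the second append closes the extent, after which `is_allowed("append")` becomes false instead.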
Abstract:
A distributed computing system that executes a set of long-lived jobs is described. During operation, each worker process performs the following operations. First, the worker process identifies a set of jobs to be executed and a set of worker processes that can execute the set of jobs. Next, the worker process sorts the set of worker processes based on unique identifiers for the worker processes. Then, the worker process assigns jobs to each worker process in the set of worker processes, wherein approximately the same number of jobs is assigned to each worker process, and jobs are assigned to the worker processes in sorted order. While assigning jobs, the worker process uses an identifier for each worker process to seed a pseudorandom number generator, and then uses the pseudorandom number generator to select jobs for each worker process to execute.
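One way to read the assignment procedure above is that every worker runs the same deterministic computation and therefore arrives at the same assignment without coordination. The sketch below follows that reading under stated assumptions: the function name `assign_jobs`, the ceiling-based quota, and sorting the jobs before sampling are choices made for the example, and job identifiers are assumed to be unique and sortable.

```python
import random


def assign_jobs(jobs, workers):
    """Return {worker_id: [job, ...]}; identical on every worker that calls it."""
    remaining = sorted(jobs)      # canonical job order (an assumption)
    ordered = sorted(workers)     # sort workers by unique identifier
    assignment = {}
    for i, worker in enumerate(ordered):
        workers_left = len(ordered) - i
        # Ceiling division keeps the per-worker counts within one of each other.
        quota = -(-len(remaining) // workers_left)
        # Seed a private PRNG with the worker's identifier, then use it
        # to select this worker's jobs from those still unassigned.
        rng = random.Random(worker)
        chosen = rng.sample(remaining, quota)
        assignment[worker] = sorted(chosen)
        taken = set(chosen)
        remaining = [job for job in remaining if job not in taken]
    return assignment
```

Since the only inputs are the job set, the worker set, and the worker identifiers, two workers that observe the same sets compute the same result regardless of the order in which they discovered the members.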
Abstract:
An append-only data storage system is described that stores sets of data blocks in extents that are located in storage devices. During operation of the system, upon receiving a request to copy an extent from a source storage device to a destination storage device, the system creates a scratch extent on the destination storage device, and associates the scratch extent with a private identifier, whereby the scratch extent can only be accessed through the private identifier. The system uses the private identifier to perform a copying operation that copies the extent from the source storage device to the scratch extent on the destination storage device. After the copying operation is complete and the scratch extent is closed, the system associates the scratch extent with a public identifier, whereby the copy of the extent on the destination storage device becomes publicly accessible to other entities in the data storage system.
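The scratch-extent copy described above can be sketched as a write-privately-then-publish pattern. Everything named here is hypothetical: the `StorageDevice` class, its method names, and the use of a random UUID as the private identifier are assumptions for illustration, and dictionaries stand in for on-device storage.

```python
import uuid


class StorageDevice:
    """Hypothetical storage device holding public and scratch extents."""

    def __init__(self):
        self.public = {}    # public identifier -> extent
        self.scratch = {}   # private identifier -> scratch extent

    def create_scratch(self):
        private_id = uuid.uuid4().hex   # known only to the copier (assumption)
        self.scratch[private_id] = {"blocks": [], "closed": False}
        return private_id

    def write_scratch(self, private_id, block):
        self.scratch[private_id]["blocks"].append(block)

    def close_scratch(self, private_id):
        self.scratch[private_id]["closed"] = True

    def publish(self, private_id, public_id):
        if not self.scratch[private_id]["closed"]:
            raise ValueError("scratch extent must be closed before publishing")
        # Re-associate the extent with the public identifier.
        self.public[public_id] = self.scratch.pop(private_id)

    def read(self, public_id):
        return list(self.public[public_id]["blocks"])


def copy_extent(source, dest, public_id):
    private_id = dest.create_scratch()
    for block in source.read(public_id):
        dest.write_scratch(private_id, block)
    dest.close_scratch(private_id)
    dest.publish(private_id, public_id)
```

Until `publish` runs, a lookup by public identifier on the destination fails, so other entities never observe a partially copied extent.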