Abstract:
Solutions for supporting storage using a multi-writer log-structured file system (LFS) are disclosed that include receiving incoming data from an object of a plurality of objects that are configured to simultaneously write to the LFS from different nodes; based at least on receiving the incoming data, determining whether sufficient free segments are available in a local segment usage table (SUT) for writing the incoming data; based at least on determining that insufficient free segments are available , requesting allocation of new free segments; writing the incoming data to a log; acknowledging the writing to the object; determining whether the log has accumulated a full segment of data; based at least on determining that the log has accumulated a full segment of data, writing the full segment of data to a first segment of the free segments; and updating the local SUT to mark the first segment as no longer free.
Abstract:
The disclosure provides techniques for deduplicating files. The techniques include, upon creating or modifying a file, placing a logical timestamp of the current logical time, within a queue associated with the directory of the file. The techniques further include placing the logical timestamp within a queue of each parent directory of the directory of the file. To determine a set of files for deduplication, the techniques disclosed herein identify files that have been modified within a logical time range. The set of files modified within a logical time is identified by traversing directories of a storage system, the directories being organized within a tree structure. If a directory's queue does not contain a timestamp that is within the logical time range, then all child directories can be skipped over for further processing, such that no files within the child directories end up being within the set of files for deduplication.
Abstract:
Embodiments described herein are related to cloning a volume in a file system. In some embodiments, a directory hard link is used to generate a clone of the root node of the volume. In certain embodiments, upon determining that a file or directory of the clone which comprises a hard link to an index node has been modified, a new object directory is generated beneath a root node of the volume. The index node may be added to the new object directory and one or more files and directories in the volume which link to the index node may be updated to contain symbolic links to the index node in the new object directory. In certain embodiments, a copy-on-write operation is performed in order to copy the file or directory and the new object directory to the clone.
Abstract:
A transaction manager for handling operations on data in a storage system provides a system for executing transactions that uses a versioned tuple cache to achieve fast, abortable transactions using a redo-only log. The transaction manager updates an in-memory key-value store and also attaches a transaction identifier to the tuple as a minor key. Opportunistic locking can be accomplished due to the low cost of aborting transactions.
Abstract:
Exemplary methods, apparatuses, and systems maintain hole boundary information by calculating a block attribute parity value. For example, a request is received to write to a first block of a stripe of data. A block attribute of a second block is determined. The block attribute of the second block indicates whether the second block includes written data or is a hole. A block attribute parity value is calculated based upon both the block attribute of the first block and the block attribute of the second block. The block attribute of the first block indicates the first block includes written data based upon the received request. The block attribute parity value and the data parity value are stored on one of the physical storage devices in response to the received write request. As a result, if a disk is lost, holes can be recovered using the block attribute parity value.
Abstract:
Systems and methods for allocating space in persistent storage are provided. A modified bitmap and a tree of bitmap summary pages are used to manage the free space of a large scale storage system. The bitmap is separated into fixed size pages and has bitmap summary entries to summarize the information in the bitmap. Bitmap summary pages can be further summarized into secondary summary pages. The tree data structure can continue to N levels until a topmost level has one bitmap summary page.
Abstract:
An example method of managing a log-structured file system (LFS) on a storage device includes: receiving, at storage software executing on a host, an operation that overwrites a data block, the data block included in a segment of the LFS; determining from first metadata stored on the storage device, a change in utilization of the segment from a first utilization value to a second utilization value; modifying second metadata stored on the storage device to change a relation between the segment and a first bucket to be a relation between the segment and a second bucket, the first utilization value included in a range of the first bucket and the second utilization value included in a range of the second bucket; and executing a garbage collection process for the LFS that uses the second metadata to identify for garbage collection a set of segments in the second bucket.
Abstract:
The disclosure herein describes deduplicating data chunks using chunk objects. A batch of data chunks is obtained from an original data object and a hash value is calculated for each data chunk. A first duplicate data chunk is identified using the hash value and a hash map. A chunk logical block address (LBA) of a chunk object is assigned to the duplicate data chunk. Payload data of the duplicate data chunk is migrated from the original data object to the chunk object, and a chunk map is updated to map the chunk LBA to a physical sector address (PSA) of the migrated payload data on the chunk object. A hash entry is updated to map to the chunk object and the chunk LBA. An address map of the original data object is updated to map an LBA of the duplicate data chunk to the chunk object and the chunk LBA.
Abstract:
Deleting directories in a virtual distributed file system (VDFS), and non-virtual file systems, involves changing the name of a selected directory to a unique object identifier (UID) and moving the selected directory, named according to the UID, to a deletion target directory. A recursive process, implemented using a background deletion thread, starts in the current directory and identifies objects in the current directory. For an object that is a file or an empty directory, the object is added to a deletion queue. For an object that is a directory that is not empty, the recursion drops down into that directory as the new current directory. When the recursion has exhausted the selected directory, or some maximum object count has been reached, the objects identified in the deletion queue are deleted. This approach can also be used for file operations other than deletion, such as compression, encryption, and hashing.
Abstract:
The disclosure describes growing a data cache using a background hash bucket growth process. A first memory portion is allocated to the data buffer of the data cache and a second memory portion is allocated to the metadata buffer of the data cache based on the cache growth instruction. The quantity of hash buckets in the hash bucket buffer is increased and the background hash bucket growth process is initiated, wherein the process is configured to rehash hash bucket entries of the hash bucket buffer in the increased quantity of hash buckets. A data entry is stored in the data buffer using the allocated first memory portion of the data cache and metadata associated with the data entry is stored using the allocated second memory portion of the metadata buffer, wherein a hash bucket entry associated with the data entry is stored in the increased quantity of hash buckets.