Abstract:
Partially overwriting a compression group without decompressing the compressed data can reduce the consumption of resources for decompression. A storage server partially overwrites the compression group when a file block identifier of a client's write request resolves to the compression group. The compression group remains compressed while the partial overwriting is performed.
Abstract:
A technique for organizing data to facilitate data deduplication includes dividing a block-based set of data into multiple “chunks”, where the chunk boundaries are independent of the block boundaries (due to the hashing algorithm). Metadata of the data set, such as block pointers for locating the data, are stored in a tree structure that includes multiple levels, each of which includes at least one node. The lowest level of the tree includes multiple nodes that each contain chunk metadata relating to the chunks of the data set. In each node of the lowest level of the buffer tree, the chunk metadata contained therein identifies at least one of the chunks. The chunks (user-level data) are stored in one or more system files that are separate from the buffer tree and not visible to the user.
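The chunking idea can be illustrated with a minimal sketch. The abstract does not name the hashing algorithm, so the shift-and-add rolling hash, the `mask`, `min_size`, and `max_size` parameters, and the function names `split_into_chunks` and `build_leaf_metadata` are all assumptions made for illustration. The point is only that chunk boundaries follow the content rather than fixed block offsets, and that each lowest-level node carries metadata identifying a chunk.

```python
import hashlib


def split_into_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Split data at content-defined boundaries (hypothetical parameters)."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Simple shift-and-add hash; an assumption, not the patented algorithm.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        # Whatever remains after the last boundary becomes the final chunk.
        chunks.append(data[start:])
    return chunks


def build_leaf_metadata(chunks):
    """Build one lowest-level metadata record per chunk."""
    leaves = []
    offset = 0
    for chunk in chunks:
        leaves.append({
            "fingerprint": hashlib.sha256(chunk).hexdigest(),
            "offset": offset,
            "length": len(chunk),
        })
        offset += len(chunk)
    return leaves
```

In this sketch the chunk bytes themselves would live in a separate store keyed by fingerprint, while the tree holds only the metadata records.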
Abstract:
Methods and systems for co-locating journaling and data storage are provided. Separate journal and volume partitions may be maintained within each logical storage unit (e.g., Logical Unit Number (LUN)) of a distributed storage system. Journaling of metadata associated with write requests received from one or more clients may be distributed by identifying a destination logical storage unit to which data associated with a given write request is to be stored and causing the data and metadata to be persisted to disk by journaling the metadata and the data to respective portions of an active log within the journal partition of the destination logical storage unit. By using the same logical storage unit for both journaling of write requests and writing the data associated with such write requests, the bottleneck due to there being only a single device or storage unit handling all metadata for all write requests can be avoided.
Abstract:
A method and system for eliminating the redundant allocation and deallocation of special data on disk by providing an innovative technique for specially allocating special data of a storage system. Specially allocated data is data that is pre-allocated on disk and stored in memory of the storage system. “Special data” may include any pre-decided data, one or more portions of data that exceed a pre-defined sharing threshold, and/or one or more portions of data that have been identified by a user as special. For example, in some embodiments, a zero-filled data block is specially allocated by a storage system. As another example, in some embodiments, a data block whose contents correspond to a particular type of document header is specially allocated.
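The zero-filled block example can be sketched as follows. This is a toy in-memory model under stated assumptions: the class name `BlockStore`, the reserved identifier `ZERO_BLOCK_ID`, and the 4096-byte block size are hypothetical and do not come from the abstract. It shows how a pre-allocated special block lets writes and frees of that content skip allocation and deallocation entirely.

```python
BLOCK_SIZE = 4096      # assumed block size
ZERO_BLOCK_ID = 0      # hypothetical reserved id for the special block


class BlockStore:
    def __init__(self):
        # The zero-filled block is pre-allocated once and kept in memory.
        self.blocks = {ZERO_BLOCK_ID: bytes(BLOCK_SIZE)}
        self.next_id = 1
        self.allocations = 0

    def write(self, data):
        # Special data resolves to the reserved block; nothing is allocated.
        if data == self.blocks[ZERO_BLOCK_ID]:
            return ZERO_BLOCK_ID
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.allocations += 1
        return block_id

    def free(self, block_id):
        # Freeing the special block is a no-op, so it is never deallocated.
        if block_id == ZERO_BLOCK_ID:
            return
        self.blocks.pop(block_id, None)

    def read(self, block_id):
        return self.blocks[block_id]
```

Other kinds of special data, such as a common document header, could be handled the same way by reserving further identifiers.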
Abstract:
A system and method for logically organizing compressed data. In one aspect, a destination storage server receives a write request that includes multiple data blocks and specifies corresponding file block numbers. An extent-based file system executing on the storage server accesses intermediate block entries that each associate one of the file block numbers with a respective extent block number. The file system, in cooperation with a compression engine, compresses the data blocks into a set of one or more compressed data blocks. The file system stores the compressed data blocks at physical locations corresponding to physical block numbers and allocates, within an extent map, pointers from an extent ID to the extent block numbers, and pointers from the extent ID to the physical block numbers.
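The mapping described above can be sketched in a few lines. This is an illustrative model only: the class `ExtentStore`, its field names, the 4096-byte block size, and the use of `zlib` as the compression engine are assumptions, not details from the abstract. It shows file block numbers mapped to extent block numbers, and an extent map entry that points from one extent ID to both the extent block numbers and the physical block numbers holding the compressed data.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size


class ExtentStore:
    def __init__(self):
        self.fbn_to_ebn = {}   # intermediate entries: FBN -> extent block number
        self.extent_map = {}   # extent ID -> {"ebns": [...], "pbns": [...]}
        self.physical = {}     # physical block number -> compressed bytes
        self.next_ebn = 0
        self.next_pbn = 0
        self.next_extent = 0

    def write(self, fbns, blocks):
        # Associate each file block number with a fresh extent block number.
        ebns = []
        for fbn in fbns:
            self.fbn_to_ebn[fbn] = self.next_ebn
            ebns.append(self.next_ebn)
            self.next_ebn += 1
        # Compress all blocks together, then store the result block by block.
        compressed = zlib.compress(b"".join(blocks))
        pbns = []
        for off in range(0, len(compressed), BLOCK_SIZE):
            self.physical[self.next_pbn] = compressed[off:off + BLOCK_SIZE]
            pbns.append(self.next_pbn)
            self.next_pbn += 1
        extent_id = self.next_extent
        self.next_extent += 1
        self.extent_map[extent_id] = {"ebns": ebns, "pbns": pbns}
        return extent_id

    def read(self, extent_id):
        # Reassemble the compressed stream from its physical blocks.
        entry = self.extent_map[extent_id]
        compressed = b"".join(self.physical[p] for p in entry["pbns"])
        return zlib.decompress(compressed)
```

Because the extent ID points to both sets of numbers, the number of physical blocks can be smaller than the number of logical blocks without changing the file-level mapping.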
Abstract:
It is determined that a first data block contains the same data as a second data block. The first data block is associated with a first extent and the second data block is associated with a second extent. In response to determining that the first data block contains the same data as the second data block, the second data block is associated with the first extent and the first data block is disassociated from the second extent.
Abstract:
A method and system for co-locating journaling and data storage based on write requests. A write request that includes metadata and data is received from a client. A logical storage unit for storing the metadata and the data is identified. The logical storage unit is divided into a journal partition and a volume partition. The journal partition includes a first log and a second log. One of the two logs is identified as the active log and the other as the inactive log. The metadata is recorded in a first location in the active log and the data is recorded in a second location in the active log during a single I/O operation. A reply is sent to the client after the metadata and the data are recorded in the journal partition.
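The write path above can be sketched as a small in-memory model. The class `LogicalStorageUnit`, the method `handle_write`, and the shape of the reply are hypothetical names chosen for illustration. Only the journal partition is modeled, and a single `extend` call stands in for the single I/O operation.

```python
class LogicalStorageUnit:
    def __init__(self):
        # Journal partition: two logs, one of which is active at a time.
        self.logs = [bytearray(), bytearray()]
        self.active = 0

    def handle_write(self, metadata, data):
        log = self.logs[self.active]
        # Metadata and data land at adjacent locations in the active log.
        metadata_offset = len(log)
        data_offset = metadata_offset + len(metadata)
        # One append stands in for the single I/O operation (an assumption).
        log.extend(metadata + data)
        # The reply is produced only after both are recorded in the journal.
        return {
            "status": "ok",
            "log": self.active,
            "metadata_offset": metadata_offset,
            "data_offset": data_offset,
        }
```

A fuller model would later flush the logged data to the volume partition and swap the active and inactive logs; the abstract does not describe those steps, so they are left out here.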
Abstract:
A file system layout apportions an underlying physical volume into one or more virtual volumes of a storage system. The virtual volumes have a file system and one or more files organized as buffer trees, the buffer trees utilizing indirect blocks to point to the data blocks. The indirect blocks at the level above the data blocks are grouped into compression groups that point to a set of physical volume block number (pvbn) block pointers.