Abstract:
A data storage system and a method for managing transaction requests in the data storage system utilize prepare requests for a transaction request that covers multiple data storage operations. The prepare requests are sent to selected destination storage nodes of the data storage system to handle the multiple data storage operations. Each prepare request includes at least one of the multiple data storage operations to be handled by a particular destination storage node and a list of the destination storage nodes involved in the transaction request.
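As an illustration of the request layout described above, the following is a minimal sketch, not the claimed implementation. The names `PrepareRequest` and `build_prepare_requests`, the string encoding of operations, and the sorted ordering of nodes are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PrepareRequest:
    """Hypothetical shape of one prepare request."""
    txn_id: int
    destination: str               # node that handles these operations
    operations: Tuple[str, ...]    # operations assigned to this node
    participants: Tuple[str, ...]  # every destination node in the transaction


def build_prepare_requests(
    txn_id: int, ops_by_node: Dict[str, List[str]]
) -> List[PrepareRequest]:
    """Build one prepare request per destination node.

    Each request carries that node's operations plus the full list of
    destination nodes involved in the transaction.
    """
    participants = tuple(sorted(ops_by_node))
    return [
        PrepareRequest(txn_id, node, tuple(ops_by_node[node]), participants)
        for node in participants
    ]
```

Because every request carries the full participant list, any destination node can tell which other nodes take part in the same transaction without consulting the sender.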
Abstract:
A multi-tenant storage system can store clear text data and an associated clear text checksum received from a storage tenant using the tenant's associated cryptographic key (“cryptokey”). When the clear text data is compressible, cryptographic data (“cryptodata”) is generated from a concatenation of the clear text checksum and the compressed clear text data using the cryptokey. When the clear text data is incompressible, the cryptodata is generated by encrypting the clear text data using the cryptokey, with an extra verification step to make sure the clear text checksum can be rebuilt during a read request. In either case, a cryptographic checksum (“cryptochecksum”) is generated from the cryptodata. The cryptodata and associated cryptochecksum are stored in the multi-tenant storage system, so that repairs to damaged cryptodata can be made using the associated cryptochecksum.
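The two paths described above can be sketched as follows. This is a simplified illustration under several assumptions: the cipher is supplied by the caller as a hypothetical `encrypt` callable, "compressible" is taken to mean that the checksum plus the zlib-compressed data fits within the original size, SHA-256 stands in for the cryptochecksum, and the extra verification step on the incompressible path is omitted.

```python
import hashlib
import zlib
from typing import Callable, Tuple


def build_cryptodata(
    clear: bytes,
    clear_checksum: bytes,
    encrypt: Callable[[bytes], bytes],
) -> Tuple[bytes, bytes, bool]:
    """Return (cryptodata, cryptochecksum, took_compressed_path)."""
    compressed = zlib.compress(clear)
    # Assumption: data counts as compressible when the checksum plus the
    # compressed bytes take no more room than the original clear text.
    compressible = len(clear_checksum) + len(compressed) <= len(clear)
    payload = clear_checksum + compressed if compressible else clear
    cryptodata = encrypt(payload)
    # The checksum is computed over the encrypted bytes, so it can be
    # checked without access to the tenant's key.
    cryptochecksum = hashlib.sha256(cryptodata).digest()
    return cryptodata, cryptochecksum, compressible
```

Computing the cryptochecksum over the encrypted bytes is what lets the storage system detect damaged cryptodata without decrypting it.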
Abstract:
A Bε-tree associated with a file system on a storage volume includes a hierarchy of nodes. Each node includes a buffer portion, characterized by a fixed maximum allowable size, that stores key-value pairs as messages. Messages can be initially buffered in the root node of the Bε-tree and then flushed from the root node to its descendant nodes. Messages stored in the buffers can be indexed using a B+-tree data structure. As the B+-tree data structure in a buffer grows (due to receiving flushed messages) and shrinks (due to messages being flushed), disk blocks can be allocated from the storage volume to increase the actual size of the buffer and deallocated from the buffer to reduce the actual size of the buffer.
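The grow-and-shrink behavior of a buffer can be sketched as below. This is a toy model under stated assumptions: a plain dictionary stands in for the B+-tree index, each disk block is assumed to hold a fixed number of messages, and the names `NodeBuffer`, `max_blocks`, and `messages_per_block` are hypothetical.

```python
from typing import Dict, List, Tuple


class NodeBuffer:
    """Toy buffer whose allocated block count tracks its contents."""

    def __init__(self, max_blocks: int, messages_per_block: int) -> None:
        self.max_blocks = max_blocks
        self.messages_per_block = messages_per_block
        self.messages: Dict[str, str] = {}  # stand-in for the B+-tree index

    @property
    def allocated_blocks(self) -> int:
        # Ceiling division: only as many blocks as the messages need.
        return -(-len(self.messages) // self.messages_per_block)

    def insert(self, key: str, value: str) -> None:
        capacity = self.max_blocks * self.messages_per_block
        if key not in self.messages and len(self.messages) >= capacity:
            raise OverflowError("buffer is at its fixed maximum size; flush first")
        self.messages[key] = value

    def flush(self, count: int) -> List[Tuple[str, str]]:
        """Remove and return up to `count` messages in key order."""
        keys = sorted(self.messages)[:count]
        return [(k, self.messages.pop(k)) for k in keys]
```

In this model the allocated size rises as messages arrive and falls after a flush, while the fixed maximum only bounds how far it can grow.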
Abstract:
A system and a method for managing storage metadata utilize a metadata data structure containing allocation information of storage blocks of a storage system. A portion of the metadata data structure that corresponds to a group of the storage blocks can be reserved for a requesting client, which then manages that portion using its own copy of the portion.
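One way to picture the reservation scheme is the sketch below. It assumes the metadata structure is a simple allocation bitmap divided into fixed-size groups, and it adds a `release` step that writes the client's copy back, which the abstract does not describe. All class and method names are hypothetical.

```python
from typing import Dict, List


class AllocationMap:
    """Toy allocation bitmap split into fixed-size block groups."""

    def __init__(self, num_groups: int, group_size: int) -> None:
        self.group_size = group_size
        self.bits: List[bool] = [False] * (num_groups * group_size)
        self.reserved_by: Dict[int, str] = {}

    def _span(self, group: int) -> slice:
        start = group * self.group_size
        return slice(start, start + self.group_size)

    def reserve(self, group: int, client: str) -> List[bool]:
        """Reserve a group and hand the client its own copy of that portion."""
        if group in self.reserved_by:
            raise RuntimeError(f"group {group} is already reserved")
        self.reserved_by[group] = client
        return list(self.bits[self._span(group)])

    def release(self, group: int, client: str, local_copy: List[bool]) -> None:
        """Assumed step: write the client's copy back and drop the reservation."""
        if self.reserved_by.get(group) != client:
            raise PermissionError(f"group {group} is not reserved by {client}")
        if len(local_copy) != self.group_size:
            raise ValueError("local copy has the wrong length")
        self.bits[self._span(group)] = local_copy
        del self.reserved_by[group]
```

While a group is reserved, the client can allocate and free blocks in its local copy without touching the shared structure.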
Abstract:
A file system stores directories and files in a file system directory that uses case sensitive names. The same file system directory can support directory and file name lookups that treat the directory and file names in a case sensitive manner or in a case insensitive manner. The search criteria used for the lookup can be based on case-folding the name to produce a case-neutral name and on the original name with its case preserved. Search criteria can be generated for a case sensitive name lookup or for a case insensitive name lookup on the same file system directory, thus avoiding having to support separate file systems or separate file system directories for case sensitive and case insensitive file access.
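The dual lookup can be illustrated with a small sketch. It assumes the directory is indexed by the case-folded name and keeps the original spellings alongside; the `Directory` class and its methods are hypothetical, and Python's `str.casefold` stands in for whatever case-folding rule the file system applies.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class Directory:
    """Toy directory indexed by case-folded name, preserving original case."""

    def __init__(self) -> None:
        self._by_folded: Dict[str, List[str]] = defaultdict(list)

    def add(self, name: str) -> None:
        bucket = self._by_folded[name.casefold()]
        if name not in bucket:
            bucket.append(name)

    def lookup(self, name: str, case_sensitive: bool) -> Optional[str]:
        # Both lookup modes start from the same case-neutral key.
        bucket = self._by_folded.get(name.casefold(), [])
        if case_sensitive:
            # Case-sensitive: the original spelling must also match exactly.
            return name if name in bucket else None
        # Case-insensitive: any entry under the folded key matches.
        return bucket[0] if bucket else None
```

A single index serves both kinds of client: the folded key narrows the search, and the preserved original name decides an exact match.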
Abstract:
Exemplary methods, apparatuses, and systems maintain hole boundary information by calculating a block attribute parity value. For example, a request is received to write to a first block of a stripe of data. A block attribute of a second block is determined. The block attribute of the second block indicates whether the second block includes written data or is a hole. A block attribute parity value is calculated based upon both the block attribute of the first block and the block attribute of the second block. The block attribute of the first block indicates that the first block includes written data based upon the received request. The block attribute parity value and a data parity value are stored on one of the physical storage devices in response to the received write request. As a result, if a disk is lost, holes can be recovered using the block attribute parity value.
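A minimal sketch of the idea follows, assuming each block attribute is a single bit (1 for written data, 0 for a hole) and that the parity is a plain XOR across the stripe. The abstract does not name the parity function, so XOR is an assumption here, as are the function names.

```python
from functools import reduce
from operator import xor
from typing import List

WRITTEN, HOLE = 1, 0


def attribute_parity(attrs: List[int]) -> int:
    """XOR the per-block attribute bits of one stripe (assumed parity rule)."""
    return reduce(xor, attrs, 0)


def recover_attribute(surviving_attrs: List[int], parity: int) -> int:
    """Rebuild the attribute bit of the single lost block in the stripe."""
    return reduce(xor, surviving_attrs, parity)
```

With XOR parity, the attribute of one lost block is the parity combined with the attributes of the surviving blocks, which is why a lost hole can be told apart from lost written data.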
Abstract:
Techniques for performing fine-grained metadata management in a distributed file system (DFS) are provided. In one embodiment, each node in a plurality of nodes implementing the DFS can execute a namespace metadata service that is dedicated to managing file system metadata pertaining to one or more namespaces of the DFS. Each node can further execute a data metadata service that is distinct from the namespace metadata service, where the data metadata service is dedicated to managing file system metadata pertaining to properties of data and free space in the DFS.
Abstract:
Exemplary methods, apparatuses, and systems generate an encryption key based upon data content of a portion of data to be encrypted by the encryption key. The encryption key is stored as one of a plurality of encryption keys within a subset of storage. Each of the plurality of encryption keys is generated based upon corresponding data content. A checksum representing the plurality of encryption keys is calculated. In response to receiving an input/output (I/O) request for data encrypted by the encryption key, a verification checksum representing the plurality of encryption keys is calculated. The requested data is decrypted using the encryption key in response to verifying that the checksum and the verification checksum match.
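The key handling described above can be sketched as follows. The sketch assumes SHA-256 both for deriving a key from data content and for the checksum over the stored keys; neither choice comes from the abstract, and the encryption and decryption steps themselves are left out. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import List


def derive_key(content: bytes) -> bytes:
    """Derive a 32-byte key from the data content (assumed: SHA-256)."""
    return hashlib.sha256(content).digest()


def keys_checksum(keys: List[bytes]) -> bytes:
    """Checksum representing an ordered collection of stored keys."""
    digest = hashlib.sha256()
    for key in keys:
        digest.update(key)
    return digest.digest()


def keys_verified(keys: List[bytes], stored_checksum: bytes) -> bool:
    """Recompute the checksum at I/O time and compare it with the stored one."""
    return hmac.compare_digest(keys_checksum(keys), stored_checksum)
```

In this sketch, decryption would proceed only when `keys_verified` returns true, so a corrupted key is caught before it is used.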
Abstract:
A deduplication storage system with snapshot and clone capability stores logical pointer objects and organizes a first set of the logical pointer objects into a hierarchical structure. A second set of the logical pointer objects may be associated with corresponding logical data blocks of a client data object. The second set of the logical pointer objects may point to physical data blocks having deduplicated data that comprise data of the corresponding logical data blocks. Some of the logical pointer objects in the first set may point to the logical pointer objects in the second set, so that the hierarchical structure represents the client data object. A root of the hierarchical structure may be associated with the client data object. A snapshot or clone may be created by making a copy of the root and associating the copied root with the snapshot or clone.
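Snapshot creation by root copy can be illustrated with a reduced, two-level sketch. It assumes a single root pointing directly at leaf pointer objects, whereas the described hierarchy may have more levels; `LeafPointer`, `RootPointer`, and `snapshot` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LeafPointer:
    """Pointer for one logical block, referring to a deduplicated physical block."""
    physical_block: int


@dataclass
class RootPointer:
    """Root of the pointer hierarchy for one client data object."""
    owner: str
    children: List[LeafPointer] = field(default_factory=list)


def snapshot(root: RootPointer, name: str) -> RootPointer:
    """Create a snapshot by copying only the root.

    The new root gets its own child list, but the leaf pointer objects
    themselves are shared with the original.
    """
    return RootPointer(owner=name, children=list(root.children))
```

Only the root is duplicated, so the cost of taking a snapshot in this model does not depend on how much data the object holds.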
Abstract:
Techniques for reducing write latency when logging write operations are provided. In one embodiment, a computer system can receive a write operation originating from a storage client, where the write operation is directed to a data object stored on a nonvolatile storage of the computer system. The computer system can further calculate a checksum value based on the contents of the data object as modified by the write operation, and generate a log record for the write operation that includes the checksum value and a pointer to a location of the data object on the nonvolatile storage. The computer system can then issue the write operation and a write of the log record concurrently to the nonvolatile storage, thereby reducing the latency incurred for the overall write/logging process before a write acknowledgement is sent to the storage client.
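The concurrent issue of the data write and the log write can be sketched as below. This is an in-memory illustration under assumptions: a dictionary and a list stand in for the nonvolatile storage and the log, CRC-32 stands in for the checksum, and a two-thread pool models the concurrent submission. The `record_matches_data` check is an assumed use of the checksum, not something the abstract states.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    checksum: int  # checksum of the data object as modified by the write
    location: int  # pointer to the data object's location on storage


def logged_write(
    storage: Dict[int, bytes],
    log: List[LogRecord],
    location: int,
    new_contents: bytes,
) -> LogRecord:
    """Issue the data write and the log-record write concurrently."""
    record = LogRecord(zlib.crc32(new_contents), location)
    with ThreadPoolExecutor(max_workers=2) as pool:
        data_write = pool.submit(storage.__setitem__, location, new_contents)
        log_write = pool.submit(log.append, record)
        # Acknowledge only after both writes have completed.
        data_write.result()
        log_write.result()
    return record


def record_matches_data(storage: Dict[int, bytes], record: LogRecord) -> bool:
    """Assumed check: does the stored data match the logged checksum?"""
    return zlib.crc32(storage.get(record.location, b"")) == record.checksum
```

Since neither write waits for the other, the acknowledgement is delayed by roughly the slower of the two rather than by their sum.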