Abstract:
An in-memory cache for a computer system having a first storage and a second storage, where the first storage is a cache for the second storage, tracks priority levels of block attributes stored therein. If a data item is cached in the first storage, the block attribute corresponding to the data item is stored in the in-memory cache as a high priority block attribute. If a data item is evicted from the first storage, the block attribute corresponding to the data item is stored in the in-memory cache as a low priority block attribute. When the in-memory cache becomes full, the low priority block attributes are evicted before the high priority block attributes.
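The two-level eviction policy described above might be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name `PriorityAttributeCache`, its method names, and the choice to evict the oldest entry within a priority level are all assumptions, and the capacity is assumed to be at least 1.

```python
from collections import OrderedDict


class PriorityAttributeCache:
    """Bounded in-memory cache of block attributes with two priority levels.

    Hypothetical sketch: names and the oldest-first order within a level
    are assumptions, not taken from the abstract.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.high = OrderedDict()  # attributes of items cached in the first storage
        self.low = OrderedDict()   # attributes of items evicted from the first storage

    def _make_room(self):
        # Evict low priority attributes first; touch high priority ones
        # only when no low priority attribute is left.
        while len(self.high) + len(self.low) >= self.capacity:
            victims = self.low if self.low else self.high
            victims.popitem(last=False)

    def _store(self, level, block_id, attribute):
        # Drop any previous copy so a block lives in exactly one level.
        self.high.pop(block_id, None)
        self.low.pop(block_id, None)
        self._make_room()
        level[block_id] = attribute

    def on_cached(self, block_id, attribute):
        """The data item was cached in the first storage: high priority."""
        self._store(self.high, block_id, attribute)

    def on_evicted(self, block_id, attribute):
        """The data item was evicted from the first storage: low priority."""
        self._store(self.low, block_id, attribute)

    def get(self, block_id):
        if block_id in self.high:
            return self.high[block_id]
        return self.low.get(block_id)
```

With a capacity of 3, caching blocks `a` and `b`, evicting `a`, and then caching `c` and `d` drops the attribute of `a` while the attributes of `b`, `c`, and `d` stay resident.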
Abstract:
A cache is sized using an ordered data structure having data elements that represent different target locations of input-output operations (IOs), and are sorted according to an access recency parameter. The cache sizing method includes continually updating the ordered data structure to arrange the data elements in the order of the access recency parameter as new IOs are issued, and setting a size of the cache based on the access recency parameters of the data elements in the ordered data structure. The ordered data structure includes a plurality of ranked ring buffers, each having a pointer that indicates a start position of the ring buffer. The updating of the ordered data structure in response to a new IO includes updating one position in at least one ring buffer and at least one pointer.
Abstract:
A file system uses a B-tree data structure to organize file data. The file system may maintain an index node (inode) representing a file and having entries that map to extents of the file. When the file system detects that an index node has, through updates, exceeded a threshold number of extents, the file system converts the file to a copy-on-write (COW) B-tree data structure containing the entries representing the extents of the file. To clone the file, the file system uses copies of the index node and the root node of the COW B-tree data structure.
Abstract:
A sorted key-value store is implemented using a write-back cache maintained in memory, a B-tree data structure maintained on disk, and a logical and physical log for providing transactions. The logical log and write-back cache are used to answer client requests, while dirty blocks in the write-back cache are periodically flushed to disk using the physical log.
Abstract:
Exemplary methods, apparatuses, and systems include a first layer of a virtual storage area network (VSAN) module receiving a write request from a data compute node. The write request includes data to be written and the VSAN module is distributed across a plurality of computers to provide an aggregate object store using storage attached to each of the plurality of computers. The first layer of the VSAN module calculates a checksum for the data to be written and passes the data to be written and the checksum to a second layer of the VSAN module. The second layer of the VSAN module calculates a first verification checksum for the data to be written. The data and the checksum are written to persistent storage in response to determining the first verification checksum matches the checksum passed by the first layer of the VSAN module.
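The two-layer checksum handoff described above can be illustrated with a short sketch. The abstract does not name a checksum algorithm, so CRC-32 from Python's `zlib` is an assumption here, as are the function names and the list that stands in for persistent storage.

```python
import zlib


def second_layer_write(data, checksum, store):
    """Recompute the checksum and persist only if it matches the one passed in.

    `store` is a plain list standing in for persistent storage (assumption).
    """
    verification = zlib.crc32(data)  # first verification checksum
    if verification != checksum:
        return False  # data changed between layers; nothing is written
    store.append((data, checksum))
    return True


def first_layer_write(data, store):
    """Compute the checksum once and pass data plus checksum to the second layer."""
    checksum = zlib.crc32(data)  # CRC-32 is an illustrative choice
    return second_layer_write(data, checksum, store)
```

A write whose payload is altered after the first layer computed its checksum is rejected by the second layer, so the corrupted data never reaches the store.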
Abstract:
Examples perform asynchronous deduplication of storage, such as virtualized or physical disks. Incoming input/output (I/O) commands containing data are subdivided into blocks which are written both to storage and to an in-memory cache. As idle processing resources become available, deduplication is performed on the storage using the in-memory cache. In this manner, read operations from storage are avoided in favor of the read operations from the in-memory cache.
Abstract:
Chunks of data are identified and deduplication is performed on the chunks of data using associated cyclic redundancy check (CRC) values. A plurality of CRC values is obtained that is associated with consecutive data blocks stored in a payload data store. Cut point CRC values are identified in the plurality of CRC values and CRC chunks are identified based on those cut point CRC values, wherein each CRC chunk is bounded by two consecutive cut point CRC values. A CRC chunk hash value is generated for each CRC chunk. A pair of duplicate CRC chunks is identified using the CRC chunk hash values and a deduplication operation is performed in association with the identified pair of duplicate CRC chunks. Using existing CRC values during the identification of chunk cut points reduces the computing resource costs associated with performing that process using the data blocks.
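A rough sketch of chunking over existing CRC values follows. The abstract does not say how a cut point CRC value is recognized, so the bit-mask test, the constant `CUT_MASK`, the use of SHA-256 for the chunk hash, and the convention that a chunk includes its starting cut point but not its ending one are all assumptions made for illustration.

```python
import hashlib

CUT_MASK = 0x3  # hypothetical: a CRC is a cut point when its low two bits are zero


def is_cut_point(crc):
    return crc & CUT_MASK == 0


def crc_chunks(crcs):
    """Return (start, end) index pairs bounded by two consecutive cut points."""
    cuts = [i for i, crc in enumerate(crcs) if is_cut_point(crc)]
    return list(zip(cuts, cuts[1:]))


def chunk_hash(crcs, start, end):
    """Hash the CRC values of one chunk rather than the underlying data blocks."""
    digest = hashlib.sha256()
    for crc in crcs[start:end]:
        digest.update(crc.to_bytes(4, "big"))
    return digest.hexdigest()


def find_duplicate_chunks(crcs):
    """Return pairs of chunks whose CRC chunk hash values are equal."""
    seen = {}
    pairs = []
    for start, end in crc_chunks(crcs):
        digest = chunk_hash(crcs, start, end)
        if digest in seen:
            pairs.append((seen[digest], (start, end)))
        else:
            seen[digest] = (start, end)
    return pairs
```

For the CRC sequence `[4, 1, 2, 4, 1, 2, 4, 5]`, the cut points fall at indices 0, 3, and 6, and the two resulting chunks are reported as a duplicate pair. Only the stored CRC values are read; the data blocks themselves are never touched while locating cut points.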
Abstract:
Solutions for secure metering of hyperconverged infrastructures are disclosed. Examples include: receiving a security token; accessing a secondary storage (e.g., cold storage, backups) using the security token; determining usage data for the secondary storage; generating a first message digest for a combination of the usage data and the security token; and transmitting, to a metering server, the usage data and the first message digest. In some examples, the combination of the usage data and the security token comprises a concatenation of the usage data and the security token. In some examples, the metering server requests verification usage data from the secondary storage, generates a second message digest for a combination of the verification usage data and the security token, and compares the first message digest with the second message digest. Examples do not persist the security token on customer premises. Examples leverage the usage data to optimize the secondary storage.
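The digest exchange described above might look like the following sketch. The abstract specifies a message digest over the concatenation of usage data and security token but not the hash function, so SHA-256 is an assumption, and all function names are hypothetical.

```python
import hashlib
import hmac


def make_digest(usage_data, token):
    """Message digest over the concatenation of usage data and security token."""
    return hashlib.sha256(usage_data + token).hexdigest()  # SHA-256 is an assumption


def report_usage(usage_data, token):
    """Reporting side: send the usage data together with the first message digest."""
    return usage_data, make_digest(usage_data, token)


def verify_report(first_digest, verification_usage_data, token):
    """Metering server side: recompute a second digest and compare the two."""
    second_digest = make_digest(verification_usage_data, token)
    return hmac.compare_digest(first_digest, second_digest)
```

If the verification usage data obtained by the metering server differs from what was reported, the two digests no longer match and `verify_report` returns `False`.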
Abstract:
A method for deleting one or more snapshots using micro-batch processing is provided. The method includes receiving a request to delete the one or more snapshots, identifying one or more middle map extents exclusively owned by the one or more snapshots requested to be deleted, wherein metadata for the one or more snapshots is stored in one or more logical maps having logical map extents mapping logical block addresses (LBAs) to middle block addresses (MBAs) and a middle map having middle map extents mapping MBAs to physical block addresses (PBAs) of physical locations where data blocks are written, adding MBAs of the identified one or more middle map extents in a batch, determining a first micro-batch including a first subset of the MBAs in the batch, the first subset of MBAs being MBAs less than a first upper bound MBA, and using a first transaction to delete the middle map extents corresponding to the first subset of MBAs included in the first micro-batch.
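The micro-batching step can be sketched as below. The abstract says only that each micro-batch holds the MBAs below an upper bound MBA; deriving that bound as the smallest pending MBA plus a fixed `span` is an assumption, as are the function names and the dictionary standing in for the middle map. The inner loop of `delete_middle_map_extents` models one transaction per micro-batch.

```python
import bisect


def micro_batches(batch, span):
    """Yield (upper_bound, mbas): the pending MBAs below each upper bound MBA.

    The rule `upper_bound = smallest pending MBA + span` is hypothetical.
    """
    if span < 1:
        raise ValueError("span must be at least 1")
    pending = sorted(batch)
    while pending:
        upper_bound = pending[0] + span
        cut = bisect.bisect_left(pending, upper_bound)
        yield upper_bound, pending[:cut]
        pending = pending[cut:]


def delete_middle_map_extents(middle_map, batch, span):
    """Delete the batched extents, one (simulated) transaction per micro-batch."""
    transactions = 0
    for _upper_bound, mbas in micro_batches(batch, span):
        for mba in mbas:  # everything in this loop belongs to one transaction
            middle_map.pop(mba, None)
        transactions += 1
    return transactions
```

With the batch `[50, 10, 12, 11, 51]` and a span of 8, the first micro-batch covers MBAs 10 to 12 and the second covers 50 and 51, so the deletion completes in two small transactions instead of one large one.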
Abstract:
A method for resumable snapshot deletion is provided. The method deletes nodes maintained in an ordered data structure for a first snapshot and includes: processing the nodes maintained in the ordered data structure according to a defined order; setting a node path cursor with a pointer to a node and an indication of the deletion of the node; storing the node path cursor in a persistent storage; and, during processing of the nodes: detecting a failure; after the failure, checking the pointer of the node path cursor; and resuming processing of the nodes starting from the node indicated by the pointer.
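A persisted cursor of this kind could be sketched as follows. Everything concrete here is an assumption: the cursor is stored as a small JSON file, nodes are identified by keys in a list that is already in the defined order, `delete_fn` is a caller-supplied callback, and `fail_after` exists only to simulate a failure.

```python
import json
import os


def save_cursor(path, node, deleted):
    """Persist the cursor atomically: write a temp file, then rename over the old one."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"node": node, "deleted": deleted}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_cursor(path):
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def delete_nodes(ordered_keys, delete_fn, cursor_path, fail_after=None):
    """Delete nodes in order, resuming from the persisted cursor if one exists."""
    cursor = load_cursor(cursor_path)
    remaining = list(ordered_keys)
    if cursor is not None:
        start = remaining.index(cursor["node"])
        if cursor["deleted"]:
            start += 1  # the pointed-to node is already gone; continue after it
        remaining = remaining[start:]
    for count, key in enumerate(remaining):
        if fail_after is not None and count == fail_after:
            raise RuntimeError("simulated failure")
        save_cursor(cursor_path, key, False)  # pointer set, deletion not yet done
        delete_fn(key)  # should be idempotent: a crash here causes a retry
        save_cursor(cursor_path, key, True)   # deletion of this node recorded
    return remaining
```

If a run over keys 1 to 5 fails after two deletions, the cursor on disk points at key 2 with the deletion flag set, and a second call resumes at key 3 without revisiting the first two nodes.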