Abstract:
Techniques for a data storage cluster, and a method for maintaining and updating reliability data while reducing data communication between nodes, are disclosed herein. Each data object is written to a single data zone on a data node within the data storage cluster. Each data object includes one or more data chunks, and the data chunks of a data object are written to a data node in an append-only log format. When parity is determined for a reliability group that includes the data zone, there is no need to transmit data from the other data nodes where the remaining data zones of the reliability group reside. Thus, inter-node data communication for determining reliability data is reduced.
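The following is a minimal sketch of why an append-only layout can keep parity updates local. It assumes simple XOR parity across the zones of a reliability group, which the abstract does not specify; the names `DataZone`, `ParityZone`, and `full_parity` are hypothetical. Because the region being appended to previously held zeros, XOR-ing the new chunk into the parity at the same offset yields the same result as recomputing parity over every zone, so only the writing node's chunk needs to reach the parity holder.

```python
from functools import reduce

ZONE_SIZE = 16  # bytes per zone; deliberately tiny for illustration


class DataZone:
    """Append-only zone (hypothetical). Unwritten space is all zeros."""

    def __init__(self, size=ZONE_SIZE):
        self.buf = bytearray(size)
        self.write_ptr = 0

    def append(self, chunk: bytes) -> int:
        """Append a chunk and return the offset at which it was written."""
        if self.write_ptr + len(chunk) > len(self.buf):
            raise ValueError("zone full")
        offset = self.write_ptr
        self.buf[offset:offset + len(chunk)] = chunk
        self.write_ptr += len(chunk)
        return offset


class ParityZone:
    """Parity for one reliability group, assuming XOR parity."""

    def __init__(self, size=ZONE_SIZE):
        self.buf = bytearray(size)

    def apply(self, offset: int, chunk: bytes) -> None:
        # The appended region held zeros before, so XOR-ing in the new
        # chunk is the complete parity update; no other zone is read.
        for i, byte in enumerate(chunk):
            self.buf[offset + i] ^= byte


def full_parity(zones):
    """Reference computation: XOR of all zones, byte by byte."""
    columns = zip(*(z.buf for z in zones))
    return bytearray(reduce(lambda a, b: a ^ b, col) for col in columns)


zones = [DataZone() for _ in range(3)]
parity = ParityZone()
writes = [(0, b"abc"), (1, b"hello"), (0, b"de"), (2, b"xyz")]
for zone_index, chunk in writes:
    offset = zones[zone_index].append(chunk)
    parity.apply(offset, chunk)  # only this chunk is sent to the parity holder

incremental_matches_full = parity.buf == full_parity(zones)
```

In this sketch the full recomputation exists only as a check; a real cluster would use whatever reliability code and messaging the system defines.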
Abstract:
An apparatus to compare two datasets, each of which includes multiple data blocks, includes a comparison unit and a report generator. The comparison unit identifies block-level differences between the first and second datasets by comparing their block-level metadata, without comparing the contents of the data blocks. The report generator generates a human-readable report of the differences between the first and second versions of the dataset, including the differences in individual data blocks between the two versions.
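As a rough illustration of comparing block-level metadata rather than block contents, the sketch below assumes each version is described by a map from logical block number to a block identifier, and that a rewritten block receives a new identifier (as in a copy-on-write layout). Both the map format and the functions `diff_block_metadata` and `format_report` are assumptions made for this example, not details taken from the abstract.

```python
def diff_block_metadata(first: dict, second: dict) -> list:
    """Compare two {logical block number: block identifier} maps.

    Returns (block number, kind, old id, new id) tuples in block order.
    Block contents are never read; only the identifiers are compared.
    """
    changes = []
    for lbn in sorted(first.keys() | second.keys()):
        old_id, new_id = first.get(lbn), second.get(lbn)
        if old_id == new_id:
            continue
        if old_id is None:
            kind = "added"
        elif new_id is None:
            kind = "removed"
        else:
            kind = "changed"
        changes.append((lbn, kind, old_id, new_id))
    return changes


def format_report(changes: list) -> str:
    """Render the differences as a short human-readable report."""
    lines = [f"{len(changes)} block(s) differ"]
    for lbn, kind, old_id, new_id in changes:
        lines.append(f"  block {lbn}: {kind} ({old_id} -> {new_id})")
    return "\n".join(lines)


version_1 = {0: 100, 1: 101, 2: 102}
version_2 = {0: 100, 1: 205, 3: 300}
changes = diff_block_metadata(version_1, version_2)
report = format_report(changes)
```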
Abstract:
A system and method reclaim unused storage space from a data container, such as a logical unit number (LUN) of a storage system. In particular, a novel technique is provided that allows a storage system to reclaim storage space not used by a client file system for which the storage system maintains storage, without requiring assistance from the client file system to determine storage usage. In other words, the storage system may independently reclaim storage space not used by the client file system, without that file system's intervention.
Abstract:
A facility is described for comparing two datasets and identifying metadata differences between them, irrespective of the manner in which the data is stored. In some embodiments, the facility includes a comparison unit and a catalog unit. The comparison unit compares a hierarchical hash of a first dataset with a hierarchical hash of a second dataset, each hierarchical hash including a plurality of hierarchical hash values, to identify differences in metadata of the first and second datasets by progressively comparing the hierarchical hash values of the two hierarchical hashes, without comparing the metadata itself. The catalog unit generates a catalog of differences between the first and second datasets, the catalog indicating differences in metadata of the first and second datasets.
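One way to picture the progressive comparison is a binary hash tree built over serialized metadata records, where the comparison descends only into subtrees whose hashes differ. The sketch below assumes SHA-256, a binary tree, and two datasets with the same number of records; none of these choices come from the abstract, and `build_tree` and `diff_trees` are hypothetical helpers.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(records: list) -> list:
    """Return the tree as a list of levels: levels[0] holds the leaf
    hashes and the last level holds the single root hash."""
    level = [_hash(r) for r in records]
    levels = [level]
    while len(level) > 1:
        # Hash each pair of children; an odd trailing node is hashed alone.
        level = [_hash(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_trees(a: list, b: list) -> list:
    """Return indices of leaves whose hashes differ, visiting only
    subtrees whose parent hashes differ. Only hashes are compared."""
    if len(a[0]) != len(b[0]):
        raise ValueError("this sketch assumes equal record counts")
    if not a[0]:
        return []
    top = len(a) - 1
    candidates = [0] if a[top][0] != b[top][0] else []
    for depth in range(top - 1, -1, -1):
        next_candidates = []
        for idx in candidates:
            for child in (2 * idx, 2 * idx + 1):
                if child < len(a[depth]) and a[depth][child] != b[depth][child]:
                    next_candidates.append(child)
        candidates = next_candidates
    return candidates


# Hypothetical metadata records, one per file: name and permission bits.
first = [b"a:644", b"b:644", b"c:755", b"d:600", b"e:644"]
second = [b"a:644", b"b:644", b"c:755", b"d:640", b"e:644"]
catalog = diff_trees(build_tree(first), build_tree(second))
```

When the root hashes match, the comparison stops immediately, which is the main benefit of the hierarchical structure over comparing every record.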