Abstract:
A method performed in a network storage system, the method including: receiving a plurality of data blocks at a secondary storage subsystem from a primary storage subsystem; generating a first log that includes a first plurality of entries, one entry for each of the data blocks, in which each entry of the first plurality of entries includes a name for a respective data block and a fingerprint of the respective data block; receiving metadata at the secondary storage subsystem from the primary storage subsystem, the metadata describing relationships between the plurality of data blocks and a plurality of files; generating a second log that includes a second plurality of entries; and merging the first log with the second log to generate a change log.
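The merge step can be illustrated with a minimal Python sketch. The abstract does not specify what the second log's entries contain, so this sketch assumes each one ties a block name to a file and an offset within that file. The names `BlockEntry`, `RefEntry`, `ChangeEntry`, and `merge_logs` are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class BlockEntry(NamedTuple):
    """First-log entry: one per received data block (fields per the abstract)."""
    block_name: int
    fingerprint: str


class RefEntry(NamedTuple):
    """Second-log entry. ASSUMPTION: derived from the metadata, it maps a
    block name to the file and file offset that reference the block."""
    block_name: int
    file_id: int
    file_offset: int


class ChangeEntry(NamedTuple):
    """Change-log entry: a fingerprint joined with its file reference."""
    block_name: int
    fingerprint: str
    file_id: int
    file_offset: int


def merge_logs(first_log: List[BlockEntry],
               second_log: List[RefEntry]) -> List[ChangeEntry]:
    """Join the two logs on block name and return a sorted change log."""
    fingerprint_by_name = {e.block_name: e.fingerprint for e in first_log}
    change_log = []
    for ref in second_log:
        fingerprint = fingerprint_by_name.get(ref.block_name)
        if fingerprint is None:
            # The metadata refers to a block that was not part of this transfer.
            continue
        change_log.append(ChangeEntry(ref.block_name, fingerprint,
                                      ref.file_id, ref.file_offset))
    change_log.sort()
    return change_log
```

Joining on the block name is one plausible reading of "merging"; a real system might instead perform a sort-merge over two on-disk logs rather than build an in-memory dictionary.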
Abstract:
A system can efficiently remove ranges of entries from a flat sorted data structure that represent stale fingerprints. As part of fingerprint verification during deduplication, the system performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. During the AIRC procedure, an inode associated with a data container is selected and the FBN tuple of each deleted data block in the file is sorted in a predefined FBN order. The AIRC procedure then identifies the most recent fingerprint associated with a deleted data block. The set of non-overlapping and latest CP ranges is then used to remove stale fingerprints associated with that deleted block from the fingerprint database. A single pass through the fingerprint database identifies the set of non-overlapping and latest CP ranges, thereby improving efficiency of the storage system.
Abstract:
A system and method efficiently remove ranges of entries from a flat sorted data structure, such as a fingerprint database, of a storage system. The ranges of entries represent fingerprints that have become stale, i.e., are not representative of current states of corresponding blocks in the file system, due to various file system operations such as, e.g., deletion of a data block without overwriting its contents. A deduplication module of a file system executing on the storage system performs a fingerprint verification procedure to remove the stale fingerprints from the fingerprint database. As part of the fingerprint verification procedure, the deduplication module performs an attributes intersect range calculation (AIRC) procedure on the stale fingerprint data structure to compute a set of non-overlapping and latest consistency point (CP) ranges. During the AIRC procedure, an inode associated with a data container, e.g., a file, is selected and the FBN tuple of each deleted data block in the file is sorted in a predefined, e.g., increasing, FBN order. The AIRC procedure then identifies the most recent fingerprint associated with a deleted data block. The output from the AIRC procedure, i.e., the set of non-overlapping and latest CP ranges, is then used to remove stale fingerprints associated with that deleted block (as well as each other deleted data block) from the fingerprint database. Notably, only a single pass through the fingerprint database is required to identify the set of non-overlapping and latest CP ranges, thereby improving efficiency of the storage system.
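The AIRC idea can be sketched in Python under several assumptions that the abstract does not confirm: each stale entry is an (inode, FBN, CP count) tuple; adjacent FBNs with the same latest CP are coalesced into one inclusive range; and a database fingerprint is treated as stale when its CP is at or before the latest CP of the range that covers it. All type and function names below are hypothetical.

```python
from itertools import groupby
from typing import List, NamedTuple


class StaleEntry(NamedTuple):
    inode: int
    fbn: int   # file block number of a deleted block
    cp: int    # consistency point count at which the deletion was logged


class CPRange(NamedTuple):
    inode: int
    fbn_start: int
    fbn_end: int   # inclusive
    cp: int        # latest CP seen for every FBN in the range


class Fingerprint(NamedTuple):
    inode: int
    fbn: int
    cp: int
    digest: str


def airc(stale: List[StaleEntry]) -> List[CPRange]:
    """Compute non-overlapping ranges carrying the latest CP per FBN."""
    ranges: List[CPRange] = []
    ordered = sorted(stale)  # by inode, then increasing FBN, then CP
    for (inode, fbn), group in groupby(ordered, key=lambda e: (e.inode, e.fbn)):
        latest_cp = max(e.cp for e in group)
        last = ranges[-1] if ranges else None
        if (last is not None and last.inode == inode
                and last.fbn_end + 1 == fbn and last.cp == latest_cp):
            ranges[-1] = last._replace(fbn_end=fbn)  # extend the open range
        else:
            ranges.append(CPRange(inode, fbn, fbn, latest_cp))
    return ranges


def remove_stale(fingerprint_db: List[Fingerprint],
                 ranges: List[CPRange]) -> List[Fingerprint]:
    """Single pass over a database sorted by (inode, fbn)."""
    kept = []
    i = 0  # cursor into ranges; it only moves forward
    for fp in fingerprint_db:
        while i < len(ranges) and (ranges[i].inode, ranges[i].fbn_end) < (fp.inode, fp.fbn):
            i += 1
        r = ranges[i] if i < len(ranges) else None
        covered = (r is not None and r.inode == fp.inode
                   and r.fbn_start <= fp.fbn <= r.fbn_end)
        if covered and fp.cp <= r.cp:
            continue  # ASSUMPTION: recorded at or before the deletion, so stale
        kept.append(fp)
    return kept
```

Because both the ranges and the database are ordered by (inode, FBN), the range cursor never rewinds, which is what keeps the removal to one pass in this sketch.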