摘要:
A technique for organizing data to facilitate data deduplication includes dividing a block-based set of data into multiple “chunks”, where the chunk boundaries are independent of the block boundaries (due to the hashing algorithm). Metadata of the data set, such as block pointers for locating the data, are stored in a tree structure that includes multiple levels, each of which includes at least one node. The lowest level of the tree includes multiple nodes that each contain chunk metadata relating to the chunks of the data set. In each node of the lowest level of the buffer tree, the chunk metadata contained therein identifies at least one of the chunks. The chunks (user-level data) are stored in one or more system files that are separate from the buffer tree and not visible to the user.