Abstract:
A method and system are provided for copying data between two deduplicating storage systems. A list of unique fingerprints is compiled from the data to be sent. This list is transmitted to the receiving system during a preliminary data exchange called the preamble. The receiving system replies with a second list containing the unique fingerprints of either the data that needs to be sent or the data that can be omitted. The receiver sends whichever of these two lists is smaller, for efficiency and lower bandwidth consumption. A reference list of the duplicate blocks being sent is retained on the receiving system until the copy operation is complete. This reference list protects blocks on the receiving system by deferring deletions until the incoming hollow block can reference the duplicate block on the receiver, confirming that it is on the target system and should not be deleted.
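The preamble exchange described above might be sketched as follows. This is a minimal illustration, not the claimed implementation: the use of SHA-256 as the fingerprint, the names `unique_fingerprints`, `Receiver`, and `blocks_to_send`, the `"needed"`/`"present"` tags, and the in-memory dictionary standing in for the receiver's block store are all assumptions made for the example.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever fingerprint the systems use.
    return hashlib.sha256(block).hexdigest()


def unique_fingerprints(blocks):
    """Sender side: map each unique fingerprint to one copy of its block."""
    unique = {}
    for block in blocks:
        unique.setdefault(fingerprint(block), block)
    return unique


class Receiver:
    """Hypothetical receiving system with an in-memory block store."""

    def __init__(self, store):
        self.store = dict(store)   # fingerprint -> block
        self.protected = set()     # reference list of duplicates in flight
        self.deferred = set()      # deletions postponed during the copy

    def reply(self, offered):
        """Answer the preamble with the shorter of the two possible lists."""
        needed = [fp for fp in offered if fp not in self.store]
        present = [fp for fp in offered if fp in self.store]
        self.protected = set(present)
        if len(needed) <= len(present):
            return "needed", needed
        return "present", present

    def delete(self, fp):
        """Defer deletion of any block on the reference list."""
        if fp in self.protected:
            self.deferred.add(fp)
            return False
        self.store.pop(fp, None)
        return True

    def finish_copy(self, received):
        """Store new blocks, drop the reference list, and return the
        deferred deletions so the caller can re-evaluate them."""
        self.store.update(received)
        self.protected.clear()
        pending, self.deferred = self.deferred, set()
        return pending


def blocks_to_send(unique, reply):
    """Sender side: work out which blocks must actually be transferred."""
    kind, fps = reply
    wanted = set(fps) if kind == "needed" else set(unique) - set(fps)
    return {fp: block for fp, block in unique.items() if fp in wanted}
```

In this sketch the sender calls `unique_fingerprints`, passes the keys to `Receiver.reply`, and feeds the reply to `blocks_to_send`. Only blocks the receiver lacks cross the wire, whichever list the receiver chose to return.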
Abstract:
A method and system of optimizing the memory usage and performance of data deduplication storage systems include organizing the metadata of the data blocks needed by deduplicating storage systems. A three-level hierarchy is used. Level 1 stores the metadata on disk along with the user data. Level 2 uses low-latency storage (e.g., RAM and solid-state disks) to cache the on-disk metadata for faster direct access. Level 3 organizes the fingerprints using a trie and is entirely resident in RAM. Thus, the search to determine whether a data block is unique, and therefore a candidate for transfer, can be executed more efficiently, while ensuring that the metadata is transactionally secure.
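The three-level hierarchy described above could be sketched roughly as follows. This is an illustrative toy, not the claimed design: the class names `FingerprintTrie` and `MetadataIndex`, the byte-per-level trie layout, the SHA-256 fingerprint, the metadata fields, and the dictionaries standing in for disk and for the low-latency cache are all assumptions, and transactional safety is not modeled.

```python
import hashlib

_LEAF = "loc"  # sentinel key; cannot collide with the integer byte keys 0-255


class FingerprintTrie:
    """Level 3 (assumed layout): RAM-resident trie keyed on fingerprint bytes."""

    def __init__(self):
        self.root = {}

    def insert(self, fp: bytes, location: int) -> None:
        node = self.root
        for byte in fp:
            node = node.setdefault(byte, {})
        node[_LEAF] = location

    def lookup(self, fp: bytes):
        """Return the stored location, or None if the fingerprint is unknown."""
        node = self.root
        for byte in fp:
            node = node.get(byte)
            if node is None:
                return None
        return node.get(_LEAF)


class MetadataIndex:
    """Toy model of the hierarchy; dictionaries stand in for real storage."""

    def __init__(self):
        self.disk = {}                 # Level 1: on-disk metadata (simulated)
        self.cache = {}                # Level 2: low-latency cache (simulated)
        self.trie = FingerprintTrie()  # Level 3: fingerprints in RAM

    def store(self, block: bytes):
        """Return (location, is_unique) for a block, deduplicating via the trie."""
        fp = hashlib.sha256(block).digest()
        location = self.trie.lookup(fp)
        if location is not None:
            return location, False     # duplicate: nothing new to write
        location = len(self.disk)
        self.disk[location] = {"fingerprint": fp, "length": len(block)}
        self.trie.insert(fp, location)
        return location, True

    def metadata(self, location: int):
        """Read metadata through the Level 2 cache, filling it on a miss."""
        if location not in self.cache:
            self.cache[location] = self.disk[location]
        return self.cache[location]
```

In this sketch the uniqueness check in `store` touches only the in-RAM trie. The simulated disk is consulted only when metadata for a known location is requested and the cache misses.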