METHODS FOR OPTIMIZED VARIABLE-SIZE DEDUPLICATION USING TWO STAGE CONTENT-DEFINED CHUNKING AND DEVICES THEREOF

    Publication No.: US20200081868A1

    Publication Date: 2020-03-12

    Application No.: US16247014

    Application Date: 2019-01-14

    Applicant: NetApp, Inc.

    Inventors: Xing Lin, Fan Ni

    Abstract: Methods, non-transitory machine readable media, and computing devices that compare a hash value to a predefined value for sliding windows in parallel for segments partitioned from an input data stream. A bit array is parsed according to minimum and maximum chunk sizes to identify chunk boundaries for the input data stream. The bit array is populated based on a result of the comparison and portions of the bit array are parsed in parallel. Unique chunks of the input data stream defined by the chunk boundaries are stored in a storage device. Accordingly, this technology utilizes parallel processing in two stages. In a first stage, rolling window based hashing is performed concurrently to identify potential chunk boundaries. In a second stage, actual chunk boundaries are selected based on minimum and maximum chunk size constraints. This technology advantageously facilitates significant deduplication ratio improvement as well as improved parallel chunking performance.
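    Illustrative sketch (not part of the patent text): the two stages described in the abstract could look roughly like the Python below. Everything named here is an assumption made for illustration, including the polynomial rolling hash, the constants WINDOW, MASK, TARGET, BASE and MOD, the function names, and the SHA-256 fingerprint used for duplicate detection. Stage 1 marks candidate boundaries per segment using a thread pool, which shows the structure only, since CPython threads do not give true CPU parallelism. Stage 2 is written as a single sequential pass for brevity, whereas the abstract describes parsing portions of the bit array in parallel.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# All constants below are assumed values, not taken from the patent.
WINDOW = 16            # sliding-window width in bytes
MASK = 0x3F            # compare the low 6 bits of the hash
TARGET = 0             # the "predefined value" the hash is compared to
BASE = 257             # polynomial rolling-hash base
MOD = (1 << 31) - 1    # polynomial rolling-hash modulus


def mark_candidates(data, start, end):
    """Stage 1 for one segment: bits[i - start] is 1 when the window
    ending at byte i hashes to TARGET under MASK."""
    bits = bytearray(end - start)
    first = max(start, WINDOW - 1)      # first offset with a full window
    if first >= end:
        return bits
    top = pow(BASE, WINDOW - 1, MOD)    # weight of the byte leaving the window
    h = 0
    # The window may reach back into the previous segment, so the result
    # does not depend on where the segment borders fall.
    for b in data[first - WINDOW + 1:first + 1]:
        h = (h * BASE + b) % MOD
    for i in range(first, end):
        if i > first:
            h = ((h - data[i - WINDOW] * top) * BASE + data[i]) % MOD
        if (h & MASK) == TARGET:
            bits[i - start] = 1
    return bits


def stage1(data, segment_size, workers=4):
    """Hash every segment concurrently and join the per-segment bit arrays."""
    ranges = [(s, min(s + segment_size, len(data)))
              for s in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(lambda r: mark_candidates(data, *r), ranges))


def stage2(bits, min_size, max_size):
    """Stage 2: pick actual chunk ends (exclusive offsets) from the bit array,
    skipping candidates before min_size and forcing a cut at max_size."""
    cuts, start, n = [], 0, len(bits)
    while start < n:
        limit = min(start + max_size, n)
        cut = limit                      # forced cut if no candidate qualifies
        for i in range(start + min_size - 1, limit):
            if bits[i]:
                cut = i + 1
                break
        cuts.append(cut)
        start = cut
    return cuts


def dedup(data, cuts):
    """Store each unique chunk once, keyed by its SHA-256 digest, and return
    the store plus the ordered list of digests needed to rebuild the input."""
    store, recipe, prev = {}, [], 0
    for cut in cuts:
        chunk = bytes(data[prev:cut])
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
        prev = cut
    return store, recipe
```

    A possible use, under the same assumptions: bits = stage1(data, 4096), then cuts = stage2(bits, 64, 256), then store, recipe = dedup(data, cuts). Concatenating store[d] for each d in recipe reproduces the input. Because stage 1 only records candidates and defers the minimum and maximum size decisions to stage 2, the candidate bits are the same regardless of how the input is split into segments.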
