Deduplicating data for a data storage system using similarity determinations

    Publication number: US09933970B2

    Publication date: 2018-04-03

    Application number: US14928848

    Filing date: 2015-10-30

    Applicant: NetApp, Inc.

    CPC classification number: G06F3/0641 G06F3/0608 G06F3/0686

    Abstract: A method and system for deduplicating data for a data storage system using similarity determinations are described. A tape library is arranged in a hierarchy of tape groups and tape plexes. A tape group is an administrator-visible entity and comprises multiple tape plexes (at least as many as the number of replicas in the tape group). Each tape plex in turn comprises multiple tape cartridges. Data files and objects received within a time period are initially staged in a disk cache, where they are logically segregated into cliques based on their expected deduplication ratios. Each clique is then evaluated for the amount of data it duplicates with data already stored in the tape plexes. Based on the number of replicas being written, the top-ranked tape plexes are selected from within the tape group. The cliques are deduplicated against the data on the selected tape plexes, compressed, and written to tape.
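The flow in the abstract (stage, group into cliques, score against tape plexes, pick the top plexes, deduplicate) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: files are represented as sets of chunk fingerprints, the expected deduplication ratio is taken to be a number between 0 and 1, cliques are formed by simple bucketing of that ratio, and similarity is measured as the count of fingerprints a plex already holds. Compression and the actual tape write are omitted.

```python
from collections import defaultdict


def group_into_cliques(files, bucket_width=0.25):
    """Group staged files into cliques by expected deduplication ratio.

    files: dict mapping file name -> (expected_ratio, set of chunk fingerprints).
    bucket_width is a hypothetical tuning knob, not taken from the source.
    """
    cliques = defaultdict(dict)
    for name, (ratio, chunks) in files.items():
        bucket = int(ratio / bucket_width)  # files with similar ratios share a bucket
        cliques[bucket][name] = chunks
    return dict(cliques)


def clique_fingerprints(clique):
    """Union of all chunk fingerprints in one clique."""
    fingerprints = set()
    for chunks in clique.values():
        fingerprints |= chunks
    return fingerprints


def select_plexes(clique, plexes, replicas):
    """Return the `replicas` plexes that already hold the most of the clique's chunks.

    plexes: dict mapping plex name -> set of fingerprints already on that plex.
    Ties are broken by plex name so the result is deterministic.
    """
    wanted = clique_fingerprints(clique)
    ranked = sorted(plexes, key=lambda p: (-len(wanted & plexes[p]), p))
    return ranked[:replicas]


def dedupe_against(clique, plex_fingerprints):
    """Fingerprints that still have to be written after deduplication."""
    return clique_fingerprints(clique) - plex_fingerprints


if __name__ == "__main__":
    staged = {
        "a.log": (0.9, {"h1", "h2", "h3"}),
        "b.log": (0.8, {"h2", "h3", "h4"}),
        "c.bin": (0.1, {"h9"}),
    }
    tape_group = {
        "plex0": {"h1", "h2"},
        "plex1": {"h2", "h3", "h4"},
        "plex2": {"h7"},
    }
    for bucket, clique in sorted(group_into_cliques(staged).items()):
        targets = select_plexes(clique, tape_group, replicas=2)
        for plex in targets:
            new_chunks = dedupe_against(clique, tape_group[plex])
            print(bucket, plex, sorted(new_chunks))
```

Writing each replica to a different plex, as `select_plexes` does by returning distinct plex names, matches the abstract's requirement that a tape group contain at least as many plexes as replicas.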

    DEDUPLICATING DATA FOR A DATA STORAGE SYSTEM USING SIMILARITY DETERMINATIONS

    Publication number: US20170123711A1

    Publication date: 2017-05-04

    Application number: US14928848

    Filing date: 2015-10-30

    Applicant: NetApp, Inc.

    CPC classification number: G06F3/0641 G06F3/0608 G06F3/0686

