Selective data compression based on data similarity

    Publication No.: US11748307B2

    Publication date: 2023-09-05

    Application No.: US17500246

    Application date: 2021-10-13

    CPC classification number: G06F16/1744 G06F16/137 H03M7/3064

    Abstract: Technology is disclosed for selectively compressing data based on the similarity of pages within the data that is to be compressed. At least one corresponding hash value is generated for each one of multiple candidate pages to be compressed. In response to the hash values generated for the candidate pages, the technology selects a set of similar candidate pages from the candidate pages. The set of similar candidate pages is a subset of the candidate pages that includes fewer than all of the candidate pages. The set of similar candidate pages is compressed as a single unit, separately from the one or more other candidate pages that were not selected for inclusion in the set.
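
The flow in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not name a hash function, so the sketch assumes a min-hash over 8-byte shingles (computed with CRC32), picks the largest group of pages sharing a hash value as the similar set, and uses zlib as the compressor. The names `similarity_hash` and `selective_compress` are hypothetical.

```python
import zlib
from collections import defaultdict

SHINGLE = 8  # assumed shingle width in bytes (not from the abstract)

def similarity_hash(page):
    """Assumed similarity hash: minimum CRC32 over all 8-byte shingles.
    Pages that share most of their content tend to share this value."""
    last_start = max(1, len(page) - SHINGLE + 1)
    return min(zlib.crc32(page[i:i + SHINGLE]) for i in range(last_start))

def selective_compress(pages):
    """Return (similar_indexes, compressed_unit, separately_compressed)."""
    # Generate one hash value per candidate page and group pages by it.
    groups = defaultdict(list)
    for idx, page in enumerate(pages):
        groups[similarity_hash(page)].append(idx)
    # Assumed selection rule: the largest group is the set of similar pages.
    similar = max(groups.values(), key=len)
    # Compress the similar pages together as a single unit.
    unit = zlib.compress(b"".join(pages[i] for i in similar))
    # Compress every page that was not selected on its own.
    others = {i: zlib.compress(pages[i])
              for i in range(len(pages)) if i not in similar}
    return similar, unit, others
```

Compressing similar pages as one unit lets the compressor reuse matches across page boundaries, which is the motivation the abstract implies; the sketch does not enforce that the selected set is smaller than the full candidate list.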

    Techniques for estimating deduplication between storage volumes

    Publication No.: US11481131B2

    Publication date: 2022-10-25

    Application No.: US17110672

    Application date: 2020-12-03

    Abstract: Determining and using deduplication estimates may include: determining two deduplication sample indexes (DSIs) for two logical device sets each including one or more logical devices, determining a Jaccard Similarity for the two DSIs, wherein the Jaccard Similarity denotes a measurement of similarity and mutual deduplication between the two logical device sets; determining, in accordance with one or more criteria, whether the two logical device sets should be located in different data storage systems or a same data storage system that performs data deduplication, wherein the one or more criteria uses the Jaccard Similarity in determining whether to locate the two logical device sets in the same data storage system or the different data storage systems; and responsive to determining that the two logical device sets should be located in the same data storage system, locating the two logical device sets in the same data storage system.
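
A minimal sketch of the Jaccard computation described above, under assumptions the abstract does not specify: each deduplication sample index (DSI) is modeled as a set of SHA-256 block digests, sampling keeps digests whose leading 8 bytes are divisible by `SAMPLE_MOD`, and the placement criterion is a simple threshold. `build_dsi`, `should_colocate`, `SAMPLE_MOD`, and the threshold value are all hypothetical.

```python
import hashlib

SAMPLE_MOD = 4  # assumed sampling rate: keep roughly 1 in 4 digests

def build_dsi(blocks):
    """Build a sampled set of block digests for one logical device set."""
    dsi = set()
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if int.from_bytes(digest[:8], "big") % SAMPLE_MOD == 0:
            dsi.add(digest)
    return dsi

def jaccard(a, b):
    """Jaccard Similarity: |A intersect B| / |A union B| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def should_colocate(dsi_a, dsi_b, threshold=0.3):
    """Assumed criterion: co-locate when the similarity reaches a threshold."""
    return jaccard(dsi_a, dsi_b) >= threshold
```

For example, two DSIs that share two of four distinct sampled digests have a Jaccard Similarity of 0.5, so they would be placed on the same deduplicating system under the assumed threshold.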

    SYSTEM AND METHOD FOR EFFICIENT BACKGROUND DEDUPLICATION DURING HARDENING

    Publication No.: US20220035734A1

    Publication date: 2022-02-03

    Application No.: US16940952

    Application date: 2020-07-28

    Abstract: A method, computer program product, and computer system for identifying, by a computing device, content in a first bucket in a first cache. It may be determined that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket may be unique. The first portion of the content in the first bucket may be deduplicated from the first cache. The second portion of the content may be stored in a second bucket in a second cache.

    Cache management using a bucket-partitioned hash table

    Publication No.: US11210231B2

    Publication date: 2021-12-28

    Application No.: US16665328

    Application date: 2019-10-28

    Abstract: Techniques for performing cache management include partitioning entries of a hash table into buckets, wherein each of the buckets includes a portion of the entries of the hash table; configuring a cache, wherein the configuring includes allocating a section of the cache for exclusive use by each bucket; and performing first processing that stores a data block in the cache. The first processing includes determining a hash value for the data block; selecting, in accordance with the hash value, a first bucket of the plurality of buckets, wherein a first section of the cache is used exclusively for storing cached data blocks of the first bucket; storing metadata used in connection with caching the data block in a first entry of the first bucket; and storing the data block in a first cache location of the first section of the cache.
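
The bucket-to-section relationship can be sketched as below. This is an illustrative model only: the class name, the use of SHA-256, the modulo bucket selection, and the per-bucket least-recently-inserted eviction are assumptions that the abstract does not state.

```python
import hashlib
from collections import OrderedDict

class BucketPartitionedCache:
    """Hash table split into buckets; each bucket owns one cache section."""

    def __init__(self, num_buckets=4, slots_per_bucket=2):
        # One private section of cache slots per bucket.
        self.sections = [[None] * slots_per_bucket for _ in range(num_buckets)]
        # Per-bucket metadata entries: digest -> slot index in its section.
        self.buckets = [OrderedDict() for _ in range(num_buckets)]

    def _bucket_for(self, digest):
        # Assumed selection rule: leading digest bytes modulo bucket count.
        return int.from_bytes(digest[:4], "big") % len(self.buckets)

    def put(self, block):
        """Cache a block and return the index of the bucket that holds it."""
        digest = hashlib.sha256(block).digest()
        b = self._bucket_for(digest)
        entries, section = self.buckets[b], self.sections[b]
        if digest in entries:
            entries.move_to_end(digest)  # already cached: refresh its position
            return b
        if len(entries) < len(section):
            slot = len(entries)          # next unused slot in this section
        else:
            # Section full: evict the oldest entry of this bucket only.
            _, slot = entries.popitem(last=False)
        section[slot] = block
        entries[digest] = slot
        return b

    def get(self, digest):
        """Return the cached block for a digest, or None on a miss."""
        b = self._bucket_for(digest)
        slot = self.buckets[b].get(digest)
        return None if slot is None else self.sections[b][slot]
```

Because a bucket only ever touches its own section, eviction and lookup in one bucket never contend with another bucket's slots, which is the property the abstract's "exclusive use" wording points at.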

    Efficient virtualization layer structure for a data storage system

    Publication No.: US11151056B2

    Publication date: 2021-10-19

    Application No.: US16394632

    Application date: 2019-04-25

    Abstract: Techniques for providing an efficient virtualization layer structure in a data storage system. The techniques include implementing multiple layers of indirection for accessing host data in the data storage system, including a mapping layer, a virtualization layer, and a physical layer. The virtualization layer includes virtual layer blocks (VLBs), each VLB including virtual pointers. Each virtual pointer is pointed to by one or more leaf pointers in the mapping layer, and points to a data block in the physical layer. The techniques include generating, for each virtual pointer in the VLB, reference count metadata to keep track of the number of leaf pointers pointing to the virtual pointer, and maintaining, in a metadata page, the reference count metadata for the virtual pointers in a three (3)-way mirror. The techniques include maintaining each VLB of the virtualization layer in a RAID stripe across multiple physical drives in the data storage system.
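
The reference-count bookkeeping between leaf pointers and virtual pointers can be sketched as follows. This is a simplified in-memory model with hypothetical names (`VirtualPointer`, `MappingLayer`); it does not model the three-way mirror of the metadata page or the RAID striping of VLBs.

```python
from dataclasses import dataclass

@dataclass
class VirtualPointer:
    physical_addr: int   # location of the data block in the physical layer
    ref_count: int = 0   # number of leaf pointers currently pointing here

class MappingLayer:
    """Maps logical addresses (leaf pointers) to virtual pointers."""

    def __init__(self):
        self.leaves = {}  # logical address -> VirtualPointer

    def map(self, logical_addr, vptr):
        # Drop any previous mapping first so counts stay balanced.
        self.unmap(logical_addr)
        self.leaves[logical_addr] = vptr
        vptr.ref_count += 1

    def unmap(self, logical_addr):
        vptr = self.leaves.pop(logical_addr, None)
        if vptr is not None:
            vptr.ref_count -= 1
        return vptr

    def read(self, logical_addr, physical):
        # Two hops of indirection: leaf -> virtual pointer -> physical block.
        return physical[self.leaves[logical_addr].physical_addr]
```

When two logical addresses hold identical data, both leaves can point to one virtual pointer, and the count indicates when the underlying physical block is no longer referenced.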

    Managing data block compression in a storage system

    Publication No.: US11099756B2

    Publication date: 2021-08-24

    Application No.: US16654287

    Application date: 2019-10-16

    Abstract: An aspect of managing data block compression in a storage system includes performing, for each block written to the storage system: bit-wise traversing the block, searching the block for a pattern indicating a repeating sequence of bits and, upon determining the pattern exists in the block and the repeating sequence of bits in the pattern exceeds a threshold value, removing the repeating sequence of bits from the block thereby yielding a reduced-size block.

    DEDUPLICATING FULL AND PARTIAL BLOCK MATCHES

    Publication No.: US20210133164A1

    Publication date: 2021-05-06

    Application No.: US16668523

    Application date: 2019-10-30

    Abstract: A technique for performing deduplication traverses a deduplication database and assigns digest values in the database to buckets, where each bucket covers a respective range of digest values. To deduplicate a particular candidate block, the technique generates a digest from the candidate block and searches for the computed digest in a subset of the buckets, where the subset is selected based on the computed digest. If a target block providing an exact match or a suitably close partial match is found in the subset of buckets, the technique effects storage of the candidate block at least in part by providing a reference to the target block.
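
A minimal sketch of the bucketed lookup, covering only the exact-match case (partial matching would need a similarity digest that the abstract does not detail). It assumes SHA-1 digests, 16 buckets that split the digest space by the top four bits of the first byte, and a deduplication database modeled as a dict from digest to block location. All names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 16  # assumed: each bucket covers 1/16 of the digest range

def digest_of(block):
    return hashlib.sha1(block).digest()

def bucket_index(digest):
    # The top 4 bits of the digest select a contiguous range of digest values.
    return digest[0] >> 4

def build_buckets(dedup_db):
    """Traverse the database and assign each digest to its range bucket."""
    buckets = [dict() for _ in range(NUM_BUCKETS)]
    for digest, location in dedup_db.items():
        buckets[bucket_index(digest)][digest] = location
    return buckets

def find_target(candidate, buckets):
    """Search only the bucket selected by the candidate's own digest."""
    digest = digest_of(candidate)
    return buckets[bucket_index(digest)].get(digest)
```

A hit returns the target block's location, which the caller would store as a reference instead of writing the candidate block again; a miss returns `None`.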

    METADATA REPRESENTATION FOR ENABLING PARTIAL PAGE DUPLICATION

    Publication No.: US20210133117A1

    Publication date: 2021-05-06

    Application No.: US16668342

    Application date: 2019-10-30

    Abstract: An aspect includes providing a metadata structure having a logical level that points to a virtual level and a physical level to which the virtual level points. The method also includes storing, at the virtual level, a reference counter for each of a plurality of virtual-level type storage address entries in the metadata structure, and providing a pointer in the metadata structure between each pair of a number of pairs of virtual level address entries in which corresponding pages share a set of common sectors. The reference counter tracks a number of instances in which a corresponding pointer points to a corresponding virtual level address entry. An aspect further includes storing a single instance of the common sectors at the physical level.

    DATA REDUCTION BY REPLACEMENT OF REPEATING PATTERN WITH SINGLE INSTANCE

    Publication No.: US20210132836A1

    Publication date: 2021-05-06

    Application No.: US16669172

    Application date: 2019-10-30

    Abstract: A technique for managing data storage begins at a predetermined offset relative to a chunk of data received for writing, and identifies a span of contiguous regions of the chunk that contain identical data. The technique replaces the span of contiguous regions of the chunk with a single instance of a region of the contiguous regions. The technique persistently stores a shortened version of the chunk with the single instance replacing the span of contiguous regions.
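
The replacement step can be sketched as follows, assuming 512-byte regions and a repeat count kept alongside the shortened chunk so the original can be rebuilt. `REGION`, `shorten_chunk`, and `restore_chunk` are hypothetical names; the abstract does not say how the span length is recorded.

```python
REGION = 512  # assumed region size in bytes

def shorten_chunk(chunk, offset=0):
    """Collapse the run of identical regions starting at `offset`.
    Returns (shortened_chunk, repeat_count)."""
    first = chunk[offset:offset + REGION]
    if len(first) < REGION:
        return chunk, 1  # no full region at the offset; nothing to collapse
    count = 1
    pos = offset + REGION
    # Extend the span while the next full region is identical to the first.
    while chunk[pos:pos + REGION] == first:
        count += 1
        pos += REGION
    # Keep a single instance of the region in place of the whole span.
    return chunk[:offset] + first + chunk[pos:], count

def restore_chunk(shortened, count, offset=0):
    """Inverse of shorten_chunk: expand the single instance back to a span."""
    region = shortened[offset:offset + REGION]
    return (shortened[:offset] + region * count
            + shortened[offset + REGION:])
```

For a chunk made of four zero-filled regions followed by one region of other data, the shortened chunk holds one zero region plus the data region, and the repeat count is 4.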
