-
Publication No.: US11748307B2
Publication date: 2023-09-05
Application No.: US17500246
Application date: 2021-10-13
Applicant: EMC IP Holding Company LLC
Inventor: Uri Shabi, Alexei Kabishcer, Jonathan Volij
IPC: G06F16/00, G06F16/174, H03M7/30, G06F16/13
CPC classification number: G06F16/1744, G06F16/137, H03M7/3064
Abstract: Technology is disclosed for selectively compressing data based on similarity of pages within the data that is to be compressed. At least one corresponding hash value is generated for each one of multiple candidate pages to be compressed. In response to the hash values generated for the candidate pages, the technology selects a set of similar candidate pages from the candidate pages. The set of similar candidate pages is a subset of the candidate pages that includes less than all the candidate pages. The set of similar candidate pages is compressed as a single unit, separately from one or more other ones of the candidate pages that were not selected to be included in the set of similar candidate pages.
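The selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function names `similarity_hash` and `compress_selectively`, the choice of hashing only the first 64 bytes of each page, and the use of `zlib` are all assumptions made for illustration.

```python
import hashlib
import zlib
from collections import defaultdict


def similarity_hash(page: bytes, sample: int = 64) -> str:
    """Hypothetical similarity hash: digest of the first `sample` bytes of a page."""
    return hashlib.sha256(page[:sample]).hexdigest()


def compress_selectively(pages):
    """Group pages by hash, compress the largest group as one unit, the rest separately."""
    groups = defaultdict(list)
    for idx, page in enumerate(pages):
        groups[similarity_hash(page)].append(idx)

    # Pick the largest group of pages that share a hash value.
    similar = max(groups.values(), key=len)
    # The abstract requires a subset smaller than the full candidate list;
    # a group of one page is also not worth treating as a unit (assumption).
    if len(similar) < 2 or len(similar) == len(pages):
        similar = []

    unit = zlib.compress(b"".join(pages[i] for i in similar)) if similar else None
    others = {i: zlib.compress(p) for i, p in enumerate(pages) if i not in similar}
    return similar, unit, others


pages = [b"A" * 64 + b"x" * 4032, b"A" * 64 + b"y" * 4032, b"B" * 4096]
similar, unit, others = compress_selectively(pages)
```

Here pages 0 and 1 share a hash and are compressed together, while page 2 is compressed on its own. A real system would likely use a similarity-preserving hash rather than a prefix digest.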
-
Publication No.: US11481131B2
Publication date: 2022-10-25
Application No.: US17110672
Application date: 2020-12-03
Applicant: EMC IP Holding Company LLC
Inventor: Shaul Dar, Uri Shabi, Ronen Gazit
Abstract: Determining and using deduplication estimates may include: determining two deduplication sample indexes (DSIs) for two logical device sets each including one or more logical devices; determining a Jaccard Similarity for the two DSIs, wherein the Jaccard Similarity denotes a measurement of similarity and mutual deduplication between the two logical device sets; determining, in accordance with one or more criteria, whether the two logical device sets should be located in different data storage systems or a same data storage system that performs data deduplication, wherein the one or more criteria use the Jaccard Similarity in determining whether to locate the two logical device sets in the same data storage system or the different data storage systems; and responsive to determining that the two logical device sets should be located in the same data storage system, locating the two logical device sets in the same data storage system.
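The Jaccard Similarity named in this abstract is the size of the intersection of two sets divided by the size of their union. The sketch below assumes each DSI can be modeled as a Python set of sampled block digests; the `should_colocate` helper and its 0.5 threshold are hypothetical stand-ins for the unspecified "one or more criteria".

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def should_colocate(dsi_a: set, dsi_b: set, threshold: float = 0.5) -> bool:
    """Hypothetical criterion: co-locate when the similarity reaches the threshold."""
    return jaccard(dsi_a, dsi_b) >= threshold


dsi_a = {"d1", "d2", "d3"}
dsi_b = {"d2", "d3", "d4"}
similarity = jaccard(dsi_a, dsi_b)  # 2 shared digests out of 4 distinct ones
```

A higher value suggests that placing both logical device sets on the same deduplicating system would let more blocks be shared.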
-
Publication No.: US11243930B2
Publication date: 2022-02-08
Application No.: US16527468
Application date: 2019-07-31
Applicant: EMC IP Holding Company, LLC
Inventor: Bar Harel, Uri Shabi, Maor Rahamim
Abstract: A method, computer program product, and computer system for storing data in a bucket of a plurality of buckets. A spare bucket may be reserved in the plurality of buckets. A copy of the data may be stored in the spare bucket. A pointer to the data in the bucket and a pointer to the copy of the data in the spare bucket may be updated based upon, at least in part, storing the data in the bucket and storing the copy of the data in the spare bucket.
-
Publication No.: US20220035734A1
Publication date: 2022-02-03
Application No.: US16940952
Application date: 2020-07-28
Applicant: EMC IP Holding Company, LLC
Inventor: Bar Harel, Maor Rahamim, Uri Shabi
IPC: G06F12/02, G06F12/0897
Abstract: A method, computer program product, and computer system for identifying, by a computing device, content in a first bucket in a first cache. It may be determined that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket may be unique. The first portion of the content in the first bucket may be deduplicated from the first cache. The second portion of the content may be stored in a second bucket in a second cache.
-
Publication No.: US11210231B2
Publication date: 2021-12-28
Application No.: US16665328
Application date: 2019-10-28
Applicant: EMC IP Holding Company LLC
Inventor: Anton Kucherov, Ronen Gazit, Vladimir Shveidel, Uri Shabi
IPC: G06F12/0893
Abstract: Techniques for performing cache management include partitioning entries of a hash table into buckets, wherein each of the buckets includes a portion of the entries of the hash table, configuring a cache, wherein the configuring includes allocating a section of the cache for exclusive use by each bucket, and performing first processing that stores a data block in the cache. The first processing includes determining a hash value for a data block, selecting, in accordance with the hash value, a first bucket of the plurality of buckets, wherein a first section of the cache is used exclusively for storing cached data blocks of the first bucket, storing metadata used in connection with caching the data block in a first entry of the first bucket, and storing the data block in a first cache location of the first section of the cache.
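The bucket-to-section arrangement in this abstract can be sketched as follows. The class name `BucketedCache`, the SHA-256 hash, the modulo bucket selection, and the oldest-entry eviction policy are all assumptions for illustration; the abstract does not specify any of them.

```python
import hashlib


class BucketedCache:
    """Hash-table buckets, each owning a private section of cache slots."""

    def __init__(self, num_buckets: int = 4, slots_per_bucket: int = 2):
        self.num_buckets = num_buckets
        # Per-bucket metadata entries: block digest -> slot index in that bucket's section.
        self.buckets = [dict() for _ in range(num_buckets)]
        # One cache section per bucket, used only for that bucket's blocks.
        self.sections = [[None] * slots_per_bucket for _ in range(num_buckets)]

    def put(self, block: bytes):
        digest = hashlib.sha256(block).digest()
        bucket_id = int.from_bytes(digest[:8], "big") % self.num_buckets
        entries = self.buckets[bucket_id]
        section = self.sections[bucket_id]
        if digest in entries:            # already cached
            return bucket_id, entries[digest]
        if None in section:              # free slot in this bucket's own section
            slot = section.index(None)
        else:                            # assumed policy: evict the oldest entry
            victim = next(iter(entries))
            slot = entries.pop(victim)
        section[slot] = block
        entries[digest] = slot
        return bucket_id, slot


cache = BucketedCache(num_buckets=1, slots_per_bucket=2)
first = cache.put(b"a")
second = cache.put(b"b")
third = cache.put(b"c")  # section is full, so the slot holding b"a" is reused
```

Because every bucket allocates only from its own section, a lookup or eviction in one bucket never touches slots belonging to another.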
-
Publication No.: US11151056B2
Publication date: 2021-10-19
Application No.: US16394632
Application date: 2019-04-25
Applicant: EMC IP Holding Company LLC
Inventor: Vladimir Shveidel, Uri Shabi, Ronen Gazit
IPC: G06F12/109
Abstract: Techniques for providing an efficient virtualization layer structure in a data storage system. The techniques include implementing multiple layers of indirection for accessing host data in the data storage system, including a mapping layer, a virtualization layer, and a physical layer. The virtualization layer includes virtual layer blocks (VLBs), each VLB including virtual pointers. Each virtual pointer is pointed to by one or more leaf pointers in the mapping layer, and points to a data block in the physical layer. The techniques include generating, for each virtual pointer in the VLB, reference count metadata to keep track of the number of leaf pointers pointing to the virtual pointer, and maintaining, in a metadata page, the reference count metadata for the virtual pointers in a three (3)-way mirror. The techniques include maintaining each VLB of the virtualization layer in a RAID stripe across multiple physical drives in the data storage system.
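A rough Python model of the reference count bookkeeping in this abstract is shown below. The class `VirtualLayerBlock` and its methods are hypothetical, the three mirrors are plain in-memory lists, and reading by majority vote is an assumed recovery policy that the abstract does not state. RAID striping of the VLB itself is not modeled.

```python
from collections import Counter


class VirtualLayerBlock:
    """A VLB holding virtual pointers plus reference counts kept in three mirrors."""

    MIRRORS = 3

    def __init__(self, num_pointers: int):
        # Each virtual pointer refers to a physical data block address (None if unused).
        self.targets = [None] * num_pointers
        # Three identical copies of the per-pointer reference counts.
        self.refcount_mirrors = [[0] * num_pointers for _ in range(self.MIRRORS)]

    def add_reference(self, index: int) -> None:
        """Record that one more leaf pointer now points at virtual pointer `index`."""
        for mirror in self.refcount_mirrors:
            mirror[index] += 1

    def drop_reference(self, index: int) -> None:
        """Record that a leaf pointer no longer points at virtual pointer `index`."""
        for mirror in self.refcount_mirrors:
            mirror[index] -= 1

    def refcount(self, index: int) -> int:
        """Return the value held by the majority of mirrors (assumed policy)."""
        votes = Counter(mirror[index] for mirror in self.refcount_mirrors)
        return votes.most_common(1)[0][0]


vlb = VirtualLayerBlock(num_pointers=4)
vlb.targets[2] = 0x1000
vlb.add_reference(2)
vlb.add_reference(2)
vlb.refcount_mirrors[0][2] = 99  # simulate corruption of one mirror copy
```

With one of the three copies damaged, `vlb.refcount(2)` still reports the count agreed on by the other two.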
-
Publication No.: US11099756B2
Publication date: 2021-08-24
Application No.: US16654287
Application date: 2019-10-16
Applicant: EMC IP HOLDING COMPANY LLC
Inventor: Uri Shabi, Amitai Alkalay
Abstract: An aspect of managing data block compression in a storage system includes performing, for each block written to the storage system: bit-wise traversing the block, searching the block for a pattern indicating a repeating sequence of bits and, upon determining the pattern exists in the block and the repeating sequence of bits in the pattern exceeds a threshold value, removing the repeating sequence of bits from the block thereby yielding a reduced-size block.
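One possible reading of this abstract is sketched below in Python. It interprets "repeating sequence of bits" as a run of identical bits, which is an assumption, and it represents the block as a string of '0' and '1' characters for clarity rather than efficiency. The function names `reduce_block` and `restore_block` and the descriptor tuple are hypothetical.

```python
def reduce_block(block: bytes, threshold: int):
    """Remove the longest run of identical bits if it is longer than `threshold`."""
    bits = "".join(f"{byte:08b}" for byte in block)
    best_start, best_len = 0, 0
    i = 0
    while i < len(bits):  # bit-wise traversal, one run at a time
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        if j - i > best_len:
            best_start, best_len = i, j - i
        i = j
    if best_len <= threshold:
        return bits, None  # nothing worth removing
    reduced = bits[:best_start] + bits[best_start + best_len:]
    # The descriptor records what was removed so the block can be rebuilt.
    return reduced, (best_start, bits[best_start], best_len)


def restore_block(reduced: str, descriptor) -> str:
    """Reinsert the removed run described by `descriptor`."""
    if descriptor is None:
        return reduced
    start, bit, length = descriptor
    return reduced[:start] + bit * length + reduced[start:]


block = b"\xff\x00\x00\x00\x0f"
reduced, descriptor = reduce_block(block, threshold=16)
```

In this example the 28 consecutive zero bits are removed, leaving 12 bits plus a small descriptor.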
-
Publication No.: US20210133164A1
Publication date: 2021-05-06
Application No.: US16668523
Application date: 2019-10-30
Applicant: EMC IP Holding Company LLC
Inventor: Uri Shabi, Ronen Gazit, Alon Titelman, Alex Soukhman
IPC: G06F16/215, G06F16/22
Abstract: A technique for performing deduplication traverses a deduplication database and assigns digest values in the database to buckets, where each bucket covers a respective range of digest values. To deduplicate a particular candidate block, the technique generates a digest from the candidate block and searches for the computed digest in a subset of the buckets, where the subset is selected based on the computed digest. If a target block providing an exact match or a suitably close partial match is found in the subset of buckets, the technique effects storage of the candidate block at least in part by providing a reference to the target block.
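The range-based bucketing in this abstract can be approximated with the Python sketch below. It assumes SHA-256 digests, 16 buckets selected by the top four bits of the digest, and a search subset of exactly one bucket; it handles exact matches only and leaves out the partial-match case. All function names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 16


def bucket_of(digest: bytes) -> int:
    """Each bucket covers a contiguous digest range, chosen by the top 4 bits."""
    return digest[0] >> 4


def build_buckets(database: dict) -> list:
    """Traverse the deduplication database (digest -> block address) and fill buckets."""
    buckets = [dict() for _ in range(NUM_BUCKETS)]
    for digest, address in database.items():
        buckets[bucket_of(digest)][digest] = address
    return buckets


def deduplicate(block: bytes, buckets: list):
    """Return the address of a matching target block, or None if there is no match."""
    digest = hashlib.sha256(block).digest()
    return buckets[bucket_of(digest)].get(digest)


database = {
    hashlib.sha256(b"alpha").digest(): 100,
    hashlib.sha256(b"beta").digest(): 200,
}
buckets = build_buckets(database)
```

A candidate block is then checked against only the bucket its digest falls into, so `deduplicate(b"alpha", buckets)` finds address 100 without scanning the other buckets.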
-
Publication No.: US20210133117A1
Publication date: 2021-05-06
Application No.: US16668342
Application date: 2019-10-30
Applicant: EMC IP Holding Company LLC
Inventor: Uri Shabi, Alex Soukhman
IPC: G06F12/126, G06F12/1018
Abstract: An aspect includes providing a metadata structure having a logical level that points to a virtual level and a physical level to which the virtual level points. The method also includes storing, at the virtual level, a reference counter for each of a plurality of virtual-level type storage address entries in the metadata structure, and providing a pointer in the metadata structure between each pair of a number of pairs of virtual level address entries in which corresponding pages share a set of common sectors. The reference counter tracks a number of instances in which a corresponding pointer points to a corresponding virtual level address entry. An aspect further includes storing a single instance of the common sectors at the physical level.
-
Publication No.: US20210132836A1
Publication date: 2021-05-06
Application No.: US16669172
Application date: 2019-10-30
Applicant: EMC IP Holding Company LLC
Inventor: Uri Shabi, Amitai Alkalay
Abstract: A technique for managing data storage begins at a predetermined offset relative to a chunk of data received for writing, and identifies a span of contiguous regions of the chunk that contain identical data. The technique replaces the span of contiguous regions of the chunk with a single instance of a region of the contiguous regions. The technique persistently stores a shortened version of the chunk with the single instance replacing the span of contiguous regions.
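The span replacement in this abstract can be sketched in Python as below. The fixed `region_size`, the function names `shorten_chunk` and `expand_chunk`, and the returned repeat count are assumptions; the abstract does not say how regions are sized or how the span length is recorded.

```python
def shorten_chunk(chunk: bytes, offset: int, region_size: int):
    """Collapse a span of identical regions starting at `offset` to a single instance."""
    first = chunk[offset:offset + region_size]
    if len(first) < region_size:
        return chunk, 1  # not even one full region at the offset
    count = 1
    pos = offset + region_size
    # Extend the span while the next region matches the first one.
    while chunk[pos:pos + region_size] == first:
        count += 1
        pos += region_size
    return chunk[:offset] + first + chunk[pos:], count


def expand_chunk(shortened: bytes, offset: int, region_size: int, count: int) -> bytes:
    """Rebuild the original chunk from the shortened version and the repeat count."""
    region = shortened[offset:offset + region_size]
    return shortened[:offset] + region * count + shortened[offset + region_size:]


chunk = b"HDR" + b"AB" * 4 + b"TAIL"
shortened, count = shorten_chunk(chunk, offset=3, region_size=2)
```

Storing `shortened` together with `count` is enough to recover the original chunk through `expand_chunk`.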