Deduplication ratio estimation using an expandable basis set

    Publication (Announcement) No.: US10740296B2

    Publication (Announcement) Date: 2020-08-11

    Application No.: US15600880

    Application Date: 2017-05-22

    Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions, wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.
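A minimal Python sketch of the basis-set expansion described in this abstract, under assumptions that the patent does not state: regions are split into fixed 4 KiB chunks, SHA-256 serves as the chunk fingerprint, chunks are sampled uniformly at random, and the similarity metric is the fraction of sampled fingerprints already among the basis fingerprints. `CHUNK_SIZE`, `process_region`, the sample count of 8, and the default threshold of 0.5 are hypothetical.

```python
import hashlib
import random
from typing import List, Optional, Set

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def split_chunks(region: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a region into fixed-size chunks (the last one may be shorter)."""
    return [region[i:i + size] for i in range(0, len(region), size)]


def fingerprint(chunk: bytes) -> str:
    """Chunk fingerprint; SHA-256 is an assumption, not taken from the patent."""
    return hashlib.sha256(chunk).hexdigest()


def similarity(sampled: List[str], basis: Set[str]) -> float:
    """Fraction of sampled fingerprints already present among basis fingerprints."""
    if not sampled:
        return 1.0  # nothing sampled: treat the region as already covered
    return sum(fp in basis for fp in sampled) / len(sampled)


def process_region(region: bytes, basis: Set[str], basis_regions: List[bytes],
                   threshold: float = 0.5, samples: int = 8,
                   rng: Optional[random.Random] = None) -> bool:
    """Add an unprocessed region to the basis set if it is dissimilar enough.

    Returns True when the region was added to the basis set.
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch deterministic
    chunks = split_chunks(region)
    picked = rng.sample(chunks, min(samples, len(chunks)))
    sampled = [fingerprint(c) for c in picked]
    if similarity(sampled, basis) < threshold:
        basis_regions.append(region)
        # Every chunk of an accepted region contributes a basis fingerprint.
        basis.update(fingerprint(c) for c in chunks)
        return True
    return False
```

Once a region joins the basis set, all of its chunk fingerprints become basis fingerprints, so a later region with the same content scores 1.0 and is not added again.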

    Identification of high deduplication data

    Publication (Announcement) No.: US20180150474A1

    Publication (Announcement) Date: 2018-05-31

    Application No.: US15678449

    Application Date: 2017-08-16

    CPC classification number: G06F17/30159 G06F17/30156

    Abstract: A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
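The sample-size reasoning can be illustrated with a simple probability model, which is an assumption rather than the patent's own formula. If each chunk is kept by comparing its SHA-256 hash against a cutoff (so a chunk shared by two regions is kept in both or in neither) and two regions share d chunks, the chance of missing every shared chunk at sampling rate p is (1 - p)^d. Requiring at least one collision with probability `target` gives p = 1 - (1 - target)^(1/d), and the expected per-region sample size is p times the chunk count. `CHUNK_SIZE` and all function names are hypothetical.

```python
import hashlib
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def sampling_rate(shared_chunks: int, target: float) -> float:
    """Smallest rate p with 1 - (1 - p) ** shared_chunks >= target.

    Assumes hash-based sampling: a chunk shared by two regions is kept in
    both regions or in neither.
    """
    if shared_chunks <= 0 or not 0.0 < target < 1.0:
        raise ValueError("need shared_chunks > 0 and 0 < target < 1")
    return 1.0 - (1.0 - target) ** (1.0 / shared_chunks)


def sample_size(chunks_per_region: int, shared_chunks: int, target: float) -> int:
    """Expected number of chunks sampled per region at that rate."""
    return math.ceil(chunks_per_region * sampling_rate(shared_chunks, target))


def build_index(regions: Dict[str, bytes], rate: float) -> Dict[str, Set[str]]:
    """Hash every fixed-size chunk and index the sampled hashes by region."""
    index: Dict[str, Set[str]] = defaultdict(set)
    cutoff = rate * 2 ** 256  # SHA-256 digests treated as uniform in [0, 2**256)
    for region_id, data in regions.items():
        for start in range(0, len(data), CHUNK_SIZE):
            digest = hashlib.sha256(data[start:start + CHUNK_SIZE]).hexdigest()
            if int(digest, 16) < cutoff:
                index[digest].add(region_id)
    return index


def find_collisions(index: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Region pairs that share at least one sampled hash."""
    pairs: Set[Tuple[str, str]] = set()
    for region_ids in index.values():
        pairs.update(combinations(sorted(region_ids), 2))
    return pairs
```

A rate of 1.0 samples every chunk and a rate of 0.0 samples none; intermediate rates trade index size against the chance of detecting that two regions share data.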

    Optimizing dual-layered compression in storage systems

    Publication (Announcement) No.: US10831412B2

    Publication (Announcement) Date: 2020-11-10

    Application No.: US16743762

    Application Date: 2020-01-15

    Abstract: Embodiments are provided for optimizing dual-layered data compression in a storage environment. In a data storage system having a primary compressor implemented in a storage controller and a secondary compressor implemented within a drive enclosure, the primary compressor is selectively used to perform a first one of a plurality of actions on input/output (I/O) data while a second one of the plurality of actions is performed on the I/O data by the secondary compressor, thereby reducing latency and improving overall compression performance while processing the I/O data.
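The abstract does not say which actions are split between the two compressors, so the following Python sketch assumes one possible split purely for illustration: the controller-side primary compressor probes a sample of each write with fast zlib and compresses the whole buffer only when the probe shrinks well; otherwise it forwards the raw data to the drive-enclosure secondary compressor, modelled here as another zlib call. `SAMPLE_BYTES`, `RATIO_THRESHOLD`, and all function names are hypothetical.

```python
import zlib
from typing import Tuple

SAMPLE_BYTES = 4096     # hypothetical size of the compressibility probe
RATIO_THRESHOLD = 0.8   # hypothetical: compress in the controller below 80%


def primary_compressor(data: bytes) -> Tuple[str, bytes]:
    """Controller-side layer (assumed split): probe a sample, then compress or forward."""
    probe = data[:SAMPLE_BYTES]
    if probe and len(zlib.compress(probe, 1)) / len(probe) < RATIO_THRESHOLD:
        # Compressible: the controller performs the full compression itself.
        return "primary", zlib.compress(data, 6)
    # Poorly compressible: skip controller work and forward the raw data.
    return "secondary", data


def secondary_compressor(data: bytes) -> bytes:
    """Stand-in for drive-enclosure compression (modelled with fast zlib)."""
    return zlib.compress(data, 1)


def write_path(data: bytes) -> Tuple[str, bytes]:
    """Route one write through the two layers; returns (layer, stored bytes)."""
    layer, payload = primary_compressor(data)
    if layer == "secondary":
        payload = secondary_compressor(payload)
    return layer, payload
```

The probe costs only a small fraction of a full compression pass, which is one way a controller could avoid spending cycles on data that the drive layer can handle equally well.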

    Adaptive garbage collection (GC) utilization for grid storage systems

    Publication (Announcement) No.: US10824556B2

    Publication (Announcement) Date: 2020-11-03

    Application No.: US16181179

    Application Date: 2018-11-05

    Abstract: A computer-implemented method according to one embodiment includes determining resource usage of at least a first module in a grid storage system having multiple modules and approximately equal resource usage across the multiple modules of the grid storage system. The computer-implemented method further includes determining a garbage collection cost in the grid storage system by stopping garbage collection in a second one of the modules of the grid storage system, determining a resource usage in the second module upon stopping the garbage collection, and comparing the resource usage in the second module to the resource usage of the at least first module. The method further includes adjusting an amount of garbage collection based on both the garbage collection cost and the resource usage.
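A hedged Python sketch of the cost measurement and adjustment steps. It assumes resource usage is a utilization fraction between 0 and 1, estimates the GC cost as the drop in usage on the probe module after its garbage collection is stopped (relative to the mean of the peer modules, which relies on the roughly equal usage across modules), and applies an invented throttling policy. The thresholds, step size, and function names are hypothetical.

```python
from statistics import mean
from typing import List


def estimate_gc_cost(peer_usage: List[float], probe_usage_gc_stopped: float) -> float:
    """GC cost as the usage drop on a module whose GC was stopped.

    Peers keep collecting; because modules otherwise carry about equal load,
    the difference is attributed to garbage collection.
    """
    return max(0.0, mean(peer_usage) - probe_usage_gc_stopped)


def adjust_gc_rate(gc_rate: float, gc_cost: float, usage: float,
                   high: float = 0.85, low: float = 0.5, step: float = 0.1,
                   min_rate: float = 0.1, max_rate: float = 1.0) -> float:
    """Hypothetical policy combining the GC cost with current resource usage."""
    if usage >= high and gc_cost > 0:
        # Busy system: cut GC by the share of the load that GC accounts for.
        gc_rate *= 1.0 - min(gc_cost / usage, 1.0)
    elif usage <= low:
        # Spare capacity: let GC catch up.
        gc_rate += step
    # Keep some GC running at all times and never exceed the maximum rate.
    return min(max_rate, max(min_rate, gc_rate))
```

For example, with peers at about 90% usage and the probe module dropping to 70% once its GC stops, the estimated cost is 0.2, and a module running at 90% would have its GC rate reduced by roughly 22%.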

    Deduplication ratio estimation using an expandable basis set

    Publication (Announcement) No.: US10747726B2

    Publication (Announcement) Date: 2020-08-18

    Application No.: US15063550

    Application Date: 2016-03-08

    Abstract: A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions, wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.

    Reducing decryption latency for encryption processing

    Publication (Announcement) No.: US09864863B2

    Publication (Announcement) Date: 2018-01-09

    Application No.: US14252344

    Application Date: 2014-04-14

    CPC classification number: G06F21/602 G09C1/00 H04L2209/125

    Abstract: In a compression processing storage system that uses a pool of encryption processing cores, the encryption processing cores are assigned to process encryption operations only, decryption operations only, or both decryption and encryption operations scheduled for processing. A maximum number of the encryption processing cores are set for processing only the decryption operations, thereby lowering decryption latency. A minimal number of the encryption processing cores are allocated for processing the encryption operations, thereby increasing encryption latency. Upon reaching a throughput limit for the encryption operations that causes the minimal number of encryption processing cores to reach a busy status, the number of encryption processing cores allocated to the encryption operations is increased.
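The allocation policy can be sketched as a small Python class. This is an assumption-laden illustration, not the patented implementation: it models only decryption-only and encryption-only cores (cores that handle both are omitted), starts encryption at a configurable minimum, and moves one core from decryption to encryption each time every encryption core is reported busy. `EncryptionCorePool` and its methods are hypothetical names.

```python
class EncryptionCorePool:
    """Hypothetical split of a core pool into decryption-only and
    encryption-only cores; cores serving both kinds are not modelled."""

    def __init__(self, total_cores: int, min_encrypt_cores: int = 1) -> None:
        if not 0 < min_encrypt_cores < total_cores:
            raise ValueError("need 0 < min_encrypt_cores < total_cores")
        self.total_cores = total_cores
        self.encrypt_cores = min_encrypt_cores  # start with the minimum

    @property
    def decrypt_cores(self) -> int:
        # Every core not reserved for encryption serves decryption,
        # which keeps the decryption share as large as possible.
        return self.total_cores - self.encrypt_cores

    def report_encrypt_busy(self, busy_encrypt_cores: int) -> int:
        """When all encryption cores are busy (throughput limit reached),
        move one core from decryption to encryption, always keeping at
        least one decryption core. Returns the new encryption core count."""
        if busy_encrypt_cores >= self.encrypt_cores and self.decrypt_cores > 1:
            self.encrypt_cores += 1
        return self.encrypt_cores
```

With eight cores and a minimum of one, the pool starts at seven decryption cores and one encryption core, and grows the encryption side one core at a time only while encryption stays saturated.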
