Techniques for efficient data deduplication

    Publication No.: US11416462B2

    Publication date: 2022-08-16

    Application No.: US16927257

    Filing date: 2020-07-13

    Inventors: Peng Wu, Bin Dai, Rong Yu

    Abstract: Data deduplication techniques may use a fingerprint hash table and a backend location hash table in connection with performing operations including fingerprint insertion, fingerprint deletion and fingerprint lookup. Processing I/O operations may include: receiving a write operation that writes data to a target logical address; determining a fingerprint for the data; querying the fingerprint hash table using the fingerprint to determine a matching entry of the fingerprint hash table for the fingerprint; and responsive to determining that the fingerprint hash table does not have the matching entry that matches the fingerprint, performing processing including: inserting a first entry in the fingerprint hash table, wherein the first entry includes the fingerprint for the data and identifies a storage location at which the data is stored; and inserting a second entry in a backend location hash table, wherein the second entry references the first entry.
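
The write path described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `DedupStore`, the use of SHA-256 as the fingerprint, the in-memory `backend` list, and the handling of the duplicate (hit) case are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy model of the two tables named in the abstract (names are hypothetical)."""

    def __init__(self):
        self.fingerprint_table = {}  # fingerprint -> entry {"fingerprint", "location"}
        self.backend_table = {}      # storage location -> the same entry object
        self.logical_map = {}        # target logical address -> storage location
        self.backend = []            # stand-in for backend storage

    def write(self, logical_address, data):
        # Determine a fingerprint for the data (SHA-256 is an assumption).
        fingerprint = hashlib.sha256(data).digest()
        # Query the fingerprint hash table for a matching entry.
        entry = self.fingerprint_table.get(fingerprint)
        if entry is None:
            # No match: store the data and insert the first entry, which holds
            # the fingerprint and identifies the storage location.
            location = len(self.backend)
            self.backend.append(data)
            entry = {"fingerprint": fingerprint, "location": location}
            self.fingerprint_table[fingerprint] = entry
            # Insert the second entry, which references the first entry.
            self.backend_table[location] = entry
        # Assumed hit behavior: point the logical address at the existing copy.
        self.logical_map[logical_address] = entry["location"]
        return entry["location"]


store = DedupStore()
first = store.write(0, b"block-A")
duplicate = store.write(8, b"block-A")   # same content, different logical address
other = store.write(16, b"block-B")
```

Keying the second table by storage location lets a fingerprint entry be found from the backend side without recomputing the fingerprint, which is one plausible reason for keeping both tables.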

    Method and apparatus for increasing the accuracy of predicting future IO operations on a storage system

    Publication No.: US11315028B2

    Publication date: 2022-04-26

    Application No.: US17010945

    Filing date: 2020-09-03

    Abstract: A method of increasing the accuracy of predicting future IO operations on a storage system includes creating a snapshot of a production volume, linking the snapshot to a thin device, mounting the thin device in a cloud tethering subsystem, and tagging the thin device to identify the thin device as being used by the cloud tethering subsystem. When data read operations are issued by the cloud tethering subsystem on the tagged thin device, the data read operations are executed by a front-end adapter of the storage system to forward data associated with the data read operations to a cloud repository. The cache manager, however, does not use information about data read operations on tagged thin devices in connection with predicting future IO operations on the cache, so that movement of snapshots to the cloud repository does not skew the algorithms being used by the cache manager to perform cache management.
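
The key idea, that reads on tagged thin devices are serviced but kept out of the prediction statistics, can be sketched as below. The `CacheManager` class, its method names, and the simple frequency counter standing in for the prediction algorithm are hypothetical and only illustrate the filtering step.

```python
from collections import Counter


class CacheManager:
    """Toy cache manager; a read counter stands in for the real prediction logic."""

    def __init__(self):
        self.tagged_devices = set()    # thin devices used by the cloud tethering subsystem
        self.read_history = Counter()  # (device, track) -> read count used for prediction

    def tag_device(self, device_id):
        self.tagged_devices.add(device_id)

    def record_read(self, device_id, track):
        """Return True if the read was recorded for prediction purposes."""
        if device_id in self.tagged_devices:
            # The read is still executed elsewhere; it is only excluded here.
            return False
        self.read_history[(device_id, track)] += 1
        return True

    def predicted_hot_tracks(self, n):
        return [key for key, _ in self.read_history.most_common(n)]


manager = CacheManager()
manager.tag_device("snapshot-thin-dev")
for track in range(100):                      # bulk snapshot copy to the cloud
    manager.record_read("snapshot-thin-dev", track)
manager.record_read("production-vol", 7)
manager.record_read("production-vol", 7)
manager.record_read("production-vol", 9)
hot_tracks = manager.predicted_hot_tracks(1)
```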

    Reducing overhead of managing cache areas

    Publication No.: US10579529B2

    Publication date: 2020-03-03

    Application No.: US15964315

    Filing date: 2018-04-27

    Abstract: Maintaining multiple cache areas in a storage device having multiple processors includes loading data from a specific portion of non-volatile storage into a local cache slot in response to a specific processor of a first subset of the processors performing a read operation to the specific portion of non-volatile storage, where the local cache slot is accessible to the first subset of the processors and is inaccessible to a second subset of the processors that is different than the first subset of the processors, and includes converting the local cache slot into a global cache slot in response to one of the processors performing a write operation to the specific portion of non-volatile storage, wherein the global cache area is accessible to the first subset of the processors and to the second subset of the processors. Different ones of the processors may be placed on different directors.
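
The local-to-global slot conversion can be sketched as follows. This is an illustrative model under stated assumptions: the class `MultiAreaCache`, the dictionary-based slots, the fallback of reading storage directly when a processor cannot access a local slot, and the creation of a global slot on a write miss are all choices made for the example rather than details taken from the abstract.

```python
class MultiAreaCache:
    """Toy model: slots start local to one processor subset and become global on write."""

    def __init__(self, subsets, storage):
        self.subsets = subsets    # e.g. {"director-A": {0, 1}, "director-B": {2, 3}}
        self.storage = storage    # portion id -> data (stand-in for non-volatile storage)
        self.all_processors = set().union(*subsets.values())
        self.slots = {}           # portion id -> {"data", "scope", "accessible_to"}

    def _subset_of(self, processor):
        for members in self.subsets.values():
            if processor in members:
                return members
        raise KeyError(processor)

    def read(self, processor, portion):
        slot = self.slots.get(portion)
        if slot is None:
            # Read miss: load into a local slot visible only to the reader's subset.
            slot = {"data": self.storage[portion], "scope": "local",
                    "accessible_to": set(self._subset_of(processor))}
            self.slots[portion] = slot
        if processor not in slot["accessible_to"]:
            return self.storage[portion]   # assumed fallback for other subsets
        return slot["data"]

    def write(self, processor, portion, data):
        slot = self.slots.setdefault(
            portion, {"data": None, "scope": "local", "accessible_to": set()})
        slot["data"] = data
        # A write converts the slot into a global slot visible to every processor.
        slot["scope"] = "global"
        slot["accessible_to"] = set(self.all_processors)


cache = MultiAreaCache({"director-A": {0, 1}, "director-B": {2, 3}}, {"t1": b"old"})
first_read = cache.read(0, "t1")
scope_after_read = cache.slots["t1"]["scope"]
cache.write(2, "t1", b"new")
scope_after_write = cache.slots["t1"]["scope"]
```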

    TECHNIQUES FOR EFFICIENT DATA DEDUPLICATION

    Publication No.: US20220358103A1

    Publication date: 2022-11-10

    Application No.: US17864717

    Filing date: 2022-07-14

    Inventors: Peng Wu, Bin Dai, Rong Yu

    Abstract: Data deduplication techniques may use a fingerprint hash table and a backend location hash table in connection with performing operations including fingerprint insertion, fingerprint deletion and fingerprint lookup. Processing I/O operations may include: receiving a write operation that writes data to a target logical address; determining a fingerprint for the data; querying the fingerprint hash table using the fingerprint to determine a matching entry of the fingerprint hash table for the fingerprint; and responsive to determining that the fingerprint hash table does not have the matching entry that matches the fingerprint, performing processing including: inserting a first entry in the fingerprint hash table, wherein the first entry includes the fingerprint for the data and identifies a storage location at which the data is stored; and inserting a second entry in a backend location hash table, wherein the second entry references the first entry.

    Greedy packing algorithm with caching and ranking

    Publication No.: US11340805B1

    Publication date: 2022-05-24

    Application No.: US17157204

    Filing date: 2021-01-25

    IPC classification: G06F3/06

    Abstract: A storage array packs multiple non-full-size front-end tracks into slices that contain multiple back-end tracks. A greedy first fit packing algorithm is used to find packing solutions that are cached and ranked. The cached, ranked packing solutions are used by attempting to find matches with bucketed front-end tracks to be relocated. New packing solutions are generated and cached when matches cannot be found. Packing solutions may be shared outside the domain in which they are discovered.
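
A rough sketch of the cache-then-generate flow follows. Everything concrete here is an assumption for illustration: track sizes are plain integers, a slice has a fixed integer capacity, the greedy pass is a simplified single-slice stand-in for first fit, and cached solutions are ranked by how much of the slice they fill. The abstract does not specify any of these details.

```python
from collections import Counter


def greedy_fill(sizes, capacity):
    """Simplified greedy pass: take each track, in order, if it still fits."""
    chosen, free = [], capacity
    for size in sizes:
        if size <= free:
            chosen.append(size)
            free -= size
    return tuple(sorted(chosen, reverse=True))


class Packer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.solutions = []   # cached packing solutions, best-ranked first

    def pack_one_slice(self, buckets):
        """buckets: Counter mapping track size -> number of tracks waiting."""
        for solution in self.solutions:
            needed = Counter(solution)
            if all(buckets[size] >= n for size, n in needed.items()):
                break                      # a cached solution matches the buckets
        else:
            # No cached match: generate a new solution and cache it.
            pool = sorted(buckets.elements(), reverse=True)
            solution = greedy_fill(pool, self.capacity)
            if not solution:
                return None
            self.solutions.append(solution)
            self.solutions.sort(key=sum, reverse=True)   # assumed ranking: fullest first
        buckets.subtract(Counter(solution))
        return solution


packer = Packer(capacity=8)
buckets = Counter({5: 2, 3: 2, 2: 1})
results = [packer.pack_one_slice(buckets) for _ in range(4)]
```

In this run the second slice reuses the cached `(5, 3)` solution instead of searching again, which is the benefit the caching step is meant to provide.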

    Compressed data verification
    Granted patent

    Publication No.: US11243890B2

    Publication date: 2022-02-08

    Application No.: US16742201

    Filing date: 2020-01-14

    Inventors: Peng Wu, Rong Yu, Tao Gong

    IPC classification: G06F12/0897

    Abstract: Embodiments of the present disclosure relate to verifying compressed data. Compressed data files can be read from a global cache for a storage device into a local buffer. A data verification level of a plurality of data verification levels can be selected to perform on the compressed data files. An amount of data blocks of each data file can be decompressed based on the determined data verification level. The integrity of the compressed data files can then be verified using the decompressed data blocks.
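
Level-based verification can be sketched as below. The level names, the fraction of blocks each level decompresses, the per-block zlib compression, and the stored CRC-32 checksums are all hypothetical choices for illustration; the abstract only states that the amount of decompressed data depends on the selected level.

```python
import zlib

# Assumed mapping from verification level to the fraction of blocks decompressed.
LEVEL_FRACTION = {"low": 0.25, "medium": 0.5, "high": 1.0}


def make_file(blocks):
    """Build a toy compressed file: a list of (compressed block, CRC-32 of raw block)."""
    return [(zlib.compress(block), zlib.crc32(block)) for block in blocks]


def verify(compressed_file, level):
    """Decompress only as many blocks as the level requires and check each checksum."""
    count = max(1, int(len(compressed_file) * LEVEL_FRACTION[level]))
    for payload, expected_crc in compressed_file[:count]:
        try:
            data = zlib.decompress(payload)
        except zlib.error:
            return False
        if zlib.crc32(data) != expected_crc:
            return False
    return True


data_file = make_file([b"a" * 100, b"b" * 100, b"c" * 100, b"d" * 100])
clean_ok = verify(data_file, "high")
# Simulate damage to the last block's checksum.
data_file[3] = (data_file[3][0], data_file[3][1] ^ 1)
low_result = verify(data_file, "low")     # checks only the first block
high_result = verify(data_file, "high")   # checks every block
```

The trade-off is visible in the last two calls: the cheaper level misses damage in blocks it does not decompress, while the full level catches it at the cost of decompressing everything.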

    TECHNIQUES FOR EFFICIENT DATA DEDUPLICATION

    Publication No.: US20220012218A1

    Publication date: 2022-01-13

    Application No.: US16927257

    Filing date: 2020-07-13

    Inventors: Peng Wu, Bin Dai, Rong Yu

    Abstract: Data deduplication techniques may use a fingerprint hash table and a backend location hash table in connection with performing operations including fingerprint insertion, fingerprint deletion and fingerprint lookup. Processing I/O operations may include: receiving a write operation that writes data to a target logical address; determining a fingerprint for the data; querying the fingerprint hash table using the fingerprint to determine a matching entry of the fingerprint hash table for the fingerprint; and responsive to determining that the fingerprint hash table does not have the matching entry that matches the fingerprint, performing processing including: inserting a first entry in the fingerprint hash table, wherein the first entry includes the fingerprint for the data and identifies a storage location at which the data is stored; and inserting a second entry in a backend location hash table, wherein the second entry references the first entry.

    THROTTLING PROCESSING THREADS
    Patent application

    Publication No.: US20210311852A1

    Publication date: 2021-10-07

    Application No.: US16838079

    Filing date: 2020-04-02

    IPC classification: G06F11/34, G06N20/00, G06F9/50

    Abstract: Embodiments of the present disclosure relate to throttling processing threads of a storage device. One or more input/output (I/O) workloads of a storage device can be monitored. One or more resources consumed by each thread of each storage device component to process each operation included in a workload can be analyzed. Based on the analysis, consumption of each resource consumed by each thread can be controlled.

    COMPRESSED DATA VERIFICATION
    Patent application

    Publication No.: US20210216468A1

    Publication date: 2021-07-15

    Application No.: US16742201

    Filing date: 2020-01-14

    Inventors: Peng Wu, Rong Yu, Tao Gong

    IPC classification: G06F12/0897

    Abstract: Embodiments of the present disclosure relate to verifying compressed data. Compressed data files can be read from a global cache for a storage device into a local buffer. A data verification level of a plurality of data verification levels can be selected to perform on the compressed data files. An amount of data blocks of each data file can be decompressed based on the determined data verification level. The integrity of the compressed data files can then be verified using the decompressed data blocks.