DATA DEDUPLICATION USING TRUNCATED FINGERPRINTS

    Publication number: US20190340262A1

    Publication date: 2019-11-07

    Application number: US15968825

    Application date: 2018-05-02

    Abstract: The systems, devices, and methods disclosed herein relate to data reduction technology adapted to reduce storage costs by weeding out duplicative data write operations. The techniques and systems disclosed achieve deduplication benefits by reducing the size of the hash values stored in the hash tables used to compare unwritten data blocks to data that has already been written and stored somewhere in physical storage. The data deduplication systems, methods, and products facilitate deduplication at the block level as well as for misaligned data chunks within data blocks, that is, an unwritten data block that has been stored sequentially in two different physical locations. The deduplication teachings herein are amenable to varying data block sizes as well as data chunk sizes within blocks. The embodiments enhance computer performance by substantially reducing the computation time and storage requirements attendant to deduplication systems that use larger hash table data sizes.
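
    Illustrative sketch: the abstract above describes storing shortened hash values in the lookup table. The Python below is a minimal sketch of that idea and is not taken from the patent. The names `TRUNCATED_BYTES`, `truncated_fingerprint`, and `TruncatedDedupStore` are hypothetical, SHA-256 is an assumed hash function, and the byte-for-byte comparison on a truncated match is an assumption added here because shorter fingerprints collide more often.

```python
import hashlib

TRUNCATED_BYTES = 4  # assumed truncation length, chosen only for illustration


def truncated_fingerprint(block: bytes, n: int = TRUNCATED_BYTES) -> bytes:
    """Return the first n bytes of the block's SHA-256 digest."""
    return hashlib.sha256(block).digest()[:n]


class TruncatedDedupStore:
    """Toy block store that indexes blocks by truncated fingerprint."""

    def __init__(self):
        self.blocks = []  # stands in for physical storage
        self.table = {}   # truncated fingerprint -> list of indices into self.blocks

    def write(self, block: bytes) -> int:
        """Store the block unless an identical one exists; return its index."""
        key = truncated_fingerprint(block)
        for index in self.table.get(key, []):
            # Assumed verification step: confirm a truncated match byte for byte.
            if self.blocks[index] == block:
                return index
        self.blocks.append(block)
        new_index = len(self.blocks) - 1
        self.table.setdefault(key, []).append(new_index)
        return new_index
```

    Under these assumptions, each table key takes 4 bytes instead of the 32 bytes of a full SHA-256 digest, at the cost of one extra comparison per candidate match.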

    Techniques for efficient data deduplication

    Publication number: US11803527B2

    Publication date: 2023-10-31

    Application number: US17864717

    Application date: 2022-07-14

    Inventor: Peng Wu, Bin Dai, Rong Yu

    CPC classification number: G06F16/215 G06F16/174 G06F16/2255 G06F16/245

    Abstract: Data deduplication techniques may use a fingerprint hash table and a backend location hash table in connection with performing operations including fingerprint insertion, fingerprint deletion and fingerprint lookup. Processing I/O operations may include: receiving a write operation that writes data to a target logical address; determining a fingerprint for the data; querying the fingerprint hash table using the fingerprint to determine a matching entry of the fingerprint hash table for the fingerprint; and responsive to determining that the fingerprint hash table does not have the matching entry that matches the fingerprint, performing processing including: inserting a first entry in the fingerprint hash table, wherein the first entry includes the fingerprint for the data and identifies a storage location at which the data is stored; and inserting a second entry in a backend location hash table, wherein the second entry references the first entry.
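
    Illustrative sketch: the write path in the abstract above (compute a fingerprint, query the fingerprint hash table, and on a miss insert into both tables) can be pictured with two dictionaries. This is a minimal sketch under stated assumptions, not the patented implementation. The class `DedupIndex`, its attribute names, the use of SHA-256, the integer storage locations, the reuse of the existing location on a match, and the location-keyed deletion are all hypothetical.

```python
import hashlib


class DedupIndex:
    """Toy model of a fingerprint table paired with a backend location table."""

    def __init__(self):
        self.fingerprints = {}  # fingerprint -> storage location (the "first entry")
        self.by_location = {}   # storage location -> fingerprint (the "second entry")
        self.logical_map = {}   # target logical address -> storage location
        self._next_location = 0

    def write(self, logical_address: int, data: bytes) -> int:
        """Handle a write and return the storage location that holds the data."""
        fingerprint = hashlib.sha256(data).digest()
        location = self.fingerprints.get(fingerprint)
        if location is None:
            # No matching entry: allocate a location and insert into both tables.
            location = self._next_location
            self._next_location += 1
            self.fingerprints[fingerprint] = location
            self.by_location[location] = fingerprint
        # Assumed behavior on a match: point the logical address at the existing copy.
        self.logical_map[logical_address] = location
        return location

    def delete_by_location(self, location: int) -> None:
        """Assumed use of the second table: find and drop a fingerprint by location."""
        fingerprint = self.by_location.pop(location, None)
        if fingerprint is not None:
            del self.fingerprints[fingerprint]
```

    Keeping a second table keyed by location is one way to delete a fingerprint when only the storage location is known, without scanning the fingerprint table.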

    INTELLIGENTLY MANAGING DATA FACILITY CACHES
    Type: Patent application

    Publication number: US20200320002A1

    Publication date: 2020-10-08

    Application number: US16375545

    Application date: 2019-04-04

    Abstract: Architectures and techniques are described that can address challenges associated with efficiently managing a cache of a data facility. In that regard, for each block (or other file system structure) of a storage array spanning multiple storage devices, relationships can be established with other blocks of the array. The blocks can then be represented as multidimensional vectors, and an aggregation of the vectors can be represented as a weight matrix having values that reflect the corresponding relationships between any two given blocks. In response to any given IO transaction, a corresponding vector can be selected that is representative of a block referenced by the IO transaction, and one or more target blocks having a high relationship value to the block can be identified and used in connection with a cache update procedure.
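
    Illustrative sketch: the lookup step in the abstract above (select the vector for the referenced block, then pick blocks with a high relationship value) can be pictured as reading one row of a weight matrix. The function `related_blocks`, the `threshold` parameter, and the sample weights are hypothetical and serve only to illustrate the idea. How the weights are learned is outside this sketch.

```python
def related_blocks(weights, block, threshold):
    """Return blocks whose relationship value to `block` meets the threshold.

    `weights[i][j]` is the assumed relationship value between blocks i and j.
    Results are ordered from strongest to weakest relationship.
    """
    row = weights[block]  # the vector representing the referenced block
    candidates = [
        (value, other)
        for other, value in enumerate(row)
        if other != block and value >= threshold
    ]
    # Sort by descending value, breaking ties by block number.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in candidates]


# Made-up symmetric weights for four blocks.
weights = [
    [0.0, 0.9, 0.1, 0.6],
    [0.9, 0.0, 0.2, 0.3],
    [0.1, 0.2, 0.0, 0.8],
    [0.6, 0.3, 0.8, 0.0],
]

# An IO that references block 0 suggests blocks 1 and 3 as cache candidates.
prefetch_candidates = related_blocks(weights, 0, 0.5)
```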

    Data fingerprint distribution on a data storage system

    Publication number: US10782882B1

    Publication date: 2020-09-22

    Application number: US16387997

    Application date: 2019-04-18

    Inventor: Peng Wu, Bin Dai, Rong Yu

    Abstract: Fingerprints of data portions are distributed in a balanced manner across active controllers of a data storage system, and this may be done in such a manner that, when a new active controller is added to the system, fingerprint ownership changes and movement between pre-existing active controllers, and among active controllers overall, are minimized. When a new active controller is added to the system and fingerprints are redistributed, no fingerprint ownership may be re-assigned between pre-existing active controllers and no fingerprints may be moved between pre-existing active controllers, for example, between local memories of the active controllers.
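
    Illustrative sketch: the abstract above does not state how the balanced distribution is computed. One well-known technique with the same stated property is rendezvous (highest-random-weight) hashing, used here as a stand-in and not as the patent's method. With it, adding a controller can only move a fingerprint to the new controller, never between pre-existing ones. The function `owner` and the controller names are hypothetical.

```python
import hashlib


def owner(fingerprint: bytes, controllers):
    """Pick the controller with the highest hash score for this fingerprint."""

    def score(controller: str) -> bytes:
        # Score each (controller, fingerprint) pair independently.
        return hashlib.sha256(controller.encode() + b"|" + fingerprint).digest()

    return max(controllers, key=score)


# Sample fingerprints for the demonstration.
fingerprints = [hashlib.sha256(str(i).encode()).digest() for i in range(200)]

before = {fp: owner(fp, ["A", "B", "C"]) for fp in fingerprints}
after = {fp: owner(fp, ["A", "B", "C", "D"]) for fp in fingerprints}

# Fingerprints whose owner changed when controller "D" joined.
moved = [fp for fp in fingerprints if before[fp] != after[fp]]
```

    Every fingerprint in `moved` is now owned by "D", because a fingerprint's highest-scoring controller changes only when the newcomer outscores the previous owner.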

    Synchronous destage of write data from shared global memory to back-end storage resources

    Publication number: US11573738B2

    Publication date: 2023-02-07

    Application number: US17151794

    Application date: 2021-01-19

    Abstract: A synchronous destage process is used to move data from shared global memory to back-end storage resources. The synchronous destage process is implemented using a client-server model between a data service layer (client) and back-end disk array of a storage system (server). The data service layer initiates a synchronous destage operation by requesting that the back-end disk array move data from one or more slots of global memory to back-end storage resources. The back-end disk array services the request and notifies the data service layer of the status of the destage operation, e.g. a destage success or destage failure. If the destage operation is a success, the data service layer updates metadata to identify the location of the data on back-end storage resources. If the destage operation is not successful, the data service layer re-initiates the destage process by issuing a subsequent destage request to the back-end disk array.
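
    Illustrative sketch: the request, status, and retry flow in the abstract above can be modeled as a blocking call from a client object to a server object. This is a minimal sketch under stated assumptions. The classes `BackEnd` and `DataServiceLayer`, the `max_attempts` bound, the integer storage locations, and the simulated failures are hypothetical, and the abstract does not say whether retries are limited.

```python
class BackEnd:
    """Hypothetical back-end disk array that fails a set number of requests."""

    def __init__(self, failures_before_success: int = 0):
        self.failures_left = failures_before_success
        self.stored = {}  # back-end location -> data
        self._next_location = 0

    def destage(self, slot_id: int, data: bytes):
        """Return the new back-end location, or None to report a destage failure."""
        if self.failures_left > 0:
            self.failures_left -= 1
            return None
        location = self._next_location
        self._next_location += 1
        self.stored[location] = data
        return location


class DataServiceLayer:
    """Hypothetical client that initiates destage and tracks metadata."""

    def __init__(self, backend: BackEnd, max_attempts: int = 3):
        self.backend = backend
        self.max_attempts = max_attempts  # assumed retry bound
        self.global_memory = {}  # slot id -> data waiting to be destaged
        self.metadata = {}       # slot id -> location on back-end storage

    def destage_slot(self, slot_id: int) -> int:
        """Request destage synchronously, re-issuing the request after a failure."""
        data = self.global_memory[slot_id]
        for _ in range(self.max_attempts):
            location = self.backend.destage(slot_id, data)
            if location is not None:
                # Success: record where the data now lives.
                self.metadata[slot_id] = location
                return location
        raise RuntimeError(f"destage of slot {slot_id} failed")
```

    The call is synchronous in the sense that `destage_slot` does not return until the back-end has reported a status for the request.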
