    Techniques for Reducing Data Log Recovery Time and Metadata Write Amplification

    Publication No.: US20210311919A1

    Publication Date: 2021-10-07

    Application No.: US16842657

    Application Date: 2020-04-07

    Applicant: VMware, Inc.

    Abstract: Techniques for reducing data log recovery time and metadata write amplification when checkpointing a data log of a storage object in a distributed storage system are provided. In one set of embodiments, a node of the system can determine whether the data log has reached a first threshold size, where the data log comprises a plurality of data log records, and where each data log record includes data and metadata for a write request directed to the storage object. If the data log has reached the first threshold size, the node can copy, from each of the plurality of data log records, the metadata for the write request to a corresponding metadata log entry in a metadata log of the storage object. The node can then truncate the data log by removing the plurality of data log records.
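
The checkpointing flow in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `DataLogRecord`, `StorageObjectLogs`, and `threshold_bytes` are hypothetical, the metadata is assumed to be a logical block address plus a length, and the sketch assumes the write data has already been persisted elsewhere before the data log is truncated.

```python
from dataclasses import dataclass, field


@dataclass
class DataLogRecord:
    lba: int      # assumed metadata: logical block address of the write
    length: int   # assumed metadata: number of bytes written
    data: bytes   # payload of the write request


@dataclass
class StorageObjectLogs:
    threshold_bytes: int                      # hypothetical "first threshold size"
    data_log: list = field(default_factory=list)
    metadata_log: list = field(default_factory=list)

    def data_log_size(self) -> int:
        return sum(len(r.data) for r in self.data_log)

    def append_write(self, lba: int, data: bytes) -> None:
        self.data_log.append(DataLogRecord(lba, len(data), data))
        self.checkpoint_if_needed()

    def checkpoint_if_needed(self) -> bool:
        """Copy metadata out of each data log record, then truncate the data log."""
        if self.data_log_size() < self.threshold_bytes:
            return False
        for record in self.data_log:
            # Only the small metadata tuple is copied, not the payload.
            self.metadata_log.append((record.lba, record.length))
        # Assumption: payloads are already durable elsewhere at this point.
        self.data_log.clear()
        return True
```

Because only metadata is copied at the threshold, the data log stays short (less to replay on recovery) while the metadata log receives small entries rather than full records.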

    Global deduplication on distributed storage using segment usage tables

    Publication No.: US11093464B1

    Publication Date: 2021-08-17

    Application No.: US16857574

    Application Date: 2020-04-24

    Applicant: VMware, Inc.

    Abstract: Solutions are disclosed for blocks in a multi-writer log-structured file system. Solutions include selecting candidate segments in a storage medium; reading blocks of the candidate segments; determining whether any blocks are duplicates; updating a reference count for the duplicate blocks; identifying unique blocks; writing at least a portion of the unique blocks to a log; determining whether the log has accumulated a full segment of data; based at least on determining that the log has accumulated a full segment of data, writing the full segment to the storage medium; updating a segment usage table (SUT) to mark the candidate segments as free; and updating the SUT to mark a segment of the storage medium as no longer free. Some examples identify a window start time and stop time, because older segments have been deduped and younger segments may be volatile. Some examples adjust the window to improve performance.
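
The sequence of steps listed in this abstract might look like the sketch below. Everything here is an assumption made for illustration: segments are modeled as Python lists of byte strings, the segment usage table (SUT) as a dict of `"free"`/`"used"` states, SHA-256 as the duplicate detector, and `SEGMENT_BLOCKS`, `digest_of`, and `dedup_segments` are hypothetical names. Window selection by start and stop time is not modeled.

```python
import hashlib

SEGMENT_BLOCKS = 2  # assumed number of blocks in a full segment


def digest_of(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def dedup_segments(storage, sut, candidates, refcounts, log):
    """Deduplicate the candidate segments in place.

    storage:   dict segment_id -> list of blocks (bytes)
    sut:       dict segment_id -> "free" or "used"
    refcounts: dict block digest -> reference count
    log:       list accumulating unique blocks until a full segment exists
    """
    for seg_id in candidates:
        for block in storage[seg_id]:
            digest = digest_of(block)
            if digest in refcounts:
                refcounts[digest] += 1      # duplicate: bump the reference count
                continue
            refcounts[digest] = 1           # unique: remember it and log it
            log.append(block)
            if len(log) == SEGMENT_BLOCKS:
                # The log holds a full segment: write it to a free segment
                # (raises StopIteration if no free segment is available).
                target = next(s for s, state in sut.items()
                              if state == "free" and s not in candidates)
                storage[target] = list(log)
                sut[target] = "used"        # mark the segment as no longer free
                log.clear()
    for seg_id in candidates:
        sut[seg_id] = "free"                # candidate segments can be reused
```

Unique blocks that do not fill a whole segment stay in the log until later blocks complete the segment.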

    Enhanced data encryption in distributed datastores using random tweaks stored in data blocks

    Publication No.: US11573711B2

    Publication Date: 2023-02-07

    Application No.: US16827692

    Application Date: 2020-03-23

    Applicant: VMware, Inc.

    Abstract: A method for encrypting data in one or more data blocks is provided. The method receives a first data block to be written to a physical storage that includes one or more physical disks. The method applies a first random tweak to data indicative of the first data block to generate a first encrypted data block, and writes the first encrypted data block and the first random tweak to a first physical block of the physical storage. The method receives a second data block to be written to the physical storage. The method then applies a second random tweak, different than the first random tweak, to data indicative of the second data block to generate a second encrypted data block, and writes the second encrypted data block and the second random tweak to a second physical block of the physical storage.
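
The idea of storing a fresh random tweak alongside each encrypted block can be illustrated as below. This is only a toy sketch: the keystream built from SHAKE-256 is a stand-in chosen so the example runs with the standard library, and it is NOT the cipher described in the patent nor a substitute for a real tweakable mode such as AES-XTS. The names `write_block`, `read_block`, and `TWEAK_LEN` are hypothetical.

```python
import hashlib
import os

TWEAK_LEN = 16  # assumed tweak size in bytes


def _keystream(key: bytes, tweak: bytes, n: int) -> bytes:
    # Stand-in for a tweakable block cipher; for illustration only.
    return hashlib.shake_256(key + tweak).digest(n)


def write_block(disk: dict, physical_addr: int, key: bytes, plaintext: bytes) -> None:
    """Encrypt with a fresh random tweak and store tweak + ciphertext together."""
    tweak = os.urandom(TWEAK_LEN)
    stream = _keystream(key, tweak, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    disk[physical_addr] = tweak + ciphertext


def read_block(disk: dict, physical_addr: int, key: bytes) -> bytes:
    """Recover the tweak from the physical block, then decrypt."""
    stored = disk[physical_addr]
    tweak, ciphertext = stored[:TWEAK_LEN], stored[TWEAK_LEN:]
    stream = _keystream(key, tweak, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))
```

Because each write draws a new tweak, two blocks with identical contents produce different stored ciphertext, and the reader needs nothing beyond the key and the physical block itself.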

    Supporting deduplication in file storage using file chunk hashes

    Publication No.: US11500819B2

    Publication Date: 2022-11-15

    Application No.: US17028405

    Application Date: 2020-09-22

    Applicant: VMware, Inc.

    Abstract: The present disclosure is related to methods, systems, and machine-readable media for supporting deduplication in file storage using file chunk hashes. A hash of a chunk of a log segment can be received from a software defined data center. A chunk identifier can be associated with the hash in a hash map that stores associations between sequentially-allocated chunk identifiers and hashes. The chunk identifier can be associated with a logical address corresponding to the chunk of the log segment in a logical map that stores associations between the sequentially-allocated chunk identifiers and logical addresses. A search of the hash map can be performed to determine if the chunk is a duplicate, and the chunk can be deduplicated responsive to a determination that the chunk is a duplicate.
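
A hash map and a logical map of the kind described here could be sketched as follows. The class `ChunkIndex` and its method `ingest` are hypothetical, both maps are plain dicts, and the sketch searches the hash map before allocating an identifier, which is one possible ordering rather than necessarily the one claimed.

```python
from itertools import count


class ChunkIndex:
    def __init__(self):
        self._next_id = count(1)   # sequentially allocated chunk identifiers
        self.hash_map = {}         # chunk hash -> chunk identifier
        self.logical_map = {}      # logical address -> chunk identifier

    def ingest(self, logical_addr: int, chunk_hash: str):
        """Record a chunk hash; return (chunk_id, is_duplicate)."""
        existing = self.hash_map.get(chunk_hash)
        if existing is not None:
            # Duplicate: point the logical address at the existing chunk.
            self.logical_map[logical_addr] = existing
            return existing, True
        chunk_id = next(self._next_id)
        self.hash_map[chunk_hash] = chunk_id
        self.logical_map[logical_addr] = chunk_id
        return chunk_id, False
```

A caller would store the chunk data only when `is_duplicate` is false; duplicates cost one logical map entry.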

    System and methods of efficiently resyncing failed components without bitmap in an erasure-coded distributed object with log-structured disk layout

    Publication No.: US11429498B2

    Publication Date: 2022-08-30

    Application No.: US16870861

    Application Date: 2020-05-08

    Applicant: VMware, Inc.

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for resynchronizing data in a storage system. One of the methods includes determining that a particular disk of a capacity object of a storage system was offline for an interval of time, wherein the capacity object comprises a plurality of segments, and wherein the storage system comprises a segment usage table identifying a linked list of particular segments of the capacity object that are in use; determining a time point at which the particular disk went offline; determining one or more first segments of the capacity object that were modified after the time point, wherein determining one or more first segments comprises determining each segment of the segment usage table having a transaction ID that is larger than the time point; and resynchronizing, for each first segment, a portion of the particular disk corresponding to the first segment.
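
Selecting segments by transaction ID, as this abstract describes, can be sketched as below. The types and names (`SutEntry`, `segments_to_resync`, `resync`) are hypothetical, the in-use linked list is modeled as a plain Python list, and resynchronization is simplified to copying segment contents from a healthy source rather than reconstructing them from erasure-coded peers.

```python
from dataclasses import dataclass


@dataclass
class SutEntry:
    segment_id: int
    txn_id: int   # transaction ID of the last modification to the segment


def segments_to_resync(sut_in_use, offline_txn_id):
    """Return IDs of in-use segments modified after the disk went offline."""
    return [e.segment_id for e in sut_in_use if e.txn_id > offline_txn_id]


def resync(sut_in_use, offline_txn_id, healthy, stale_disk):
    """Refresh only the stale portions of the disk; return the refreshed IDs."""
    changed = segments_to_resync(sut_in_use, offline_txn_id)
    for seg_id in changed:
        # Assumption: `healthy` already holds the correct segment contents.
        stale_disk[seg_id] = healthy[seg_id]
    return changed
```

No separate dirty bitmap is needed in this sketch: the transaction IDs already recorded in the segment usage table identify what changed during the outage.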

    Supporting deduplication in object storage using subset hashes

    Publication No.: US20220091765A1

    Publication Date: 2022-03-24

    Application No.: US17028312

    Application Date: 2020-09-22

    Applicant: VMware, Inc.

    Abstract: The present disclosure is related to methods, systems, and machine-readable media for supporting deduplication in object storage using subset hashes. A plurality of hashes of a plurality of blocks of a plurality of log segments can be received from a software defined data center, wherein each block corresponds to a respective logical address. Each of the plurality of logical addresses can be associated with a respective sequentially-allocated chunk identifier in a logical map. A subset hash comprising a hash of a subset of the plurality of blocks can be determined that corresponds to a contiguous range of the plurality of logical addresses. A search of a hash map for the subset hash can be performed to determine if the subset hash is a duplicate. The subset of the plurality of blocks can be deduplicated responsive to a determination that the subset hash is a duplicate.
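
One way to picture a subset hash over a contiguous range of logical addresses is the sketch below. It rests on several assumptions: the subset hash is computed over the per-block hashes (since only hashes are received), runs have a fixed length chosen by the caller, and a duplicate run reuses the consecutive chunk identifiers of the earlier run. The names `subset_hash` and `ingest_run` are hypothetical.

```python
import hashlib


def subset_hash(block_hashes) -> str:
    """Hash a run of per-block hashes into one value (an assumed construction)."""
    h = hashlib.sha256()
    for bh in block_hashes:
        h.update(bh)
    return h.hexdigest()


def ingest_run(lba_hashes, start_lba, run_len, hash_map, logical_map, next_chunk_id):
    """Map a contiguous run of logical addresses; return (first_chunk_id, is_duplicate).

    lba_hashes:  dict logical address -> block hash (bytes)
    hash_map:    dict subset hash -> first chunk identifier of the stored run
    logical_map: dict logical address -> chunk identifier
    """
    run = [lba_hashes[start_lba + i] for i in range(run_len)]
    key = subset_hash(run)
    is_duplicate = key in hash_map
    first_id = hash_map[key] if is_duplicate else next_chunk_id
    if not is_duplicate:
        hash_map[key] = first_id
    for i in range(run_len):
        # Consecutive addresses map to consecutive chunk identifiers.
        logical_map[start_lba + i] = first_id + i
    return first_id, is_duplicate
```

Looking up one subset hash replaces a separate hash map lookup per block when a whole run repeats.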

    Recovering the Metadata of Data Backed Up in Cloud Object Storage

    Publication No.: US20220066883A1

    Publication Date: 2022-03-03

    Application No.: US17002669

    Application Date: 2020-08-25

    Applicant: VMware, Inc.

    Abstract: Techniques for recovering metadata associated with data backed up in cloud object storage are provided. In one set of embodiments, a computer system can create a snapshot of a data set, where the snapshot includes a plurality of data blocks of the data set that have been modified since the creation of a prior snapshot of the data set. The computer system can further upload the snapshot to a cloud object storage platform of a cloud infrastructure, where the snapshot is uploaded as a plurality of log segments conforming to an object format of the cloud object storage platform, and where each log segment includes one or more data blocks in the plurality of data blocks, and a set of metadata comprising, for each of the one or more data blocks, an identifier of the data set, an identifier of the snapshot, and a logical block address (LBA) of the data block. The computer system can then communicate the set of metadata to a server component running in a cloud compute and block storage platform of the cloud infrastructure.
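
The log segment layout in this abstract, where each segment carries its own per-block metadata, can be sketched as follows. The dict-based segment format and the names `build_log_segments` and `rebuild_index` are hypothetical, and the index rebuild is an assumed illustration of why self-describing segments make the metadata recoverable, not a step quoted from the abstract.

```python
def build_log_segments(dataset_id, snapshot_id, modified_blocks, blocks_per_segment):
    """Pack modified blocks (dict LBA -> bytes) into self-describing log segments."""
    items = sorted(modified_blocks.items())
    segments = []
    for i in range(0, len(items), blocks_per_segment):
        chunk = items[i:i + blocks_per_segment]
        segments.append({
            "data": [data for _, data in chunk],
            "metadata": [
                {"dataset": dataset_id, "snapshot": snapshot_id, "lba": lba}
                for lba, _ in chunk
            ],
        })
    return segments


def rebuild_index(segments):
    """Recover (dataset, snapshot, LBA) -> (segment number, offset) from segments alone."""
    index = {}
    for seg_no, segment in enumerate(segments):
        for offset, meta in enumerate(segment["metadata"]):
            index[(meta["dataset"], meta["snapshot"], meta["lba"])] = (seg_no, offset)
    return index
```

If the server-side copy of the metadata were lost, scanning the uploaded segments in this way would be enough to locate every block again.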
