SMALL IN-MEMORY CACHE TO SPEED UP CHUNK STORE OPERATION FOR DEDUPLICATION

    公开(公告)号:US20210064581A1

    公开(公告)日:2021-03-04

    申请号:US16552976

    申请日:2019-08-27

    Applicant: VMware, Inc.

    Abstract: The present disclosure provides techniques for deduplicating files. The techniques include creating a cache or subset of a large data structure. The large data structure organizes information by random hash values. The random hash values result in a random organization of information within the data structure, with the information spanning a large number of storage blocks within a storage system. The cache, however, is within memory and is small relative to the data structure. The cache is created so as to contain information that is likely to be needed during deduplication of a file. Having needed information within memory rather than in storage results in faster read and write operations to that information, improving the performance of a computing system.

    PROBABILISTIC ALGORITHM TO CHECK WHETHER A FILE IS UNIQUE FOR DEDUPLICATION

    公开(公告)号:US20210064579A1

    公开(公告)日:2021-03-04

    申请号:US16552908

    申请日:2019-08-27

    Applicant: VMware, Inc.

    Abstract: Disclosed techniques include deduplication. Techniques include determining whether a file is unique, and depending on whether the file is unique, deduplicating only part of the file or the entire file. The techniques include processing the first chunk of a file to determine whether the hash of the chunk hash is already within a chunk hash table, and if not, then a percentage of chunks of the file is similarly processed. If any of the hashes of chunks are already in the chunk hash table, then at least some of file has been previously deduplicated, and file is not unique the storage system. If none of the processed chunks have a hash that is already in the chunk hash table, then the file is considered to be unique within chunk store and only a partial percentage of the file's chunks are deduplicated. Not all of a unique file's chunks are deduplicated.

    EFFICIENT GARBAGE COLLECTION OF VARIABLE SIZE CHUNKING DEDUPLICATION

    公开(公告)号:US20210064522A1

    公开(公告)日:2021-03-04

    申请号:US16552954

    申请日:2019-08-27

    Applicant: VMware, Inc.

    Abstract: The present disclosure provides techniques for deallocating previously allocated storage blocks. The techniques include obtaining a list of chunk IDs to analyze, choosing a chunk ID, and determining the storage blocks spanned by the chunk corresponding to the chosen chunk ID. The technique further includes determining whether any file references any storage blocks spanned by the chunk. The determining may be performed by comparing an internal reference count to a total reference count, where the internal reference count is the number of reference to the storage block by a chunk ID data structure. If no files reference any of the storage blocks spanned by the chunk, then all the storage blocks of the chunk can be deallocated.

    BULK-LOAD FOR B-TREES
    25.
    发明申请

    公开(公告)号:US20200293506A1

    公开(公告)日:2020-09-17

    申请号:US16353535

    申请日:2019-03-14

    Applicant: VMware, Inc.

    Abstract: Embodiments described herein are related to bulk loading data into a B-tree. Embodiments include generating a first leaf node of a B-tree by allocating a first page for the first leaf node from a leaf page queue comprising a first plurality of sequential pages; and writing one or more tuples to the first page allocated for the first leaf node. Embodiments further include generating an parent node for the first leaf node and a second leaf node of the B-tree by allocating a third page for the parent node from an parent page queue comprising a second plurality of sequential pages, the parent node comprising a first indication of the first leaf node and a second indication of the second leaf node, the first indication and the second indication stored in the third page allocated for the parent.

    OPTIMAL SNAPSHOT DELETION
    26.
    发明申请

    公开(公告)号:US20190311047A1

    公开(公告)日:2019-10-10

    申请号:US15947072

    申请日:2018-04-06

    Applicant: VMware, Inc.

    Abstract: Embodiments described herein involve improved management of snapshots of a file system. Embodiments include copying a first root node of a first snapshot to a second snapshot, the second snapshot referencing other nodes of the first snapshot. Embodiments further include incrementing reference counts of the other nodes of the first snapshot. Embodiments further include adding a storage address of the first root node to a list. Embodiments further include, each time that a copy on write operation is performed for a node of the other nodes, adding a storage address of the node to the list and decrementing the reference count of the node. Embodiments further include iterating through the list and, for each storage address in the list, decrementing the reference count of the node corresponding to the storage address and, if the reference count of the node reaches zero, freeing storage space at the storage address.

Patent Agency Ranking