LARGE CONTENT FILE OPTIMIZATION
    21.
    Invention application

    Publication number: US20200004852A1

    Publication date: 2020-01-02

    Application number: US16024107

    Filing date: 2018-06-29

    Applicant: Cohesity, Inc.

    IPC classification: G06F17/30 G06F11/14

    Abstract: A size associated with a content file is determined to be greater than a threshold size. In response to the determination, file metadata of the content file is split and stored across a plurality of component file metadata structures. The file metadata of the content file specifies a tree structure organizing data components of the content file, and each component file metadata structure of the plurality of component file metadata structures stores a portion of the tree structure. A snapshot tree is updated to reference the plurality of component file metadata structures for the content file.
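The splitting step in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the constants `THRESHOLD_SIZE` and `COMPONENT_SIZE`, the `ComponentMetadata` class, a flat list of `(chunk_id, length)` pairs standing in for the tree structure, and a plain dictionary standing in for the snapshot tree.

```python
from dataclasses import dataclass, field

THRESHOLD_SIZE = 100  # hypothetical threshold, in bytes
COMPONENT_SIZE = 40   # hypothetical maximum bytes covered by one component structure


@dataclass
class ComponentMetadata:
    """Holds the part of a file's chunk layout that begins at start_offset."""
    start_offset: int
    chunks: list = field(default_factory=list)  # (chunk_id, length) pairs

    @property
    def length(self):
        return sum(length for _, length in self.chunks)


def split_metadata(chunks):
    """Return one structure for small files, several for files over the threshold."""
    total = sum(length for _, length in chunks)
    if total <= THRESHOLD_SIZE:
        return [ComponentMetadata(0, list(chunks))]
    components, offset = [], 0
    current = ComponentMetadata(0)
    for chunk_id, length in chunks:
        # Start a new component once the current one would exceed its budget.
        if current.chunks and current.length + length > COMPONENT_SIZE:
            components.append(current)
            current = ComponentMetadata(offset)
        current.chunks.append((chunk_id, length))
        offset += length
    components.append(current)
    return components


def update_snapshot_tree(tree, file_name, chunks):
    """Point the (stand-in) snapshot tree at the file's component structures."""
    tree[file_name] = split_metadata(chunks)
    return tree[file_name]
```

With six 20-byte chunks, the 120-byte file exceeds the assumed threshold and is described by three component structures starting at offsets 0, 40, and 80.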

    Deduplicating metadata based on a common sequence of chunk identifiers

    Publication number: US12032537B2

    Publication date: 2024-07-09

    Application number: US17215865

    Filing date: 2021-03-29

    Applicant: Cohesity, Inc.

    Inventors: Zhihuan Qiu, Yu Liu

    IPC classification: G06F16/215

    CPC classification: G06F16/215

    Abstract: A first group of chunk identifiers associated with a first content identifier structure of a first metadata element and a second group of chunk identifiers associated with a second content identifier structure of a second metadata element are determined. A common sequence of chunk identifiers across at least a portion of the first group of chunk identifiers associated with the first content identifier structure and the second group of chunk identifiers associated with the second content identifier structure is determined. A portion of the first group of chunk identifiers associated with the first content identifier structure and a portion of the second group of chunk identifiers associated with the second content identifier structure are updated to reference a common sequence identifier in place of the determined common sequence of chunk identifiers.
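A rough sketch of the replacement step follows. It assumes the "common sequence" is the longest contiguous run shared by the two identifier lists, which the abstract does not state, and it uses Python's `difflib.SequenceMatcher` to find that run. The function name `dedupe_common_sequence`, the `min_len` cutoff, and the `SEQ-1` identifier are hypothetical.

```python
from difflib import SequenceMatcher


def dedupe_common_sequence(first, second, seq_id, min_len=2):
    """Replace the longest shared run of chunk IDs in both lists with seq_id.

    Returns (new_first, new_second, common_run); common_run is None when the
    shared run is shorter than min_len and nothing is replaced.
    """
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    match = matcher.find_longest_match(0, len(first), 0, len(second))
    if match.size < min_len:
        return first, second, None
    common = first[match.a:match.a + match.size]
    new_first = first[:match.a] + [seq_id] + first[match.a + match.size:]
    new_second = second[:match.b] + [seq_id] + second[match.b + match.size:]
    return new_first, new_second, common


a, b, run = dedupe_common_sequence(
    ["a", "b", "c", "d", "e"], ["x", "b", "c", "d", "y"], "SEQ-1"
)
```

Here both lists end up referencing `SEQ-1` in place of the shared run `b, c, d`, so that run only needs to be stored once.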

    Incremental virtual machine metadata extraction

    Publication number: US11782886B2

    Publication date: 2023-10-10

    Application number: US17489536

    Filing date: 2021-09-29

    Applicant: Cohesity, Inc.

    IPC classification: G06F16/188 G06F16/11

    CPC classification: G06F16/188 G06F16/128

    Abstract: A virtual machine container file is analyzed to determine which portion of the virtual machine container file corresponds to a virtual machine file system metadata of the virtual machine container file. One or more differences between a first version of a virtual machine container file and a second version of the virtual machine container file are determined at least in part by traversing a snapshot structure associated with the virtual machine container file. The determined one or more differences that correspond to the virtual machine file system metadata portion of the virtual machine container file are identified based at least in part on the analysis of the virtual machine container file. The identified one or more differences corresponding to the virtual machine file system metadata portion of the virtual machine file are utilized to identify one or more changes from the content files included in the first version of the virtual machine container file to content files included in the second version of the virtual machine container file.
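The filtering step, keeping only the differences that fall inside the file system metadata portion, can be sketched as a range intersection. This assumes the changed byte ranges have already been obtained (for example from the snapshot traversal, which is not modeled here) and that both inputs are half-open `(start, end)` byte ranges; the function name `metadata_changes` is hypothetical.

```python
def metadata_changes(changed_ranges, metadata_ranges):
    """Return the parts of changed_ranges that overlap any metadata range.

    All ranges are half-open (start, end) byte offsets into the container file.
    """
    hits = []
    for c_start, c_end in changed_ranges:
        for m_start, m_end in metadata_ranges:
            lo, hi = max(c_start, m_start), min(c_end, m_end)
            if lo < hi:  # non-empty overlap
                hits.append((lo, hi))
    return sorted(hits)


overlaps = metadata_changes([(0, 10), (50, 120), (300, 400)], [(100, 200)])
```

Only the bytes from 100 to 120 are both changed and inside the assumed metadata region, so only that range would need to be parsed to find changed content files.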

    Large content file optimization
    25.
    Invention grant

    Publication number: US11693741B2

    Publication date: 2023-07-04

    Application number: US17348401

    Filing date: 2021-06-15

    Applicant: Cohesity, Inc.

    Abstract: A size associated with a content file is determined to be greater than a threshold size. Contents of the content file split across a plurality of component files are stored. Metadata, for the content file, is updated to reference a plurality of component file metadata structures for the component files. A node of the metadata is configured to track different sizes of portions of the content file stored in different component files of the plurality of component files. File metadata of the content file is split across the plurality of component file metadata structures and each component file metadata structure of the plurality of component file metadata structures specifies a corresponding structure organizing data components for a corresponding portion of the content file.
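One use of a node that tracks per-component sizes is mapping a file offset to the component file that holds it. The sketch below shows that lookup with a prefix sum and binary search. The class name `SizeTrackingNode` and its `locate` method are hypothetical, and the patent may organize this information differently.

```python
from bisect import bisect_right
from itertools import accumulate


class SizeTrackingNode:
    """Tracks the (possibly different) sizes of the portions held by each component file."""

    def __init__(self, component_sizes):
        self.component_sizes = list(component_sizes)
        # Cumulative end offset of each component within the whole content file.
        self._ends = list(accumulate(self.component_sizes))

    def locate(self, offset):
        """Map a file offset to (component index, offset inside that component)."""
        if not self._ends or not 0 <= offset < self._ends[-1]:
            raise ValueError("offset outside the content file")
        index = bisect_right(self._ends, offset)
        start = self._ends[index - 1] if index else 0
        return index, offset - start


node = SizeTrackingNode([100, 50, 200])
```

For component sizes of 100, 50, and 200 bytes, `node.locate(100)` returns `(1, 0)`: offset 100 is the first byte of the second component file.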

    ADAPTIVELY PROVIDING UNCOMPRESSED AND COMPRESSED DATA CHUNKS

    Publication number: US20230177011A1

    Publication date: 2023-06-08

    Application number: US17545655

    Filing date: 2021-12-08

    Applicant: Cohesity, Inc.

    IPC classification: G06F16/174

    CPC classification: G06F16/1744

    Abstract: A selected data chunk associated with an object is determined to be sent to a destination. A chunk compression grouping storing the selected data chunk associated with the object is identified. The identified chunk compression grouping includes a plurality of data chunks compressed together. A data content version that includes the selected data chunk associated with the object to be provided to the destination is determined from a plurality of data content versions based at least in part on a metric associated with the identified chunk compression grouping.
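As an illustration of choosing between data content versions, the sketch below compares the size of the whole compressed grouping with the size of the selected chunk on its own. That size comparison is an assumed metric, not one taken from the patent, and `choose_version` and the two version labels are hypothetical. Compression uses Python's standard `zlib` module.

```python
import zlib


def choose_version(group_chunks, selected_index):
    """Pick which version of the data to send for one selected chunk.

    Assumed metric: send the whole compressed grouping only when it is no
    larger than the selected chunk sent uncompressed.
    """
    compressed_group = zlib.compress(b"".join(group_chunks))
    selected = group_chunks[selected_index]
    if len(compressed_group) <= len(selected):
        return "compressed_group", compressed_group
    return "uncompressed_chunk", selected


kind, payload = choose_version([b"a" * 1000] * 4, 0)
```

Four highly repetitive 1000-byte chunks compress to far less than one chunk, so the compressed grouping is chosen; a tiny chunk in a grouping that compresses poorly would be sent uncompressed instead.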

    Large content file optimization
    27.
    Invention grant

    Publication number: US11074135B2

    Publication date: 2021-07-27

    Application number: US16688653

    Filing date: 2019-11-19

    Applicant: Cohesity, Inc.

    Abstract: A size associated with a content file is determined to be greater than a threshold size. Contents of the content file split across a plurality of component files are stored. Metadata, for the content file, is updated to reference a plurality of component file metadata structures for the component files. A node of the metadata is configured to track different sizes of portions of the content file stored in different component files of the plurality of component files. File metadata of the content file is split across the plurality of component file metadata structures and each component file metadata structure of the plurality of component file metadata structures specifies a corresponding structure organizing data components for a corresponding portion of the content file.

    FILE SYSTEM METADATA DEDUPLICATION
    28.
    Invention application

    Publication number: US20200349115A1

    Publication date: 2020-11-05

    Application number: US16854153

    Filing date: 2020-04-21

    Applicant: Cohesity, Inc.

    Abstract: File metadata structures of a file system are analyzed. At least one metadata element that is duplicated among the analyzed file metadata structures is identified. The at least one identified metadata element is deduplicated including by modifying at least one of the file metadata structures to reference a same instance of the identified metadata element that is referenced by another one of the file metadata structures.
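A minimal sketch of the idea, assuming each file metadata structure is a list of JSON-serializable elements and that two elements count as duplicates when their serialized content hashes match. Both assumptions, and the name `dedupe_elements`, are illustrative only.

```python
import hashlib
import json


def dedupe_elements(structures):
    """Make duplicated elements across structures reference one shared instance.

    Modifies the structures in place and returns the number of unique elements.
    """
    canonical = {}  # content hash -> first instance seen
    for structure in structures:
        for i, element in enumerate(structure):
            digest = hashlib.sha256(
                json.dumps(element, sort_keys=True).encode()
            ).hexdigest()
            # Reuse the first instance with this content, or register this one.
            structure[i] = canonical.setdefault(digest, element)
    return len(canonical)


s1 = [{"chunks": ["a", "b"]}, {"chunks": ["c"]}]
s2 = [{"chunks": ["a", "b"]}, {"chunks": ["d"]}]
unique = dedupe_elements([s1, s2])
```

After the call, `s2[0]` is the same object as `s1[0]`, so the duplicated element is held once and referenced by both structures.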

    Partial in-line deduplication and partial post-processing deduplication of data chunks

    Publication number: US11947497B2

    Publication date: 2024-04-02

    Application number: US17410745

    Filing date: 2021-08-24

    Applicant: Cohesity, Inc.

    Inventors: Zhihuan Qiu, Yu Liu

    IPC classification: G06F7/00 G06F16/174

    CPC classification: G06F16/1752

    Abstract: Data is ingested from a source system. Ingesting the data includes determining corresponding chunk identifiers for a plurality of data chunks corresponding to the ingested data and, for each of the plurality of data chunks, verifying whether the corresponding chunk identifier is included in a data structure tracking identifiers of data chunks that were already stored in a storage of a storage system before the data ingestion started, and storing the data chunk in a storage based on the verification. After the ingesting is completed, deduplication of the ingested data chunks stored in the storage having a same chunk identifier is performed and the data structure is updated based on the deduplication.
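The two phases can be sketched as follows. The sketch assumes SHA-256 digests as chunk identifiers, a Python set as the tracking data structure, and a list of `(chunk_id, data)` pairs as the storage; none of these choices comes from the patent, and the function names are hypothetical.

```python
import hashlib


def chunk_id(data):
    return hashlib.sha256(data).hexdigest()


def ingest(chunks, known_ids, storage):
    """In-line phase: skip only chunks that were stored before ingestion began.

    known_ids is not updated here, so duplicates among the newly ingested
    chunks can still be written to storage.
    """
    for data in chunks:
        cid = chunk_id(data)
        if cid not in known_ids:
            storage.append((cid, data))


def post_process(known_ids, storage):
    """Post-processing phase: drop stored duplicates, then update the tracker."""
    seen, kept = set(), []
    for cid, data in storage:
        if cid not in seen:
            seen.add(cid)
            kept.append((cid, data))
    storage[:] = kept
    known_ids.update(seen)


storage = [(chunk_id(b"A"), b"A")]
known = {chunk_id(b"A")}
ingest([b"A", b"B", b"B", b"C"], known, storage)
stored_after_ingest = len(storage)
post_process(known, storage)
```

During ingestion the repeated chunk `B` is written twice because the tracker only covers chunks stored before ingestion started; the post-processing pass removes the extra copy and records `B` and `C` in the tracker.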

    ADAPTIVE SOURCE SIDE DEDUPLICATION
    30.
    Invention publication

    Publication number: US20240004763A1

    Publication date: 2024-01-04

    Application number: US17852867

    Filing date: 2022-06-29

    Applicant: Cohesity, Inc.

    IPC classification: G06F11/14 G06F16/174

    Abstract: A backup of one or more objects is determined to be performed. Based on one or more conditions, a corresponding deduplication option among a plurality of deduplication options to utilize when backing up the one or more objects is selected. The one or more conditions at least include a condition based on a detected data change pattern. The plurality of deduplication options include a deduplication option associated with utilizing at least in part a plurality of variable-length data chunks for one or more mismatched ranges and/or one or more missing ranges associated with one of the one or more objects associated with the source system. A request to perform the backup of the one or more objects according to the corresponding selected deduplication option is provided to the source system. Backup data associated with the one or more objects is received and stored.