-
Publication No.: US20220083262A1
Publication date: 2022-03-17
Application No.: US17457117
Filing date: 2021-12-01
Applicant: NetApp, Inc.
Inventor: Alyssa Proulx, Mark David Olson
IPC: G06F3/06
Abstract: A system, method, and machine-readable storage medium for determining an amount of unique data in a distributed storage system are provided. In some embodiments, a combined efficiency set for a first data set stored in the distributed storage system, such as at a volume, may be generated. The first data set may include a first subset of data and a second subset of data in the distributed storage system. Additionally, a set of efficiency sets for the first subset of data may be generated. A set difference based on the combined efficiency set and the set of efficiency sets may be computed. An amount of memory used for storing unique data of the second subset of data may be estimated based on the set difference. The unique data may be present in the second subset of data but absent from the first subset of data.
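The set-difference estimate described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes an efficiency set is a uniform sample of block IDs taken at a known rate, and the names `estimate_unique_bytes`, `sample_rate`, and `BLOCK_SIZE` are hypothetical.

```python
from typing import Iterable, Set

BLOCK_SIZE = 4096  # assumed bytes per data block (hypothetical value)


def estimate_unique_bytes(
    combined: Set[int],
    first_subset_sets: Iterable[Set[int]],
    sample_rate: float,
    block_size: int = BLOCK_SIZE,
) -> int:
    """Estimate bytes held only by the second subset of a data set.

    combined          -- sampled block IDs for the whole first data set
    first_subset_sets -- sampled block IDs for each member of the first subset
    sample_rate       -- assumed fraction of block IDs kept when sampling
    """
    # Block IDs already accounted for by the first subset.
    covered = set().union(*first_subset_sets)
    # Set difference: sampled IDs present in the data set but not in the first subset.
    only_in_second = combined - covered
    # Scale the sampled count back up to the full population, then convert to bytes.
    return round(len(only_in_second) / sample_rate * block_size)


combined = {1, 2, 3, 4, 5}
first_subset = [{1, 2}, {2, 3}]
print(estimate_unique_bytes(combined, first_subset, sample_rate=0.5))
```

With these inputs the difference holds two sampled block IDs, so the estimate is 2 / 0.5 × 4096 bytes.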
-
Publication No.: US12189981B2
Publication date: 2025-01-07
Application No.: US18161391
Filing date: 2023-01-30
Applicant: NetApp, Inc.
Inventor: Alyssa Proulx, Mark David Olson
IPC: G06F3/06
Abstract: A system, method, and machine-readable storage medium for determining an amount of unique data in a distributed storage system are provided. In some embodiments, a combined efficiency set for a first data set stored in the distributed storage system, such as at a volume, may be generated. The first data set may include a first subset of data and a second subset of data in the distributed storage system. Additionally, a set of efficiency sets for the first subset of data may be generated. A set difference based on the combined efficiency set and the set of efficiency sets may be computed. An amount of memory used for storing unique data of the second subset of data may be estimated based on the set difference. The unique data may be present in the second subset of data but absent from the first subset of data.
-
Publication No.: US20230176773A1
Publication date: 2023-06-08
Application No.: US18161391
Filing date: 2023-01-30
Applicant: NetApp, Inc.
Inventor: Alyssa Proulx, Mark David Olson
IPC: G06F3/06
CPC classification number: G06F3/0655, G06F3/067, G06F3/0604
Abstract: A system, method, and machine-readable storage medium for determining an amount of unique data in a distributed storage system are provided. In some embodiments, a combined efficiency set for a first data set stored in the distributed storage system, such as at a volume, may be generated. The first data set may include a first subset of data and a second subset of data in the distributed storage system. Additionally, a set of efficiency sets for the first subset of data may be generated. A set difference based on the combined efficiency set and the set of efficiency sets may be computed. An amount of memory used for storing unique data of the second subset of data may be estimated based on the set difference. The unique data may be present in the second subset of data but absent from the first subset of data.
-
Publication No.: US12014045B2
Publication date: 2024-06-18
Application No.: US18057869
Filing date: 2022-11-22
Applicant: NetApp, Inc.
Inventor: Charles Randall, Alyssa Proulx
CPC classification number: G06F3/0605, G06F3/0608, G06F3/0641, G06F3/0683, G06F11/3034, G06F11/324, G06F11/3409
Abstract: Systems and methods for sampling a set of block IDs to facilitate estimating an amount of data stored in a data set of a storage system having one or more characteristics are provided. According to an example, metadata (e.g., block headers and block IDs) may be maintained regarding multiple data blocks of the data set. When one or more metrics relating to the data set are desired, an efficiency set, representing a subset of the block IDs of the data set, may be created to facilitate efficient calculation of the metrics by sampling the block IDs of the data set. Finally, the metrics may be estimated based on the efficiency set by analyzing one or more of the metadata (e.g., block headers) and the data contained in the data blocks corresponding to the subset of the block IDs and extrapolating the metrics for the entirety of the data set.
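The sample-then-extrapolate idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: block IDs are treated as uniformly distributed integers (for example, content hashes), a block is sampled when its low `sample_bits` bits are zero, and the block header is reduced to a single per-block byte count. All function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Set


def build_efficiency_set(block_ids: Iterable[int], sample_bits: int) -> Set[int]:
    """Keep roughly 1 in 2**sample_bits block IDs (assumed sampling rule)."""
    mask = (1 << sample_bits) - 1
    return {block_id for block_id in block_ids if block_id & mask == 0}


def estimate_metric(
    efficiency_set: Set[int],
    header_bytes: Dict[int, int],
    sample_bits: int,
) -> int:
    """Sum a per-block metric over the sample and scale it to the full data set."""
    sampled_total = sum(header_bytes[block_id] for block_id in efficiency_set)
    return sampled_total * (1 << sample_bits)


# Sixteen blocks, each reporting 100 bytes in its (simplified) header.
headers = {block_id: 100 for block_id in range(16)}
sample = build_efficiency_set(headers.keys(), sample_bits=2)
print(sorted(sample))
print(estimate_metric(sample, headers, sample_bits=2))
```

Only the sampled headers are read, so the cost of the estimate grows with the size of the efficiency set rather than with the size of the data set.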
-
Publication No.: US11194506B1
Publication date: 2021-12-07
Application No.: US16940461
Filing date: 2020-07-28
Applicant: NetApp, Inc.
Inventor: Alyssa Proulx, Mark David Olson
IPC: G06F3/06
Abstract: A system, method, and machine-readable storage medium for determining an amount of unique data in a distributed storage system are provided. In some embodiments, a combined efficiency set for a first data set stored in the distributed storage system, such as at a volume, may be generated. The first data set may include a first subset of data and a second subset of data in the distributed storage system. Additionally, a set of efficiency sets for the first subset of data may be generated. A set difference based on the combined efficiency set and the set of efficiency sets may be computed. An amount of memory used for storing unique data of the second subset of data may be estimated based on the set difference. The unique data may be present in the second subset of data but absent from the first subset of data.
-
Publication No.: US20230077764A1
Publication date: 2023-03-16
Application No.: US18057869
Filing date: 2022-11-22
Applicant: NetApp, Inc.
Inventor: Charles Randall, Alyssa Proulx
Abstract: Systems and methods for sampling a set of block IDs to facilitate estimating an amount of data stored in a data set of a storage system having one or more characteristics are provided. According to an example, metadata (e.g., block headers and block IDs) may be maintained regarding multiple data blocks of the data set. When one or more metrics relating to the data set are desired, an efficiency set, representing a subset of the block IDs of the data set, may be created to facilitate efficient calculation of the metrics by sampling the block IDs of the data set. Finally, the metrics may be estimated based on the efficiency set by analyzing one or more of the metadata (e.g., block headers) and the data contained in the data blocks corresponding to the subset of the block IDs and extrapolating the metrics for the entirety of the data set.
-
Publication No.: US20220129159A1
Publication date: 2022-04-28
Application No.: US17079249
Filing date: 2020-10-23
Applicant: NetApp, Inc.
Inventor: Charles Randall, Alyssa Proulx
Abstract: Systems and methods for sampling a set of block IDs to facilitate estimating an amount of data stored in a data set of a storage system having one or more characteristics are provided. According to an example, metadata (e.g., block headers and block IDs) may be maintained regarding multiple data blocks of the data set. When one or more metrics relating to the data set are desired, an efficiency set, representing a subset of the block IDs of the data set, may be created to facilitate efficient calculation of the metrics by statistically sampling the block IDs of the data set. Finally, the metrics may be estimated based on the efficiency set by analyzing one or more of the metadata (e.g., block headers) and the data contained in the data blocks corresponding to the subset of the block IDs and extrapolating the metrics for the entirety of the data set.
-
Publication No.: US11288186B2
Publication date: 2022-03-29
Application No.: US16856228
Filing date: 2020-04-23
Applicant: NetApp, Inc.
Inventor: Alyssa Proulx, Wei Sun
IPC: G06F12/02, G06F12/0864, G06F11/30
Abstract: A system, method, and machine-readable storage medium for performing garbage collection in a distributed storage system are provided. In some embodiments, an efficiency level of a garbage collection process is monitored. The garbage collection process may include removal of one or more data blocks of a set of data blocks that is referenced by a set of content identifiers. A set of slice services and the set of data blocks may reside in a cluster, and a set of filters may indicate whether the set of data blocks is in use. At least one parameter of a filter of the set of filters may be adjusted (e.g., increased or reduced) if the efficiency level is below an efficiency threshold. Garbage collection may be performed on the set of data blocks in accordance with the set of filters.
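The feedback loop in this abstract (monitor efficiency, adjust a filter parameter when it falls below a threshold) can be sketched as follows. The abstract does not say what kind of filter is used or how efficiency is measured, so this sketch assumes a Bloom-style filter whose size in bits is the tunable parameter, and defines efficiency as the fraction of unreferenced blocks that a collection pass actually removed. Every name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FilterParams:
    size_bits: int   # assumed tunable parameter: filter size in bits
    num_hashes: int  # assumed fixed number of hash functions


def gc_efficiency(blocks_removed: int, blocks_unreferenced: int) -> float:
    """Assumed efficiency measure: share of unreferenced blocks actually removed."""
    if blocks_unreferenced == 0:
        return 1.0  # nothing to collect, so the pass was fully efficient
    return blocks_removed / blocks_unreferenced


def adjust_filter(
    params: FilterParams,
    efficiency: float,
    threshold: float,
    growth: int = 2,
    max_size_bits: int = 1 << 30,
) -> FilterParams:
    """Grow the filter when efficiency is below the threshold; otherwise keep it."""
    if efficiency >= threshold:
        return params
    # A larger filter lowers the false-positive rate, so fewer unused blocks
    # are mistaken for in-use blocks on the next pass.
    return FilterParams(min(params.size_bits * growth, max_size_bits), params.num_hashes)


params = FilterParams(size_bits=1024, num_hashes=3)
params = adjust_filter(params, gc_efficiency(60, 100), threshold=0.8)
print(params)
```

The abstract also allows the parameter to be reduced; this sketch shows only the growth direction and caps it at `max_size_bits` to bound memory use.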
-
Publication No.: US20210334208A1
Publication date: 2021-10-28
Application No.: US16856228
Filing date: 2020-04-23
Applicant: NetApp, Inc.
Inventor: Alyssa Proulx, Wei Sun
IPC: G06F12/02, G06F12/0864, G06F11/30
Abstract: A system, method, and machine-readable storage medium for performing garbage collection in a distributed storage system are provided. In some embodiments, an efficiency level of a garbage collection process is monitored. The garbage collection process may include removal of one or more data blocks of a set of data blocks that is referenced by a set of content identifiers. A set of slice services and the set of data blocks may reside in a cluster, and a set of filters may indicate whether the set of data blocks is in use. At least one parameter of a filter of the set of filters may be adjusted (e.g., increased or reduced) if the efficiency level is below an efficiency threshold. Garbage collection may be performed on the set of data blocks in accordance with the set of filters.
-
Publication No.: US20240319872A1
Publication date: 2024-09-26
Application No.: US18672641
Filing date: 2024-05-23
Applicant: NetApp, Inc.
Inventor: Charles Randall, Alyssa Proulx
CPC classification number: G06F3/0605, G06F3/0608, G06F3/0641, G06F3/0683, G06F11/3034, G06F11/324, G06F11/3409
Abstract: Systems and methods for sampling a set of block IDs to facilitate estimating an amount of data stored in a data set of a storage system having one or more characteristics are provided. According to an example, metadata (e.g., block headers and block IDs) may be maintained regarding multiple data blocks of the data set. When one or more metrics relating to the data set are desired, an efficiency set, representing a subset of the block IDs of the data set, may be created to facilitate efficient calculation of the metrics by sampling the block IDs of the data set. Finally, the metrics may be estimated based on the efficiency set by analyzing one or more of the metadata (e.g., block headers) and the data contained in the data blocks corresponding to the subset of the block IDs and extrapolating the metrics for the entirety of the data set.