Method and apparatus for determining optimal chunk sizes of a deduplicated storage system
    2.
    Granted invention patent
    Method and apparatus for determining optimal chunk sizes of a deduplicated storage system (in force)

    Publication number: US08639669B1

    Publication date: 2014-01-28

    Application number: US13334732

    Filing date: 2011-12-22

    Abstract: Techniques for evaluating deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the examination of the metadata of first data chunks. Metadata of the first data chunks is merged according to the second chunk size to represent second data chunks to which the first data chunks would have been merged. A deduplication rate of the second data chunks is determined based on the merged metadata.
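The merge-and-measure idea in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the metadata is a list of `(fingerprint, size)` records in stream order, that the second chunk size is an integer multiple (`factor`) of the first, and that a merged chunk's fingerprint is a hash of its members' fingerprints. The names `merge_metadata` and `dedup_rate` are hypothetical.

```python
import hashlib

def merge_metadata(chunks, factor):
    """Merge every `factor` consecutive (fingerprint, size) records into one.

    Assumption: the merged fingerprint is a hash over the member fingerprints,
    so two merged chunks match only if all of their members match in order.
    """
    merged = []
    for i in range(0, len(chunks), factor):
        group = chunks[i:i + factor]
        joined = "|".join(fp for fp, _ in group)
        merged_fp = hashlib.sha1(joined.encode()).hexdigest()
        merged.append((merged_fp, sum(size for _, size in group)))
    return merged

def dedup_rate(chunks):
    """Logical bytes divided by bytes kept after removing duplicate fingerprints."""
    total = sum(size for _, size in chunks)
    unique = {}
    for fp, size in chunks:
        unique.setdefault(fp, size)  # keep the first occurrence only
    return total / sum(unique.values())
```

Comparing `dedup_rate(chunks)` with `dedup_rate(merge_metadata(chunks, 2))` estimates how a doubled chunk size would have deduplicated, using metadata alone and without re-reading the stored data.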


    Preferential selection of candidates for delta compression
    4.
    Granted invention patent
    Preferential selection of candidates for delta compression (in force)

    Publication number: US08712978B1

    Publication date: 2014-04-29

    Application number: US13495868

    Filing date: 2012-06-13

    CPC classification number: G06F17/30153 H03M7/3091

    Abstract: A computer-implemented method and system for improving efficiency in a delta compression process in a data storage system selects a data chunk to delta compress and selects a set of candidate data chunks using a first selection mechanism. Throughput or resource utilization is monitored. A change is made to a second selection mechanism that increases similarity of the set of candidates with the selected data chunk to improve compression in response to determining high resource availability or high throughput level. A change is made to a third selection mechanism that increases throughput of the delta compression process in response to determining low resource availability or low throughput.
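The switching rule in this abstract can be sketched as a small decision function. This is only an illustration under assumptions: the function name `select_mechanism`, the labels `"first"`, `"second"`, and `"third"`, and the two numeric thresholds are hypothetical, and a single throughput figure stands in for whatever the system actually monitors.

```python
def select_mechanism(current, throughput, low, high):
    """Pick a candidate-selection mechanism from a monitored throughput value.

    "second" favours candidates more similar to the selected chunk (better
    compression); "third" favours speed. Thresholds are assumed inputs.
    """
    if throughput >= high:
        return "second"   # headroom available: spend it on better matches
    if throughput <= low:
        return "third"    # under pressure: trade compression for throughput
    return current        # otherwise keep whatever is in use
```

For example, `select_mechanism("first", 90, low=20, high=80)` returns `"second"`, while a reading of 10 returns `"third"`.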


    Synthetic data generation for backups of block-based storage
    5.
    Granted invention patent
    Synthetic data generation for backups of block-based storage (in force)

    Publication number: US09128823B1

    Publication date: 2015-09-08

    Application number: US13612393

    Filing date: 2012-09-12

    Abstract: A system and method for generating synthetic data to simulate backing up data between a primary storage system and a protection storage system is presented. In one embodiment, a first track in a set of tracks is selected at random. Having selected a first track, at least a first block in the first track is modified. Subsequently, it is determined, based on a track run probability, whether to modify a second track that is consecutive to the first track or a third track that is selected randomly. Depending on the determination, at least one block is modified at either the second or third track. Other embodiments are also described herein.
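The track-selection loop in this abstract can be sketched as below. It is a minimal sketch under assumptions: the function name `tracks_to_modify` is hypothetical, tracks are identified by index, the consecutive track wraps around at the end of the set, and the block modification inside each track is left out.

```python
import random

def tracks_to_modify(num_tracks, num_changes, run_probability, rng):
    """Return the order in which tracks would be modified.

    The first track is chosen at random. Each later track is either the one
    consecutive to the previous track (with probability `run_probability`)
    or a fresh random track.
    """
    track = rng.randrange(num_tracks)
    order = [track]
    for _ in range(num_changes - 1):
        if rng.random() < run_probability:
            track = (track + 1) % num_tracks  # wrap-around is an assumption
        else:
            track = rng.randrange(num_tracks)
        order.append(track)
    return order
```

A high `run_probability` produces long sequential runs of modified tracks, and a low value scatters the changes across the set.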


    Method for cleaning a delta storage system
    6.
    Granted invention patent
    Method for cleaning a delta storage system (in force)

    Publication number: US08972672B1

    Publication date: 2015-03-03

    Application number: US13495893

    Filing date: 2012-06-13

    Abstract: A computer-implemented method and system for performing garbage collection in a delta compressed data storage system selects a file recipe to traverse to identify live data chunks and selects a chunk identifier from the file recipe. The chunk identifier is added to a set of live data chunks. Delta references in the file metadata corresponding to the chunk identifier are added to the set of live data chunks. Data chunks in a data storage system not identified by the set of live data chunks are then discarded.
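The mark-and-discard procedure in this abstract can be sketched as follows. The sketch makes several assumptions: a file recipe is a list of chunk identifiers, `delta_base` is a hypothetical mapping from a delta-compressed chunk to the single base chunk it references, and chains of bases are followed to the end. The function names are illustrative only.

```python
def live_chunks(recipes, delta_base):
    """Collect every chunk reachable from the file recipes.

    A chunk named in a recipe is live, and so is any base chunk it is
    delta-encoded against, since the delta cannot be decoded without it.
    """
    live = set()
    for recipe in recipes:
        for chunk_id in recipe:
            # Follow delta references until a non-delta or already-seen chunk.
            while chunk_id is not None and chunk_id not in live:
                live.add(chunk_id)
                chunk_id = delta_base.get(chunk_id)
    return live

def sweep(store, live):
    """Keep only live chunks; everything else is discarded."""
    return {cid: data for cid, data in store.items() if cid in live}
```

The point of the delta step is that a base chunk no longer named by any recipe must still survive while a live chunk depends on it.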


    Out-of-core similarity matching
    7.
    Granted invention patent
    Out-of-core similarity matching (in force)

    Publication number: US08914338B1

    Publication date: 2014-12-16

    Application number: US13335416

    Filing date: 2011-12-22

    Abstract: A method for storing data in a data storage system by partitioning the data into a plurality of data chunks and generating representative data for each of the plurality of chunks by applying a predetermined algorithm to each chunk of the plurality of chunks. Subsequently, the representative data is compared and sorted. Representative data for base data chunks and representative data for other data chunks that can be stored relative to the base data chunks are identified by evaluating the sorted set of representative data. Finally, each of the other data chunks identified as those that can be stored relative to a base data chunk are stored in the data storage system as the difference between the data chunk and a base data chunk.
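The sort-then-group approach in this abstract can be sketched as below. This is an illustration under assumptions, not the patented algorithm: the representative data is taken to be a single integer sketch (the maximum hash over sliding windows, a common resemblance feature), the first chunk in each run of equal sketches is treated as the base, and the in-memory `sorted` call stands in for an external, on-disk sort. The names `sketch` and `plan_storage` are hypothetical.

```python
import hashlib
from itertools import groupby

def sketch(chunk: bytes, window: int = 4) -> int:
    """Maximum hash over all sliding windows of the chunk."""
    last_start = max(1, len(chunk) - window + 1)
    shingles = {chunk[i:i + window] for i in range(last_start)}
    return max(int.from_bytes(hashlib.md5(s).digest()[:8], "big") for s in shingles)

def plan_storage(chunks):
    """Map each chunk index to its base index, or None if it is a base.

    Sorting the (sketch, index) pairs places chunks with equal sketches next
    to each other, so one sequential pass finds bases and their dependents.
    """
    pairs = sorted((sketch(c), i) for i, c in enumerate(chunks))
    plan = {}
    for _, run in groupby(pairs, key=lambda p: p[0]):
        indices = [i for _, i in run]
        plan[indices[0]] = None        # first chunk in the run is the base
        for i in indices[1:]:
            plan[i] = indices[0]       # store the rest as deltas against it
    return plan
```

Because matching needs only the sorted list of small sketches, the chunk data itself never has to fit in memory at once.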


    Characterizing and modeling virtual synthetic backup workloads
    8.
    Granted invention patent
    Characterizing and modeling virtual synthetic backup workloads (in force)

    Publication number: US08825653B1

    Publication date: 2014-09-02

    Application number: US13616978

    Filing date: 2012-09-14

    CPC classification number: G06F11/3414 G06F11/1446 G06F17/30097

    Abstract: Embodiments of this invention are directed to a system and method for characterizing and modeling a virtual synthetic file system workload. In one embodiment, a virtual synthetic system is adapted to select a first location in a prior generation dataset of a first cluster and generate a first offset using a distance distribution function. Thereafter, the virtual synthetic system selects a second location in the prior generation dataset of a second cluster, wherein the second location is offset from the first cluster by the first offset. Finally, the virtual synthetic system modifies each cluster selected on the prior generation dataset thereby creating a next generation dataset. This process is repeated to generate multiple generations of a dataset. Other embodiments are also described herein.
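The generation step in this abstract can be sketched as follows. The sketch assumes a dataset is a byte string, a cluster is a fixed-length run of bytes, the offset is measured from the end of the previous cluster, positions wrap around the dataset, and "modify" means incrementing each byte. The function name `next_generation` and the `distance` callable (standing in for the distance distribution function) are hypothetical.

```python
import random

def next_generation(prior, num_clusters, cluster_len, distance, rng):
    """Derive the next dataset generation by modifying spaced-out clusters.

    `distance(rng)` returns a non-negative integer gap between clusters.
    """
    data = bytearray(prior)
    pos = rng.randrange(len(data))          # first location, chosen at random
    for _ in range(num_clusters):
        for j in range(cluster_len):
            idx = (pos + j) % len(data)
            data[idx] = (data[idx] + 1) % 256
        # The next cluster starts one sampled offset past this cluster.
        pos = (pos + cluster_len + distance(rng)) % len(data)
    return bytes(data)
```

Calling the function repeatedly on its own output, for instance with `distance=lambda r: r.randint(0, 50)`, yields successive generations of the dataset.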


    Method for cleaning a delta storage system
    9.
    Granted invention patent
    Method for cleaning a delta storage system (in force)

    Publication number: US09400610B1

    Publication date: 2016-07-26

    Application number: US13495926

    Filing date: 2012-06-13

    Abstract: A computer-implemented method and system for performing garbage collection in a delta compressed data storage system selects a file recipe to traverse to identify live data chunks and selects a chunk identifier from the file recipe. The chunk identifier is added to a set of live data chunks. Delta references in an entry of an index corresponding to the chunk identifier are added to the set of live data chunks. Data chunks in a data storage system not identified by the set of live data chunks are then discarded.


    Preferential selection of candidates for delta compression
    10.
    Granted invention patent
    Preferential selection of candidates for delta compression (in force)

    Publication number: US09116902B1

    Publication date: 2015-08-25

    Application number: US13495859

    Filing date: 2012-06-13

    CPC classification number: G06F17/30097 G06F11/14 G06F17/30153

    Abstract: A computer-implemented method and system for improving efficiency in a delta compression process selects a data chunk to delta compress and generates a sketch for the selected data chunk. A set of candidate data chunks with a matching sketch is searched for. The set of candidate data chunks with at least a minimum degree of similarity is ranked by location status data. Tie-breaking of the set of candidate data chunks is done using a degree of sketch similarity for each candidate and the selected data chunk is delta compressed with a selected candidate data chunk.
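The ranking described in this abstract can be sketched with a two-part sort key. This is a minimal sketch under assumptions: each candidate is a dictionary with a `location` and a `similarity` score, the location tiers in `LOCATION_RANK` are invented for illustration, and the function name `pick_candidate` is hypothetical.

```python
# Hypothetical location tiers: lower rank means cheaper to read.
LOCATION_RANK = {"memory": 0, "cache": 1, "disk": 2}

def pick_candidate(candidates, min_similarity):
    """Choose the base chunk to delta-compress against, or None.

    Candidates below the minimum similarity are dropped. The rest are ranked
    by location status first, with sketch similarity as the tie-breaker.
    """
    eligible = [c for c in candidates if c["similarity"] >= min_similarity]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (LOCATION_RANK[c["location"]], -c["similarity"]))
```

Ranking by location first means a slightly less similar chunk already in memory is preferred over a better match that would require a disk read.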

