Efficient content meta-data collection and trace generation from deduplicated storage

    Publication No.: US08631052B1

    Publication date: 2014-01-14

    Application No.: US13335746

    Filing date: 2011-12-22

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30156

    Abstract: The method and apparatus collect file recipes from deduplicated data storage systems; each file recipe consists of a list of fingerprints of the data chunks of a file. Detailed meta-data for each unique data chunk is also collected. In an offline process, research and analysis can be performed either on the meta-data itself or on a full meta-data trace reconstructed by matching recipe fingerprints to the corresponding meta-data. The method and system can generate the full meta-data trace efficiently in an on-line or off-line process. Because typical deduplicated storage systems achieve deduplication rates of 10× or higher, the meta-data collection is faster than processing all of the original files and produces compact meta-data that takes less space to store.
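
    The trace reconstruction described in the abstract can be pictured as a join between a file recipe and a table of unique-chunk meta-data. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `ChunkMeta`, `expand_trace`, and the `compressed_size` field are hypothetical, and the fingerprints are placeholder strings.

```python
from typing import Dict, List, NamedTuple, Tuple


class ChunkMeta(NamedTuple):
    """Meta-data kept once per unique chunk (fields are illustrative assumptions)."""
    size: int             # chunk size in bytes
    compressed_size: int  # hypothetical extra per-chunk attribute


def expand_trace(recipe: List[str],
                 meta: Dict[str, ChunkMeta]) -> List[Tuple[str, ChunkMeta]]:
    """Rebuild the full per-chunk trace of one file by matching each
    recipe fingerprint to the meta-data of the corresponding unique chunk."""
    return [(fp, meta[fp]) for fp in recipe]


# Unique-chunk meta-data is stored once per fingerprint.
META = {"fp_a": ChunkMeta(8192, 4000), "fp_b": ChunkMeta(4096, 2100)}

# A file recipe lists fingerprints in file order, repeats included.
RECIPE = ["fp_a", "fp_b", "fp_a", "fp_a"]

TRACE = expand_trace(RECIPE, META)
LOGICAL_BYTES = sum(m.size for _, m in TRACE)      # bytes the file occupies logically
UNIQUE_BYTES = sum(m.size for m in META.values())  # bytes actually stored
```

    In this toy example the recipe and the meta-data table together are much smaller than the expanded trace, which is the property the abstract relies on when it says the collected meta-data is compact.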

    Efficient content meta-data collection and trace generation from deduplicated storage
    6.
    Granted invention patent (status: in force)

    Publication No.: US08667032B1

    Publication date: 2014-03-04

    Application No.: US13335750

    Filing date: 2011-12-22

    IPC classification: G06F7/00 G06F17/30

    Abstract: The method and apparatus collect file recipes from deduplicated data storage systems; each file recipe consists of a list of fingerprints of the data chunks of a file. Detailed meta-data for each unique data chunk is also collected. In an offline process, research and analysis can be performed either on the meta-data itself or on a full meta-data trace reconstructed by matching recipe fingerprints to the corresponding meta-data. The method and system can generate the full meta-data trace efficiently in an on-line or off-line process. Because typical deduplicated storage systems achieve deduplication rates of 10× or higher, the meta-data collection is faster than processing all of the original files and produces compact meta-data that takes less space to store.

    Out-of-core similarity matching
    7.
    Granted invention patent (status: in force)

    Publication No.: US08914338B1

    Publication date: 2014-12-16

    Application No.: US13335416

    Filing date: 2011-12-22

    IPC classification: G06F17/30

    Abstract: A method for storing data in a data storage system partitions the data into a plurality of data chunks and generates representative data for each chunk by applying a predetermined algorithm to it. Subsequently, the representative data is compared and sorted. Representative data for base data chunks, and representative data for other data chunks that can be stored relative to the base data chunks, are identified by evaluating the sorted set of representative data. Finally, each of the other data chunks identified as one that can be stored relative to a base data chunk is stored in the data storage system as the difference between that data chunk and a base data chunk.
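
    The sort-then-scan idea in the abstract can be illustrated as follows. This is a sketch under assumptions, not the patented method: the "predetermined algorithm" is assumed here to be the maximum SHA-1 hash over sliding byte windows, the function names `sketch` and `plan_storage` are hypothetical, and the sort runs in memory (an out-of-core variant would replace `sorted` with an external merge sort over on-disk runs).

```python
import hashlib
from itertools import groupby
from typing import Dict, List, Tuple


def sketch(chunk: bytes, window: int = 4) -> int:
    """Representative value for a chunk: the largest hash over all
    sliding windows (an assumed choice of 'predetermined algorithm')."""
    return max(
        int.from_bytes(hashlib.sha1(chunk[i:i + window]).digest()[:8], "big")
        for i in range(max(1, len(chunk) - window + 1))
    )


def plan_storage(chunks: List[bytes]) -> Tuple[List[int], Dict[int, int]]:
    """Sort (sketch, index) pairs, then scan runs of equal sketches.
    The first chunk of each run becomes a base; the others are marked
    as candidates to be stored as a difference against that base."""
    pairs = sorted((sketch(c), i) for i, c in enumerate(chunks))
    bases: List[int] = []
    deltas: Dict[int, int] = {}  # chunk index -> base chunk index
    for _, run in groupby(pairs, key=lambda p: p[0]):
        ids = [i for _, i in run]
        bases.append(ids[0])
        for i in ids[1:]:
            deltas[i] = ids[0]
    return sorted(bases), deltas


# Chunks 0 and 2 differ but contain the same set of 4-byte windows,
# so they receive the same sketch; chunk 1 is unrelated.
CHUNKS = [b"abababab", b"xyxyxyxy", b"ababababab"]
BASES, DELTAS = plan_storage(CHUNKS)
```

    Sorting brings chunks with equal representative data next to each other, so similar chunks are found in one sequential pass rather than through random lookups in an index.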

    Method and apparatus for determining optimal chunk sizes of a deduplicated storage system
    8.
    Granted invention patent (status: in force)

    Publication No.: US08639669B1

    Publication date: 2014-01-28

    Application No.: US13334732

    Filing date: 2011-12-22

    IPC classification: G06F7/00 G06F17/00

    Abstract: Techniques for evaluating deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the examination of the metadata of the first data chunks. Metadata of the first data chunks is merged according to the second chunk size to represent second data chunks into which the first data chunks would have been merged. A deduplication rate of the second data chunks is determined based on the merged metadata.
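
    The evaluation described above works on metadata only, so a larger chunk size can be tried without re-reading the stored data. The sketch below is illustrative and rests on assumptions: the trace is a list of `(fingerprint, size)` pairs in stream order, merging combines a fixed number of consecutive chunks (the abstract instead derives the second chunk size from the metadata), and `dedup_rate` and `merge_trace` are hypothetical names.

```python
import hashlib
from typing import List, Tuple

Trace = List[Tuple[str, int]]  # (fingerprint, size in bytes), in stream order


def dedup_rate(trace: Trace) -> float:
    """Logical bytes divided by the bytes of unique chunks."""
    logical = sum(size for _, size in trace)
    unique = {fp: size for fp, size in trace}
    return logical / sum(unique.values())


def merge_trace(trace: Trace, factor: int) -> Trace:
    """Merge every `factor` consecutive chunks into one larger chunk.
    The merged fingerprint is a hash of the member fingerprints, so two
    merged chunks match only if all of their members match in order."""
    merged: Trace = []
    for i in range(0, len(trace), factor):
        group = trace[i:i + factor]
        joined = "|".join(fp for fp, _ in group)
        merged.append((hashlib.sha1(joined.encode()).hexdigest(),
                       sum(size for _, size in group)))
    return merged


TRACE = [("a", 4), ("b", 4), ("a", 4), ("b", 4), ("a", 4), ("c", 4)]
RATE_SMALL = dedup_rate(TRACE)                  # 24 logical / 12 unique bytes
RATE_LARGE = dedup_rate(merge_trace(TRACE, 2))  # 24 logical / 16 unique bytes
```

    In this toy trace the merged chunks deduplicate less well than the original ones, which is the kind of trade-off the estimated deduplication rate is meant to expose.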

    Method and apparatus for efficiently searching data in a storage system
    9.
    Granted invention patent (status: in force)

    Publication No.: US08756249B1

    Publication date: 2014-06-17

    Application No.: US13216013

    Filing date: 2011-08-23

    IPC classification: G06F17/30

    CPC classification: G06F17/30106

    Abstract: Techniques for searching data in a storage system are described herein. In one embodiment, in response to a request for searching target data in a storage system, first representative data for the target data being searched are generated by applying a predetermined algorithm to at least a portion of the target data. The first representative data are searched and compared with second representative data representing one or more data sets stored in the storage system. Based on the search and comparison, a likelihood that the target data or similar content has been found in the storage system is indicated.
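
    One way to picture this kind of search is to compare small sketches instead of the data itself. The code below is a sketch under assumptions, not the patented technique: the representative data is assumed to be the `k` smallest SHA-1 window hashes (a bottom-k sketch), the likelihood is assumed to be the fraction of target features found in a stored sketch, and `features` and `search` are hypothetical names.

```python
import hashlib
from typing import Dict, FrozenSet, Tuple


def features(data: bytes, window: int = 4, k: int = 4) -> FrozenSet[bytes]:
    """Representative data: the k smallest hashes over all sliding
    windows of the input (an assumed choice of algorithm)."""
    hashes = {
        hashlib.sha1(data[i:i + window]).digest()[:8]
        for i in range(max(1, len(data) - window + 1))
    }
    return frozenset(sorted(hashes)[:k])


def search(target: bytes,
           index: Dict[str, FrozenSet[bytes]]) -> Tuple[str, float]:
    """Return the best-matching stored data set and a likelihood score:
    the fraction of the target's features present in that data set."""
    t = features(target)
    name, stored = max(index.items(), key=lambda kv: len(t & kv[1]))
    return name, len(t & stored) / len(t)


# The index holds only sketches of the stored data sets, not the data.
INDEX = {
    "doc1": features(b"the quick brown fox"),
    "doc2": features(b"lorem ipsum dolor"),
}
BEST, SCORE = search(b"the quick brown fox", INDEX)
```

    A score of 1.0 means every feature of the target was found; lower scores would indicate similar rather than identical content.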

    Method and apparatus for content-aware resizing of data chunks for replication
    10.
    Granted invention patent (status: in force)

    Publication No.: US08712963B1

    Publication date: 2014-04-29

    Application No.: US13334723

    Filing date: 2011-12-22

    IPC classification: G06F7/00 G06F11/20 H04L29/08

    Abstract: Techniques for replicating data chunks in a storage system are described herein. In one embodiment, in response to a request for replicating data chunks of a source storage system having a first average chunk size to a target storage system having a second average chunk size, a new chunk size is determined based on metadata of the data chunks in view of an average chunk size of the target storage system. The data chunks are resized based on the new chunk size to generate resized data chunks. The resized data chunks are transmitted from the source storage system to the target storage system for replication.
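
    The resizing step can be sketched for the simple case where the target system uses larger chunks than the source. This is an illustration under assumptions, not the patented method: the new chunk size is approximated by merging a whole number of consecutive source chunks, the only metadata used is the chunk length, and `pick_merge_factor` and `resize` are hypothetical names.

```python
from typing import List


def pick_merge_factor(sizes: List[int], target_avg: int) -> int:
    """Choose how many source chunks to combine so that the merged
    chunks approach the target system's average chunk size."""
    source_avg = sum(sizes) / len(sizes)
    return max(1, round(target_avg / source_avg))


def resize(chunks: List[bytes], target_avg: int) -> List[bytes]:
    """Coalesce consecutive source chunks into resized chunks.
    Only merging is handled; splitting for a smaller target is not."""
    factor = pick_merge_factor([len(c) for c in chunks], target_avg)
    return [b"".join(chunks[i:i + factor])
            for i in range(0, len(chunks), factor)]


SOURCE = [b"aaaa", b"bbbb", b"cccc", b"dddd", b"eeee"]  # average size 4
RESIZED = resize(SOURCE, target_avg=8)                  # merge factor 2
```

    Because merged chunks are built on existing chunk boundaries, the concatenation of the resized chunks is byte-for-byte identical to the source stream.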
