Method and system for dynamic compression module selection
    3.
    Type: Granted invention patent
    Status: In force

    Publication No.: US09571698B1

    Publication date: 2017-02-14

    Application No.: US13436680

    Filing date: 2012-03-30

    Abstract: A computer-implemented method for compressing a data set, the method comprising receiving a first data block of the data set, selecting automatically by a compression management module a compression module from a plurality of compression modules to apply to the first data block based on projected compression efficacy or resource utilization, and compressing the first data block with the selected compression module to generate a first compressed data block.

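Illustrative sketch (not part of the patent record): the selection step in the abstract above could look roughly like the following. The module registry, the relative cost figures, and the sample-based ratio projection are all assumptions made for illustration; the patent does not specify them here.

```python
import bz2
import lzma
import zlib

# Hypothetical registry of compression modules: name -> (compress function,
# assumed relative CPU cost). The cost figures are illustrative, not measured.
MODULES = {
    "zlib": (zlib.compress, 1),
    "bz2": (bz2.compress, 3),
    "lzma": (lzma.compress, 5),
}


def select_module(block: bytes, sample_size: int = 4096, max_cost: int = 5) -> str:
    """Pick the module with the best projected ratio within a cost budget."""
    sample = block[:sample_size]
    best_name, best_ratio = None, float("inf")
    for name, (compress, cost) in MODULES.items():
        if cost > max_cost:
            continue  # skip modules that exceed the resource budget
        # Project efficacy by compressing only a small sample of the block.
        ratio = len(compress(sample)) / max(len(sample), 1)
        if ratio < best_ratio:
            best_name, best_ratio = name, ratio
    if best_name is None:
        raise ValueError("no compression module fits the cost budget")
    return best_name


def compress_block(block: bytes, max_cost: int = 5):
    """Compress one block with the automatically selected module."""
    name = select_module(block, max_cost=max_cost)
    return name, MODULES[name][0](block)
```

Calling `compress_block(data, max_cost=1)` restricts the choice to the cheapest module in this hypothetical registry, while the default budget lets the projected ratio decide.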

    Method and apparatus for determining optimal chunk sizes of a deduplicated storage system
    4.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08639669B1

    Publication date: 2014-01-28

    Application No.: US13334732

    Filing date: 2011-12-22

    IPC classification: G06F7/00 G06F17/00

    Abstract: Techniques for evaluating deduplication effectiveness of data chunks in a storage system are described herein. In one embodiment, metadata of first data chunks associated with a deduplicated storage system is examined, where the first data chunks have been partitioned according to a first chunk size. A second chunk size is calculated based on the examination of the metadata of first data chunks. Metadata of the first data chunks is merged according to the second chunk size to represent second data chunks to which the first data chunks would have been merged. A deduplication rate of the second data chunks is determined based on the merged metadata.

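Illustrative sketch (not part of the patent record): the idea of merging chunk metadata to estimate the deduplication rate at a larger chunk size, without re-reading the data, can be approximated as below. The `(fingerprint, size)` metadata layout, the fixed merge factor, and the use of SHA-1 over joined fingerprints are assumptions; the abstract does not say how the second chunk size is calculated.

```python
import hashlib
from typing import List, Tuple

# Assumed metadata layout: one (fingerprint, size_in_bytes) pair per chunk.
ChunkMeta = Tuple[str, int]


def merge_metadata(meta: List[ChunkMeta], factor: int) -> List[ChunkMeta]:
    """Merge every `factor` adjacent chunk records into one larger record."""
    merged = []
    for i in range(0, len(meta), factor):
        group = meta[i:i + factor]
        # The merged fingerprint is derived from the constituent fingerprints,
        # so the chunk contents never have to be read again.
        joined = "|".join(fp for fp, _ in group)
        fingerprint = hashlib.sha1(joined.encode()).hexdigest()
        merged.append((fingerprint, sum(size for _, size in group)))
    return merged


def dedup_rate(meta: List[ChunkMeta]) -> float:
    """Logical bytes divided by bytes kept after removing duplicate chunks."""
    total = sum(size for _, size in meta)
    unique = {}
    for fingerprint, size in meta:
        unique.setdefault(fingerprint, size)
    stored = sum(unique.values())
    return total / stored if stored else 1.0
```

Comparing `dedup_rate(meta)` with `dedup_rate(merge_metadata(meta, 2))` gives a rough view of how much deduplication would be lost or kept if chunks were twice as large.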

    Out-of-core similarity matching
    5.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08914338B1

    Publication date: 2014-12-16

    Application No.: US13335416

    Filing date: 2011-12-22

    IPC classification: G06F17/30

    Abstract: A method for storing data in a data storage system by partitioning the data into a plurality of data chunks and generating representative data for each of the plurality of chunks by applying a predetermined algorithm to each chunk of the plurality of chunks. Subsequently, the representative data is compared and sorted. Representative data for base data chunks and representative data for other data chunks that can be stored relative to the base data chunks are identified by evaluating the sorted set of representative data. Finally, each of the other data chunks identified as those that can be stored relative to a base data chunk are stored in the data storage system as the difference between the data chunk and a base data chunk.

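Illustrative sketch (not part of the patent record): one way to read the abstract above is "compute a sketch per chunk, sort the sketches, treat the first chunk of each run of equal sketches as the base, and store the rest as deltas". The max-of-window-hashes sketch function and the `difflib`-based delta format below are assumptions, and the sort here is done in memory, whereas the patent title suggests an out-of-core sort.

```python
import hashlib
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def sketch(chunk: bytes, window: int = 8) -> int:
    """Representative value: the largest hash over all sliding windows."""
    last_start = max(len(chunk) - window, 0)
    windows = [chunk[i:i + window] for i in range(last_start + 1)]
    return max(int.from_bytes(hashlib.sha1(w).digest()[:8], "big") for w in windows)


def plan_storage(chunks: List[bytes]) -> Dict[int, Optional[int]]:
    """Map each chunk index to its base chunk index, or None if it is a base."""
    sketches = [sketch(c) for c in chunks]
    order = sorted(range(len(chunks)), key=lambda i: (sketches[i], i))
    plan: Dict[int, Optional[int]] = {}
    base = None
    for i in order:
        if base is not None and sketches[i] == sketches[base]:
            plan[i] = base  # same sketch as the current base: store as a delta
        else:
            base = i
            plan[i] = None  # first chunk with this sketch becomes the base
    return plan


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy ranges from base plus literal data."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            ops.append(("data", target[j1:j2]))
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target chunk from its base and a delta."""
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)
```

Chunks whose sketches differ are simply stored as bases in this sketch; a real system would likely use several features per chunk to catch more near-duplicates.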

    Method and system for detecting unwanted content of files
    6.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08825626B1

    Publication date: 2014-09-02

    Application No.: US13216020

    Filing date: 2011-08-23

    IPC classification: G06F17/30

    CPC classification: G06F17/30994

    Abstract: Techniques for detecting unwanted data are described herein. In one embodiment, a request is received for storing a data object in a storage system from a client over a network, where the request includes first representative data representing the data object without including actual content of the data object. It is detected whether the data object contains unwanted content by comparing the first representative data with second representative data without accessing the actual content of the data object, where the second representative data represents the unwanted content. A response is transmitted to the client over the network indicating whether the data object is likely to contain the unwanted object based on comparison of the first and second representative data.

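Illustrative sketch (not part of the patent record): the comparison described above works on representative data only, so the server never reads the object itself. In the sketch below, fixed-size chunk fingerprints stand in for the representative data and a simple overlap threshold stands in for the likelihood decision; both are assumptions, as are the function names and the response layout.

```python
import hashlib


def fingerprints(data: bytes, chunk_size: int = 64) -> set:
    """Client side: fingerprints of fixed-size chunks (assumed representation)."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def check_request(first_rep: set, unwanted_rep: set, threshold: float = 0.5) -> dict:
    """Server side: compare fingerprints only; the object content is never read."""
    if not unwanted_rep:
        return {"likely_unwanted": False, "overlap": 0.0}
    # Fraction of the unwanted content's fingerprints present in the request.
    overlap = len(first_rep & unwanted_rep) / len(unwanted_rep)
    return {"likely_unwanted": overlap >= threshold, "overlap": overlap}
```

Fixed-size chunking only matches when the unwanted bytes fall on chunk boundaries; content-defined chunking would be more robust to shifted data, at the cost of a longer example.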

    Method and apparatus for efficiently searching data in a storage system
    8.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08756249B1

    Publication date: 2014-06-17

    Application No.: US13216013

    Filing date: 2011-08-23

    IPC classification: G06F17/30

    CPC classification: G06F17/30106

    Abstract: Techniques for searching data in a storage system are described herein. In one embodiment, in response to a request for searching target data in a storage system, first representative data for the target data being searched are generated by applying a predetermined algorithm to at least a portion of the target data. The first representative data are searched and compared with second representative data representing one or more data sets stored in the storage system. Based on the search and comparison, a likelihood is indicated that the target data or similar content has been found in the storage system.

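Illustrative sketch (not part of the patent record): searching by representative data can be pictured as a lookup in an index of fingerprints, followed by a per-data-set score. The fixed-window SHA-1 features, the index layout, and the "fraction of matched features" likelihood below are assumptions chosen to keep the example small.

```python
import hashlib
from collections import Counter
from typing import Dict, Set


def features(data: bytes, window: int = 16) -> Set[str]:
    """Representative data: fingerprints of fixed-size windows (assumed algorithm)."""
    return {
        hashlib.sha1(data[i:i + window]).hexdigest()
        for i in range(0, len(data), window)
    }


def build_index(datasets: Dict[str, bytes]) -> Dict[str, Set[str]]:
    """Map each fingerprint to the names of the stored data sets containing it."""
    index: Dict[str, Set[str]] = {}
    for name, data in datasets.items():
        for fp in features(data):
            index.setdefault(fp, set()).add(name)
    return index


def search(target: bytes, index: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, per data set, the fraction of target features found in it."""
    target_fps = features(target)
    hits: Counter = Counter()
    for fp in target_fps:
        for name in index.get(fp, ()):
            hits[name] += 1
    return {name: count / len(target_fps) for name, count in hits.items()}
```

A score of 1.0 means every feature of the target appears in that data set; lower scores suggest partially similar content, and data sets with no matching features are omitted from the result.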

    Method and apparatus for content-aware resizing of data chunks for replication
    9.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08712963B1

    Publication date: 2014-04-29

    Application No.: US13334723

    Filing date: 2011-12-22

    IPC classification: G06F7/00 G06F11/20 H04L29/08

    Abstract: Techniques for replicating data chunks in a storage system are described herein. In one embodiment, in response to a request for replicating data chunks of a source storage system having a first average chunk size to a target storage system having a second average chunk size, a new chunk size is determined based on metadata of the data chunks in view of an average chunk size of the target storage system. The data chunks are resized based on the new chunk size to generate resized data chunks. The resized data chunks are transmitted from the source storage system to the target storage system for replication.

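Illustrative sketch (not part of the patent record): the resizing step can be approximated by merging adjacent source chunks at their existing boundaries until each merged chunk reaches a size derived from the target's average. The rounding rule in `choose_new_size` and the merge-only strategy are assumptions; the sketch does not cover the case where the target uses smaller chunks than the source.

```python
from typing import List


def choose_new_size(chunk_sizes: List[int], target_avg: int) -> int:
    """Derive a new chunk size from source chunk metadata and the target average."""
    if not chunk_sizes:
        raise ValueError("no chunk metadata supplied")
    source_avg = sum(chunk_sizes) / len(chunk_sizes)
    # Assumed rule: use a whole multiple of the source average nearest the target.
    factor = max(1, round(target_avg / source_avg))
    return int(factor * source_avg)


def resize_chunks(chunks: List[bytes], new_size: int) -> List[bytes]:
    """Merge adjacent chunks at existing boundaries until new_size is reached."""
    resized, current = [], bytearray()
    for chunk in chunks:
        current += chunk
        if len(current) >= new_size:
            resized.append(bytes(current))
            current = bytearray()
    if current:
        resized.append(bytes(current))  # trailing partial chunk
    return resized
```

Because merging only happens at existing chunk boundaries, the concatenation of the resized chunks is byte-for-byte identical to the original stream, which is what replication needs.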