Content Aware Chunking for Achieving an Improved Chunk Size Distribution
    22.
    Patent application
    Content Aware Chunking for Achieving an Improved Chunk Size Distribution (status: in force)

    Publication No.: US20130054544A1

    Publication date: 2013-02-28

    Application No.: US13222198

    Filing date: 2011-08-31

    IPC classification: G06F7/00

    Abstract: The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, with a chunk boundary candidate identified if matched. Other signatures and patterns are then checked to determine a highest ranking signature (corresponding to a lowest numbered Rule) to associate with that chunk boundary candidate, or set an actual boundary if the highest ranked signature is matched. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros).
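
The regression behaviour described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule family (Rule r matches when the low `bits - r + 1` bits of the fingerprint are zero), the use of `zlib.crc32` recomputed per position instead of a true rolling hash, and all names and default sizes are assumptions made for illustration. Zero-run pattern detection is not covered.

```python
import zlib


def match_rank(fingerprint: int, bits: int, rules: int):
    """Return the best (lowest-numbered) rule the fingerprint matches, or None.

    Hypothetical rule family: Rule r matches when the low (bits - r + 1) bits
    are zero, so Rule 1 is the strictest and rarest.
    """
    for rank in range(1, rules + 1):
        mask = (1 << (bits - rank + 1)) - 1
        if fingerprint & mask == 0:
            return rank
    return None


def chunk_boundaries(data: bytes, min_size=32, max_size=256,
                     window=16, bits=6, rules=3):
    """Return the end offset of every chunk in data."""
    boundaries = []
    start, n = 0, len(data)
    while start < n:
        limit = min(start + max_size, n)
        candidates = {}  # rule rank -> latest position that matched it
        cut = None
        # Only positions between the minimum and maximum chunk size are checked.
        for pos in range(start + min_size, limit + 1):
            # Fingerprint of the window ending at pos (recomputed, not rolled).
            fp = zlib.crc32(data[max(0, pos - window):pos])
            rank = match_rank(fp, bits, rules)
            if rank == 1:
                cut = pos  # highest-ranked signature: actual boundary
                break
            if rank is not None:
                candidates[rank] = pos  # remember as a boundary candidate
        if cut is None:
            if limit == n:
                cut = n  # end of data: the remainder is the final chunk
            elif candidates:
                cut = candidates[min(candidates)]  # regress to best candidate
            else:
                cut = limit  # no candidate at all: cut at the maximum size
        boundaries.append(cut)
        start = cut
    return boundaries
```

Calling `chunk_boundaries(data)` on a byte string yields offsets whose gaps stay within `min_size` and `max_size`, except that the final chunk may be shorter than `min_size`. Keeping the latest position per rank is one possible tie-break; the abstract does not say which candidate is preferred within a rank.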

    Generating storage reports using volume snapshots
    24.
    Granted patent
    Generating storage reports using volume snapshots (status: lapsed)

    Publication No.: US07548939B2

    Publication date: 2009-06-16

    Application No.: US11107119

    Filing date: 2005-04-15

    IPC classification: G06F17/30

    Abstract: Described is a method and system by which storage reports are generated from a volume snapshot set, rather than from a live volume. A volume snapshot set includes a representation or copy of a volume at a single point in time. By scanning the snapshot, a consistent file system image is obtained. Scanning may take place by enumerating a volume's directories of files, or, when available, by accessing a file system metadata of file information (e.g., a master file table) separately maintained on the volume. With some (e.g., hardware-based) snapshot technologies, the snapshot can be transported to another computing system for scanning by that other computing system, thereby avoiding burdening a live system's resources when scanning. Accurate and consistent storage reports are thus obtained at a single point in time, independent of the number of volumes being scanned.

    Storage reports duplicate file detection
    25.
    Granted patent
    Storage reports duplicate file detection (status: in force)

    Publication No.: US07401080B2

    Publication date: 2008-07-15

    Application No.: US11206710

    Filing date: 2005-08-17

    IPC classification: G06F12/06

    Abstract: Described is a storage reports duplicate file detector that operates by receiving file records during a first scan of file system metadata. The detector computes a hash based on attributes in the record, and maintains the hash value in association with information that indicates whether a hash value corresponds to more than one file. In one implementation, the information corresponds to the amount of space wasted by duplication. The information is used to determine which hash values correspond to groups of potentially duplicate files, and eliminate non-duplicates. A second scan locates file information for each of the potentially duplicate files, and the file information is then used to determine which groups of potentially duplicate files are actually duplicate files.
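
A two-scan detector of this kind might be sketched in Python as follows. The abstract does not say which attributes are hashed or how duplicates are finally confirmed, so this sketch assumes the file size is the only hashed attribute and that the second scan confirms duplicates by a SHA-256 content digest. The function names and the `read_content` callback are hypothetical.

```python
import hashlib
from collections import defaultdict


def hash_attrs(size: int) -> str:
    """Hash of the record attributes (assumed here to be the size only)."""
    return hashlib.md5(str(size).encode()).hexdigest()


def find_duplicates(records, read_content):
    """records: iterable of (path, size); read_content: path -> bytes."""
    records = list(records)

    # First scan: per attribute hash, keep a file count and the wasted space.
    seen = {}  # attribute hash -> [file count, wasted bytes]
    for _path, size in records:
        entry = seen.setdefault(hash_attrs(size), [0, 0])
        if entry[0] >= 1:
            entry[1] += size  # every copy after the first wastes `size` bytes
        entry[0] += 1

    # Eliminate hashes that correspond to a single file.
    suspects = {key for key, (count, _wasted) in seen.items() if count > 1}

    # Second scan: fetch file information only for potential duplicates.
    groups = defaultdict(list)
    for path, size in records:
        if hash_attrs(size) in suspects:
            digest = hashlib.sha256(read_content(path)).hexdigest()
            groups[(size, digest)].append(path)

    # Keep only the groups that really contain more than one file.
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)
```

With an in-memory mapping `files` from path to bytes, `find_duplicates([(p, len(c)) for p, c in files.items()], files.__getitem__)` returns the groups of paths with identical content. The point of the first scan is that most files are ruled out from cheap metadata alone, so file contents are read only for the suspects.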

    GARBAGE COLLECTION AND HOTSPOTS RELIEF FOR A DATA DEDUPLICATION CHUNK STORE
    26.
    Patent application
    GARBAGE COLLECTION AND HOTSPOTS RELIEF FOR A DATA DEDUPLICATION CHUNK STORE (status: under examination, published)

    Publication No.: US20120159098A1

    Publication date: 2012-06-21

    Application No.: US12971694

    Filing date: 2010-12-17

    IPC classification: G06F12/02 G06F12/16

    CPC classification: G06F12/0261

    Abstract: Techniques for garbage collecting unused data chunks in storage are provided. According to one implementation, data chunks stored in a chunk container that are unused are identified based on an analysis of one or more stream map chunks indicated as deleted. The identified data chunks are indicated as deleted. The storage space in the chunk container filled by the data chunks indicated as deleted may then be reclaimed. Techniques for selectively backing up data chunks are also provided. According to one implementation, a data chunk is received for storing in a chunk container. A backup copy of the received data chunk is stored in a backup container if the received data chunk is in a predetermined top percentage of most referenced data chunks in the chunk container and has a number of references greater than a predetermined reference threshold.
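
Both ideas in this abstract can be sketched with plain Python dictionaries. The data layout below is an assumption: a chunk container is modelled as a dict from chunk ID to bytes, and a stream map as a list of chunk IDs. The function names, the default thresholds, and the way ties are ranked in `needs_backup` are hypothetical and are not taken from the patent.

```python
def collect_garbage(container, stream_maps, deleted_map_ids):
    """Remove chunks referenced only by deleted stream maps.

    container: dict chunk_id -> bytes (modified in place)
    stream_maps: dict map_id -> list of chunk_ids
    deleted_map_ids: set of map_ids indicated as deleted
    Returns (set of removed chunk_ids, number of bytes reclaimed).
    """
    # Chunks still referenced by a live stream map must be kept.
    live = set()
    for map_id, chunk_ids in stream_maps.items():
        if map_id not in deleted_map_ids:
            live.update(chunk_ids)

    # Analyse only the deleted stream maps to find unused chunks.
    unused = set()
    for map_id in deleted_map_ids:
        unused.update(c for c in stream_maps[map_id] if c not in live)

    # Reclaim the space those chunks occupied.
    reclaimed = 0
    for chunk_id in unused:
        reclaimed += len(container.pop(chunk_id, b""))
    return unused, reclaimed


def needs_backup(ref_counts, chunk_id, top_fraction=0.1, min_refs=3):
    """True if the chunk is a hotspot that deserves a backup copy.

    A chunk qualifies when it is within the top `top_fraction` of chunks by
    reference count AND has more than `min_refs` references.
    """
    cutoff = int(len(ref_counts) * top_fraction)
    refs = ref_counts[chunk_id]
    more_referenced = sum(1 for r in ref_counts.values() if r > refs)
    return more_referenced < cutoff and refs > min_refs
```

Requiring both conditions in `needs_backup` mirrors the abstract: a chunk in the top percentage of a small, lightly referenced container is not backed up unless its absolute reference count also exceeds the threshold.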

    Determination of landmarks
    28.
    Granted patent
    Determination of landmarks (status: in force)

    Publication No.: US09189488B2

    Publication date: 2015-11-17

    Application No.: US13081497

    Filing date: 2011-04-07

    IPC classification: G06F17/30 G06F21/10

    CPC classification: G06F17/30156 G06F21/10

    Abstract: Hash values corresponding to a file are processed in windows to determine a minimum hash value for each window. Each window may begin at a minimum hash value determined for a previous window and end after a fixed number of hash values. If a hash value is less than a threshold hash value, it is added to a buffer that is used to store the hash values in sorted order for a current window. If a hash value is greater than the threshold, it is added to another buffer whose hash values are not stored in sorted order. At the end of the current window, the minimum hash value in the first buffer is selected as the landmark for the window. If the first buffer is empty, then the hash values in the other buffer are sorted and the minimum hash value is selected as the landmark for the window.
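
The two-buffer selection described in this abstract might look like the following Python sketch. Several details are assumptions: the next window starts at the position just after the previous landmark, a hash value equal to the threshold goes to the unsorted buffer, ties are broken by the earlier position, and the function name `landmarks` is hypothetical.

```python
import bisect


def landmarks(hashes, window, threshold):
    """Return the index of the landmark (minimum hash) chosen for each window."""
    result = []
    start, n = 0, len(hashes)
    while start < n:
        end = min(start + window, n)
        low = []   # (hash, index) pairs below the threshold, kept sorted
        high = []  # remaining pairs, left unsorted until actually needed
        for i in range(start, end):
            item = (hashes[i], i)
            if hashes[i] < threshold:
                bisect.insort(low, item)
            else:
                high.append(item)
        if low:
            _value, index = low[0]  # minimum is the head of the sorted buffer
        else:
            high.sort()             # fallback: sort the other buffer once
            _value, index = high[0]
        result.append(index)
        start = index + 1  # assumed: next window begins after this landmark
    return result
```

The design keeps the sorted buffer small: with a well-chosen threshold only a few values per window fall below it, so most values cost a plain append, and the full sort happens only in the rare window with no value under the threshold.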

    DETERMINATION OF LANDMARKS
    29.
    Patent application
    DETERMINATION OF LANDMARKS (status: in force)

    Publication No.: US20120259897A1

    Publication date: 2012-10-11

    Application No.: US13081497

    Filing date: 2011-04-07

    IPC classification: G06F17/30

    CPC classification: G06F17/30156 G06F21/10

    Abstract: Hash values corresponding to a file are processed in windows to determine a minimum hash value for each window. Each window may begin at a minimum hash value determined for a previous window and end after a fixed number of hash values. If a hash value is less than a threshold hash value, it is added to a buffer that is used to store the hash values in sorted order for a current window. If a hash value is greater than the threshold, it is added to another buffer whose hash values are not stored in sorted order. At the end of the current window, the minimum hash value in the first buffer is selected as the landmark for the window. If the first buffer is empty, then the hash values in the other buffer are sorted and the minimum hash value is selected as the landmark for the window.

    Sharing volume data via shadow copies using differential areas
    30.
    Granted patent
    Sharing volume data via shadow copies using differential areas (status: in force)

    Publication No.: US07877553B2

    Publication date: 2011-01-25

    Application No.: US11834028

    Filing date: 2007-08-06

    IPC classification: G06F13/00

    Abstract: Aspects of the subject matter described herein relate to sharing volume data via shadow copies. In aspects, an active computer creates a shadow copy of a volume. The shadow copy is exposed to one or more passive computers that may read but not write to the volume. A passive computer may obtain data from the shadow copy by determining whether the data has been written to a differential area and, if so, reading it from the differential area. If the data has not been written to the differential area, the passive computer may obtain it by first reading it from the volume, then re-determining whether it has been written to the differential area, and if so, reading the data from the differential area. Otherwise, the data read from the volume corresponds to the data needed for the shadow copy.
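
The read protocol for a passive computer can be sketched in a few lines of Python. This is an illustration under assumptions, not the patented implementation: the volume and the differential area are modelled as mappings from block number to data, and the function name `read_shadow_block` is hypothetical.

```python
def read_shadow_block(block, volume, diff_area):
    """Return the shadow-copy contents of `block` as seen by a passive reader.

    volume: mapping block -> current data on the live volume
    diff_area: mapping block -> original data saved before an overwrite
    """
    # 1. If the block was already overwritten, its original data is in the
    #    differential area.
    if block in diff_area:
        return diff_area[block]

    # 2. Otherwise read the block from the volume itself.
    data = volume[block]

    # 3. Re-check: the active computer may have overwritten the block while
    #    step 2 was in progress, in which case `data` may be the new contents.
    if block in diff_area:
        return diff_area[block]

    # 4. No write intervened, so the volume data is the shadow-copy data.
    return data
```

The second check is what makes the read safe without locking the active computer out: if the check after the volume read still finds nothing in the differential area, no copy-on-write happened for that block, so the data just read predates any overwrite.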
