1. System and method for partitioning backup data streams in a deduplication based storage system
    Type: Granted invention patent
    Status: In force

    Publication No.: US08983952B1

    Publication date: 2015-03-17

    Application No.: US12846132

    Filing date: 2010-07-29

    IPC classification: G06F7/00

    Abstract: A system and method for partitioning a data stream into a plurality of segments of varying sizes. A data stream manager partitions a received data stream into segments which are then conveyed to a deduplication engine for processing. The data stream received by the data stream manager includes metadata corresponding to the data stream. Based upon the metadata, which may include an indication as to a type of data included in the data stream, the data stream is partitioned into segments for further processing. A size of a segment used for partitioning given data is based at least in part on a type of data being partitioned. The variable segment sizes may be chosen to balance between maximizing the deduplication ratio and minimizing both the segment count and the size of the fingerprint index.
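
    The type-dependent segment sizing described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the type names, the byte sizes, and the function `partition_stream` are hypothetical and do not come from the patent, which does not specify concrete values here.

```python
# Hypothetical segment sizes per data type, in bytes (illustrative values only).
SEGMENT_SIZE_BY_TYPE = {
    "database": 8 * 1024,
    "document": 32 * 1024,
    "media": 128 * 1024,
}
DEFAULT_SEGMENT_SIZE = 64 * 1024  # assumed fallback for unknown types


def partition_stream(data: bytes, data_type: str) -> list[bytes]:
    """Split data into segments whose size is chosen from the data type."""
    size = SEGMENT_SIZE_BY_TYPE.get(data_type, DEFAULT_SEGMENT_SIZE)
    return [data[i:i + size] for i in range(0, len(data), size)]


payload = bytes(256 * 1024)  # 256 KiB of zero bytes as stand-in backup data
db_segments = partition_stream(payload, "database")
media_segments = partition_stream(payload, "media")
print(len(db_segments), len(media_segments))
```

    Smaller segments give the deduplication engine more chances to find matches but increase the segment count and the fingerprint index size, which is the trade-off the abstract mentions.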

2. Managing backups of data objects in containers
    Type: Granted invention patent
    Status: In force

    Publication No.: US08874522B2

    Publication date: 2014-10-28

    Application No.: US13285331

    Filing date: 2011-10-31

    IPC classification: G06F17/30 G06F7/00 G06F11/14

    Abstract: Containers that store data objects that were written to those containers during a particular backup are accessed. Then, a subset of the containers is identified; the containers in the subset have less than a threshold number of data objects associated with the particular backup. Data objects that are in containers in that subset and that are associated with the backup are copied to one or more other containers. Those other containers are subsequently used to restore data objects associated with the backup.
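
    A minimal sketch of the selection and copy step described in this abstract, assuming containers are modeled as lists of (object id, backup id) pairs. The function `consolidate`, the container ids, and the threshold value are hypothetical names chosen for illustration.

```python
def consolidate(containers: dict[str, list[tuple[str, str]]],
                backup_id: str, threshold: int) -> tuple[list[str], list[str]]:
    """Find containers holding fewer than `threshold` objects of a backup.

    Returns the ids of those sparse containers and the objects of the
    backup that would be copied out of them into a new container.
    """
    sparse_ids = []
    copied = []
    for container_id, objects in containers.items():
        mine = [obj for obj, bid in objects if bid == backup_id]
        # Only containers that actually hold objects from this backup count.
        if 0 < len(mine) < threshold:
            sparse_ids.append(container_id)
            copied.extend(mine)
    return sparse_ids, copied


containers = {
    "c1": [("a", "b1"), ("b", "b1"), ("c", "b1")],
    "c2": [("d", "b1"), ("x", "b0")],
    "c3": [("e", "b1"), ("y", "b0"), ("z", "b0")],
}
sparse_ids, new_container = consolidate(containers, "b1", threshold=2)
print(sparse_ids, new_container)
```

    A restore of backup `b1` would then read `c1` and the new container rather than `c1`, `c2`, and `c3`.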

3. Efficient data backup with change tracking
    Type: Granted invention patent
    Status: In force

    Publication No.: US08775377B1

    Publication date: 2014-07-08

    Application No.: US13557558

    Filing date: 2012-07-25

    IPC classification: G06F17/30 G06F17/00

    Abstract: The present disclosure provides for efficiently creating a full backup image of a client device by efficiently communicating backup data to a backup server using a change tracking log, or track log. A present full backup image can be created using a track log that is associated with a previous full backup image. The client device can determine whether files, which were included in the previous full backup image, have or have not changed using the track log. The client device can transmit changed file data to the backup server for inclusion in the present full backup image. The client device can also transmit metadata identifying unchanged file data to the backup server. The backup server can use the metadata to extract a copy of the unchanged file data from the previous full backup image for inclusion in the present full backup image.
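
    The client and server roles described in this abstract can be sketched as below. The track log is modeled as a set of changed paths and a backup image as a dictionary of path to file data; both models and the function names `plan_full_backup` and `build_full_image` are assumptions made for illustration, not the patent's data structures.

```python
def plan_full_backup(files: dict[str, bytes],
                     track_log: set[str]) -> tuple[dict[str, bytes], list[str]]:
    """Client side: split files into changed data and unchanged names."""
    changed = {path: data for path, data in files.items() if path in track_log}
    unchanged = [path for path in files if path not in track_log]
    return changed, unchanged


def build_full_image(previous_image: dict[str, bytes],
                     changed: dict[str, bytes],
                     unchanged: list[str]) -> dict[str, bytes]:
    """Server side: reuse unchanged data from the previous full image."""
    image = {path: previous_image[path] for path in unchanged}
    image.update(changed)
    return image


previous_image = {"a.txt": b"old a", "b.txt": b"old b"}
client_files = {"a.txt": b"new a", "b.txt": b"old b"}
track_log = {"a.txt"}  # paths recorded as changed since the previous backup

changed, unchanged = plan_full_backup(client_files, track_log)
present_image = build_full_image(previous_image, changed, unchanged)
print(sorted(present_image))
```

    Only `a.txt` is transmitted as data; `b.txt` travels as a name and is copied from the previous image on the server.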

4. Systems and Methods for Providing Increased Scalability in Deduplication Storage Systems
    Type: Invention patent application
    Status: In force

    Publication No.: US20120185447A1

    Publication date: 2012-07-19

    Application No.: US13007301

    Filing date: 2011-01-14

    IPC classification: G06F17/30

    Abstract: A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
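
    Steps (2) through (5) of this abstract can be sketched as follows. The abstract does not say how reference objects are assigned to sub-databases; hashing the key with CRC32 is an assumption made here, and `MAX_ENTRIES`, `shard_of`, `partition`, and `apply_update` are hypothetical names.

```python
import zlib

MAX_ENTRIES = 4  # hypothetical size threshold that triggers partitioning


def shard_of(key: str, parts: int) -> int:
    """Map a reference-object key to a sub-database index (assumed scheme)."""
    return zlib.crc32(key.encode()) % parts


def partition(database: dict[str, int], parts: int) -> list[dict[str, int]]:
    """Split one reference database into independently updatable parts."""
    subs: list[dict[str, int]] = [{} for _ in range(parts)]
    for key, ref_count in database.items():
        subs[shard_of(key, parts)][key] = ref_count
    return subs


def apply_update(subs: list[dict[str, int]], changes: dict[str, int]) -> set[int]:
    """Apply reference-count deltas, touching only the affected parts."""
    touched = set()
    for key, delta in changes.items():
        idx = shard_of(key, len(subs))
        subs[idx][key] = subs[idx].get(key, 0) + delta
        touched.add(idx)
    return touched


database = {f"fp-{i}": 1 for i in range(8)}
subs = partition(database, 4) if len(database) >= MAX_ENTRIES else [database]
touched = apply_update(subs, {"fp-3": 1})
print(f"updated {len(touched)} of {len(subs)} sub-databases")
```

    Because the update names a single key, only one sub-database is written, which is the cost saving the abstract describes.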

    Constrained backup image defragmentation optimization within deduplication system

    Publication No.: US09928210B1

    Publication date: 2018-03-27

    Application No.: US13459987

    Filing date: 2012-04-30

    IPC classification: G06F12/00 G06F15/80

    CPC classification: G06F15/8084 G06F17/30

    Abstract: The present disclosure provides for defragmenting deduplicated data, such as one or more backup image files, stored in a deduplicated data store. A defragmentation module can be implemented on a deduplication server to reduce fragmentation of backup images and improve processing time for restoring a backup image. A defragmentation module can be configured to defragment a backup image file by migrating portions of data of the backup image file that are stored in various containers at non-contiguous locations throughout the deduplicated data store. A defragmentation module can contiguously write the portions to one or more containers, which are stored at one or more new locations in the deduplicated data store. A defragmentation module can be configured to evaluate whether portions of a backup image file meet criteria for defragmentation. A defragmentation module can also be configured to update location information about the portions that are migrated to the new container(s).

7. Processes and methods for client-side fingerprint caching to improve deduplication system backup performance
    Type: Granted invention patent
    Status: In force

    Publication No.: US08874520B2

    Publication date: 2014-10-28

    Application No.: US13026188

    Filing date: 2011-02-11

    IPC classification: G06F17/30 G06F7/00 G06F11/14

    CPC classification: G06F11/1458 G06F11/1453

    Abstract: A system and method for caching fingerprints in a client cache is provided. A data object that comprises a set of data segments and describes a backup process is identified. Thereafter, a request referencing the data object is made to a deduplication server to request that a task identifier be added to the data object. If the deduplication server is able to successfully add the task identifier to the data object, then an active identifier is added to each data segment from the set of data segments in a cache that is within a client system.

8. Systems and methods for restoring deduplicated data
    Type: Granted invention patent
    Status: In force

    Publication No.: US08204862B1

    Publication date: 2012-06-19

    Application No.: US12572532

    Filing date: 2009-10-02

    IPC classification: G06F7/00

    Abstract: A method for restoring deduplicated data may include receiving a request to restore a set of deduplicated data segments to a client system, where each data segment in the set of deduplicated data segments is referred to by one or more deduplication references. The method may also include procuring reference data that indicates, for each data segment in the set of deduplicated data segments, the number of deduplication references that point to the data segment. The method may further include using the reference data to select one or more data segments from the set of deduplicated data segments for client-side caching, caching the one or more data segments in a cache on the client system, and restoring the one or more data segments from the cache on the client system. Various other methods, systems, and computer-readable media are also disclosed.
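
    One way to read the selection step in this abstract is sketched below. The abstract says only that the reference data is used to select segments for caching; preferring the most-referenced segments is an assumption made here, and `select_for_cache`, `restore`, and the sample data are hypothetical.

```python
def select_for_cache(ref_counts: dict[str, int], capacity: int) -> list[str]:
    """Pick up to `capacity` segments, most-referenced first (assumed policy)."""
    candidates = [seg for seg, count in ref_counts.items() if count > 1]
    candidates.sort(key=lambda seg: ref_counts[seg], reverse=True)
    return candidates[:capacity]


def restore(order: list[str], server: dict[str, bytes],
            cached_ids: list[str]) -> tuple[list[bytes], int]:
    """Restore segments in order, counting how many server fetches occur."""
    cache: dict[str, bytes] = {}
    fetches = 0
    restored = []
    for seg in order:
        if seg in cache:
            restored.append(cache[seg])  # served from the client-side cache
            continue
        data = server[seg]
        fetches += 1
        if seg in cached_ids:
            cache[seg] = data
        restored.append(data)
    return restored, fetches


server = {"s1": b"A", "s2": b"B", "s3": b"C"}
ref_counts = {"s1": 3, "s2": 1, "s3": 2}
restore_order = ["s1", "s2", "s1", "s3", "s1", "s3"]

to_cache = select_for_cache(ref_counts, capacity=1)
restored, fetches = restore(restore_order, server, to_cache)
print(to_cache, fetches)
```

    With a cache of one segment, the six-segment restore needs four server fetches instead of six, since `s1` is fetched once and then served locally.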

9. Efficient data storage and retrieval for backup systems
    Type: Granted invention patent
    Status: In force

    Publication No.: US09298707B1

    Publication date: 2016-03-29

    Application No.: US13250156

    Filing date: 2011-09-30

    IPC classification: G06F7/00 G06F17/30

    Abstract: Systems and methods for providing efficient storage and retrieval of data are disclosed. A two-level segment labeling mechanism may be employed to ensure that unique data segments from particular backup data sets are stored together in a storage container. The two-level segment labeling may facilitate preservation of the relative positions of segments within the backup stream during compaction operations. Also, backup data restoration performance may be improved by use of multiple read threads that are localized to particular storage containers.

10. Deduplication system space recycling through inode manipulation
    Type: Granted invention patent
    Status: In force

    Publication No.: US08904137B1

    Publication date: 2014-12-02

    Application No.: US13106097

    Filing date: 2011-05-12

    IPC classification: G06F17/30

    Abstract: A system and method for improving performance within a storage system employing deduplication techniques using address manipulation are disclosed. A data segment within a storage object is identified from among a number of data segments within a storage object. The data segment represents data stored in a storage device. Some or all of the data represented by the data segment is stored in a data block that is associated with the data segment. The storage object is then compacted. Compaction includes reordering data segments, including the identified data segment, by performing address manipulation on a data block address of the data block (e.g., an address of the data block within the storage device). The reordering of the data segments changes the order of the data segments within the storage object.
