Constrained backup image defragmentation optimization within deduplication system

    Publication No.: US09928210B1

    Publication date: 2018-03-27

    Application No.: US13459987

    Filing date: 2012-04-30

    IPC classification: G06F12/00 G06F15/80

    CPC classification: G06F15/8084 G06F17/30

    Abstract: The present disclosure provides for defragmenting deduplicated data, such as one or more backup image files, stored in a deduplicated data store. A defragmentation module can be implemented on a deduplication server to reduce fragmentation of backup images and improve processing time for restoring a backup image. The defragmentation module can be configured to defragment a backup image file by migrating portions of data of the backup image file that are stored in various containers at non-contiguous locations throughout the deduplicated data store. The defragmentation module can contiguously write the portions to one or more containers, which are stored at one or more new locations in the deduplicated data store. The defragmentation module can be configured to evaluate whether portions of a backup image file meet criteria for defragmentation. It can also be configured to update location information about the portions that are migrated to the new container(s).
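    A minimal Python sketch of the defragmentation idea, assuming a hypothetical in-memory model in which each segment fingerprint maps to a (container, offset) location, and in which the "criteria for defragmentation" is simply that an image spans more than `max_containers` distinct containers. It only rewrites the location map; a real module would also copy the segment bytes. The names `Location`, `needs_defrag`, and `defragment` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    container_id: int
    offset: int


def needs_defrag(image_segments, locations, max_containers):
    """Hypothetical criterion: the image is spread over too many containers."""
    containers = {locations[fp].container_id for fp in image_segments}
    return len(containers) > max_containers


def defragment(image_segments, locations, next_container_id, container_capacity):
    """Assign the image's segments contiguous slots in fresh containers.

    image_segments: fingerprints in the order the backup image references them.
    locations: fingerprint -> Location, updated in place for migrated segments.
    Returns the next unused container id.
    """
    if not image_segments:
        return next_container_id
    cid, slot = next_container_id, 0
    migrated = set()
    for fp in image_segments:
        if fp in migrated:  # a segment repeated within the image moves only once
            continue
        migrated.add(fp)
        if slot == container_capacity:  # current container is full; open another
            cid, slot = cid + 1, 0
        locations[fp] = Location(cid, slot)
        slot += 1
    return cid + 1
```

    Writing the segments in image order is what lets a later restore read each new container sequentially instead of seeking across the store.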

    3. Processes and methods for client-side fingerprint caching to improve deduplication system backup performance
    Granted invention patent (in force)

    Publication No.: US08874520B2

    Publication date: 2014-10-28

    Application No.: US13026188

    Filing date: 2011-02-11

    IPC classification: G06F17/30 G06F7/00 G06F11/14

    CPC classification: G06F11/1458 G06F11/1453

    Abstract: A system and method for caching fingerprints in a client cache is provided. A data object that comprises a set of data segments and describes a backup process is identified. A request referencing the data object is then made to a deduplication server, asking that a task identifier be added to the data object. If the deduplication server successfully adds the task identifier to the data object, an active identifier is added to each data segment of the set in a cache within the client system.
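    The abstract leaves the client/server protocol unspecified. A minimal Python sketch, assuming a hypothetical server stub whose `add_task_id` call returns `True` when the task identifier was attached to the named data object. The assumption here is that this attachment is what entitles the client to treat the object's segments as already stored; only then does the client mark those fingerprints active in its local cache. `DedupServerStub` and `ClientFingerprintCache` are illustrative names.

```python
class DedupServerStub:
    """Hypothetical stand-in for a deduplication server."""

    def __init__(self, known_objects):
        # data object name -> set of task identifiers attached to it
        self.objects = {name: set() for name in known_objects}

    def add_task_id(self, object_name, task_id):
        if object_name not in self.objects:
            return False  # e.g. the object no longer exists on the server
        self.objects[object_name].add(task_id)
        return True


class ClientFingerprintCache:
    def __init__(self):
        self.active = {}  # fingerprint -> True once marked active

    def load(self, server, object_name, segment_fps, task_id):
        """Mark the object's segments active only if the server accepted the task id."""
        if not server.add_task_id(object_name, task_id):
            return False
        for fp in segment_fps:
            self.active[fp] = True
        return True

    def is_duplicate(self, fp):
        # Only fingerprints marked active are trusted to skip sending data.
        return self.active.get(fp, False)
```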

    4. Method and system for improving performance with single-instance-storage volumes by leveraging data locality
    Granted invention patent (in force)

    Publication No.: US08886605B1

    Publication date: 2014-11-11

    Application No.: US13526168

    Filing date: 2012-06-18

    IPC classification: G06F17/30

    CPC classification: G06F17/30156 G06F17/30097

    Abstract: A method and system for improving performance with single-instance-storage volumes by leveraging data locality is provided. A client provides a set of fingerprints generated from data segments to be saved to a single-instance storage volume and receives information on whether each data segment exists on the volume and, if it does, where it is stored. Based on this information, the client determines whether the number of non-sequential accesses of a computer-readable medium for the set of segments on the single-instance storage volume exceeds a predetermined threshold. If so, the client provides the whole set of data segments for storage within the single-instance storage volume, regardless of whether they are duplicate data segments. The sent data segments are stored contiguously within the single-instance storage volume, and the duplicates are removed from their previously stored locations.
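    A minimal Python sketch of the client-side decision, assuming the server answers each fingerprint with either `None` (segment not stored) or a block address on the volume, and counting an access as non-sequential whenever a stored duplicate is not at the address immediately after the previous duplicate. The address model and the names `count_non_sequential` and `segments_to_send` are hypothetical.

```python
from typing import List, Optional, Sequence


def count_non_sequential(locations: Sequence[Optional[int]]) -> int:
    """Count the seeks needed to read the already-stored segments in order.

    locations: per segment, its block address if it already exists on the
    volume, or None if it is new. The first stored segment counts as one seek.
    """
    jumps, prev = 0, None
    for loc in locations:
        if loc is None:
            continue
        if prev is None or loc != prev + 1:
            jumps += 1
        prev = loc
    return jumps


def segments_to_send(segments: Sequence[str], locations: Sequence[Optional[int]],
                     threshold: int) -> List[str]:
    """Send the whole set when reads would be too scattered, else only new data."""
    if count_non_sequential(locations) > threshold:
        return list(segments)  # the volume will rewrite the set contiguously
    return [seg for seg, loc in zip(segments, locations) if loc is None]
```

    Resending duplicates trades extra network and write traffic for a contiguous layout that is cheaper to read back at restore time.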

    5. MANAGING BACKUPS OF DATA OBJECTS IN CONTAINERS
    Invention patent application (in force)

    Publication No.: US20130110784A1

    Publication date: 2013-05-02

    Application No.: US13285331

    Filing date: 2011-10-31

    IPC classification: G06F7/00

    Abstract: Containers that store data objects written to them during a particular backup are accessed. A subset of the containers is then identified: each container in the subset holds fewer than a threshold number of data objects associated with that backup. The data objects in the subset's containers that are associated with the backup are copied to one or more other containers, which are subsequently used to restore data objects associated with the backup.
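    A minimal Python sketch of the consolidation step, assuming a hypothetical model where each container is a list of `(object_id, backup_id)` pairs and new containers hold at most `capacity` objects. Containers holding none of the backup's objects are skipped, since only containers written during that backup are considered. `consolidate_backup` is an illustrative name.

```python
def consolidate_backup(containers, backup_id, threshold, capacity):
    """Gather a backup's objects from sparsely populated containers.

    containers: container id -> list of (object_id, backup_id) pairs.
    Returns new containers (lists of object ids) to be used for restores.
    """
    sparse_objects = []
    for cid in sorted(containers):
        mine = [obj for obj, owner in containers[cid] if owner == backup_id]
        # Only containers that hold some, but fewer than `threshold`, of this
        # backup's objects are worth copying out of.
        if 0 < len(mine) < threshold:
            sparse_objects.extend(mine)
    # Pack the copied objects densely into as few new containers as possible.
    return [sparse_objects[i:i + capacity]
            for i in range(0, len(sparse_objects), capacity)]
```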

    6. Method and system for improving performance with single-instance-storage volumes by leveraging data locality
    Granted invention patent (in force)

    Publication No.: US08204868B1

    Publication date: 2012-06-19

    Application No.: US12165496

    Filing date: 2008-06-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30156 G06F17/30097

    Abstract: A method and system for improving performance with single-instance-storage volumes by leveraging data locality is provided. A client provides a set of fingerprints generated from data segments to be saved to a single-instance storage volume and receives information on whether each data segment exists on the volume and, if it does, where it is stored. Based on this information, the client determines whether the number of non-sequential accesses of a computer-readable medium for the set of segments on the single-instance storage volume exceeds a predetermined threshold. If so, the client provides the whole set of data segments for storage within the single-instance storage volume, regardless of whether they are duplicate data segments. The sent data segments are stored contiguously within the single-instance storage volume, and the duplicates are removed from their previously stored locations.

    7. Method and system for efficient space management for single-instance-storage volumes
    Granted invention patent (in force)

    Publication No.: US08041907B1

    Publication date: 2011-10-18

    Application No.: US12165469

    Filing date: 2008-06-30

    IPC classification: G06F12/00

    CPC classification: G06F11/1453 G06F11/1448

    Abstract: A method and system for efficient space management for single-instance-storage volumes is provided. A backup module stores data within a collection of containers according to the access locality and retention time of the data. The retention time defines how long the data is stored within the collection of containers before it is deleted, and the access locality is the order in which the data is to be accessed. A compaction module compacts the stored data by selecting at least two containers from the collection, using predetermined criteria that include the access locality and retention time of the data. The compaction module then redistributes the data among the selected containers. The compaction criteria create an imbalance among the containers in order to produce more containers that are empty, full, or nearly full.
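    A minimal Python sketch of one compaction step, assuming each stored item is a hypothetical `(segment_id, expiry_day, access_order)` tuple and that the two containers have already been selected. Pooling their contents, ordering by retention and then by access locality, and filling the first container to capacity produces the kind of imbalance the abstract describes: one full container and one holding the remainder. The ordering key is an assumption, not the patent's stated criteria.

```python
from typing import List, Tuple

# Hypothetical item model: (segment id, expiry day, access order)
Item = Tuple[str, int, int]


def compact_pair(a: List[Item], b: List[Item], capacity: int):
    """Redistribute two containers' data so the first is as full as possible.

    Items are ordered by expiry day, then access order, so data that will be
    deleted together, and read together, lands in the same container.
    """
    pooled = sorted(a + b, key=lambda item: (item[1], item[2]))
    if len(pooled) > 2 * capacity:
        raise ValueError("items exceed the combined capacity of both containers")
    return pooled[:capacity], pooled[capacity:]
```

    Grouping by expiry means that when a retention period ends, whole containers tend to empty at once rather than leaving many partially filled ones.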

    8. Managing backups of data objects in containers
    Granted invention patent (in force)

    Publication No.: US08874522B2

    Publication date: 2014-10-28

    Application No.: US13285331

    Filing date: 2011-10-31

    IPC classification: G06F17/30 G06F7/00 G06F11/14

    Abstract: Containers that store data objects written to them during a particular backup are accessed. A subset of the containers is then identified: each container in the subset holds fewer than a threshold number of data objects associated with that backup. The data objects in the subset's containers that are associated with the backup are copied to one or more other containers, which are subsequently used to restore data objects associated with the backup.

    9. Efficient data backup with change tracking
    Granted invention patent (in force)

    Publication No.: US08775377B1

    Publication date: 2014-07-08

    Application No.: US13557558

    Filing date: 2012-07-25

    IPC classification: G06F17/30 G06F17/00

    Abstract: The present disclosure provides for efficiently creating a full backup image of a client device by communicating backup data to a backup server using a change tracking log, or track log. A present full backup image can be created using a track log that is associated with a previous full backup image. The client device can use the track log to determine whether files included in the previous full backup image have changed. The client device can transmit changed file data to the backup server for inclusion in the present full backup image. The client device can also transmit metadata identifying unchanged file data to the backup server. The backup server can use the metadata to extract a copy of the unchanged file data from the previous full backup image for inclusion in the present full backup image.
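    A minimal Python sketch of the client/server split, assuming files are modeled as an in-memory `path -> bytes` map, the track log is a set of paths changed (or created) since the previous full backup, and the metadata for unchanged files is just their paths. `client_backup` and `server_synthesize` are hypothetical names.

```python
def client_backup(files, track_log):
    """Split the client's current files into data to send and paths to reference.

    files: path -> current contents; track_log: paths changed since the last full.
    """
    changed = {path: data for path, data in files.items() if path in track_log}
    unchanged = [path for path in files if path not in track_log]
    return changed, unchanged


def server_synthesize(previous_image, changed, unchanged):
    """Build the new full image from client data plus copies from the old image."""
    image = {path: previous_image[path] for path in unchanged}
    image.update(changed)
    return image
```

    Only the changed files cross the network, yet the server still ends up with a complete, self-contained full image.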

    10. Systems and Methods for Providing Increased Scalability in Deduplication Storage Systems
    Invention patent application (in force)

    Publication No.: US20120185447A1

    Publication date: 2012-07-19

    Application No.: US13007301

    Filing date: 2011-01-14

    IPC classification: G06F17/30

    Abstract: A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independently of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on fewer than all of the sub-databases to avoid the processing costs of performing the update operation on all of them. Various other systems, methods, and computer-readable media are also disclosed.
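    A minimal Python sketch of steps (2) through (5), assuming a hypothetical reference database that maps a segment fingerprint to the set of backups referencing it, uses entry count as the size-related characteristic, and routes keys to sub-databases by hash. The class and method names are illustrative, and hash routing is one possible partitioning scheme, not necessarily the patent's.

```python
import hashlib


class PartitionedRefDB:
    """Hypothetical model: fingerprint -> set of referencing backup ids."""

    def __init__(self, max_entries: int, num_partitions: int):
        self.max_entries = max_entries
        self.num_partitions = num_partitions
        self.subdbs = [{}]  # begins as one undivided database

    def _index(self, key: str) -> int:
        # Stable hash routing, so a key always maps to the same sub-database.
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.subdbs)

    def _maybe_partition(self) -> None:
        # Size-related trigger: split once the single database reaches max_entries.
        if len(self.subdbs) == 1 and len(self.subdbs[0]) >= self.max_entries:
            entries = self.subdbs[0]
            self.subdbs = [{} for _ in range(self.num_partitions)]
            for key, refs in entries.items():
                self.subdbs[self._index(key)][key] = refs

    def add_reference(self, key: str, backup_id: str) -> int:
        """Update only the sub-database that owns key; return its index."""
        self._maybe_partition()
        i = self._index(key)
        self.subdbs[i].setdefault(key, set()).add(backup_id)
        return i
```

    Because each update touches a single sub-database, its cost stays bounded by the size of one partition rather than the whole reference database.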
