摘要:
Containers that store data objects that were written to those containers during a particular backup are accessed. Then, a subset of the containers is identified; the containers in the subset have less than a threshold number of data objects associated with the particular backup. Data objects that are in containers in that subset and that are associated with the backup are copied to one or more other containers. Those other containers are subsequently used to restore data objects associated with the backup.
摘要:
Containers that store data objects that were written to those containers during a particular backup are accessed. Then, a subset of the containers is identified; the containers in the subset have less than a threshold number of data objects associated with the particular backup. Data objects that are in containers in that subset and that are associated with the backup are copied to one or more other containers. Those other containers are subsequently used to restore data objects associated with the backup.
摘要:
A system and method for backing up files to a single-instance storage system are disclosed. The files may be split into segments, and the file data may be stored in the single-instance storage system as individual segments. The single-instance storage system uses the concept of a file region which covers multiple segments of the file. If a region of a file is unchanged from one backup to the next, the system may use a region object to refer to the unchanged region. This avoids the need to update the reference information for each of the segments within the region, thus increasing the efficiency of backing up the new version of the file.
摘要:
A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
摘要:
A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
摘要:
A de-duplication storage system which uses multiple indices is described. A first group of one or more indices may be stored in random access memory (RAM) or another type of fast storage. A second group of one or more indices may be stored on one or more disk drives or another type of storage where large amounts of data can be stored inexpensively. The first group of indices may be used when adding new files to the de-duplication storage system in order to determine whether the file segments of the new files are already stored. The second group of indices may be used when restoring files in order to lookup the segments of the files.
摘要:
A computer-implemented method for removing unreferenced data segments from deduplicated data systems may include: 1) identifying a deduplicated data system that contains a plurality of data segments, 2) identifying a plurality of containers within the deduplicated data system, with each container containing a subset of the data segments within the deduplicated data system, 3) identifying at least one container within the plurality of containers that is likely to include a large proportion of data segments that are not referenced by data objects within the deduplicated data system, and then, for each identified container, 4) searching for unreferenced data segments within the identified container and 5) removing the unreferenced data segments from the identified container. Various other methods, systems, and computer-readable media are also disclosed.
摘要:
A computer-implemented method for removing unreferenced data segments from deduplicated data systems may include: 1) identifying a deduplicated data system that contains a plurality of data objects, 2) dividing the data objects within the deduplicated data system into a plurality of data object groups, 3) identifying, within the data object groups, at least one data object group that has changed subsequent to a prior garbage-collection operation that removed data segments that were not referenced by data objects within the deduplicated data system, 4) identifying at least one container within the deduplicated data system that contains data segments referenced by data objects within the changed data object group, and then, for each identified container, 5) removing data segments from the identified container that are not referenced by data objects within the deduplicated data system. Various other methods, systems, and computer-readable media are also disclosed.
摘要:
A computer-implemented method for removing unreferenced data segments from deduplicated data systems may include: 1) identifying a deduplicated data system that contains a plurality of data objects, 2) dividing the data objects within the deduplicated data system into a plurality of data object groups, 3) identifying, within the data object groups, at least one data object group that has changed subsequent to a prior garbage-collection operation that removed data segments that were not referenced by data objects within the deduplicated data system, 4) identifying at least one container within the deduplicated data system that contains data segments referenced by data objects within the changed data object group, and then, for each identified container, 5) removing data segments from the identified container that are not referenced by data objects within the deduplicated data system. Various other methods, systems, and computer-readable media are also disclosed.
摘要:
The present disclosure provides for defragmenting deduplicated data, such as one or more backup image files, stored in a deduplicated data store. A defragmentation module can be implemented on a deduplication server to reduce fragmentation of backup images and improve processing time for restoring a backup image. A defragmentation module can be configured to defragment a backup image file by migrating portions of data of the backup image file that are stored in various containers at non-contiguous locations throughout deduplicated data store. A defragmentation module can contiguously write the portions to one or more containers, which are stored at one or more new locations in the deduplicated data store. A defragmentation module can be configured to evaluate whether portions of a backup image file meet criteria for defragmentation. A defragmentation module can also be configured to update location information about the portions that are migrated to the new container(s).