摘要:
Containers that store data objects that were written to those containers during a particular backup are accessed. Then, a subset of the containers is identified; the containers in the subset have less than a threshold number of data objects associated with the particular backup. Data objects that are in containers in that subset and that are associated with the backup are copied to one or more other containers. Those other containers are subsequently used to restore data objects associated with the backup.
摘要:
Containers that store data objects that were written to those containers during a particular backup are accessed. Then, a subset of the containers is identified; the containers in the subset have less than a threshold number of data objects associated with the particular backup. Data objects that are in containers in that subset and that are associated with the backup are copied to one or more other containers. Those other containers are subsequently used to restore data objects associated with the backup.
摘要:
A system and method for backing up files to a single-instance storage system are disclosed. The files may be split into segments, and the file data may be stored in the single-instance storage system as individual segments. The single-instance storage system uses the concept of a file region which covers multiple segments of the file. If a region of a file is unchanged from one backup to the next, the system may use a region object to refer to the unchanged region. This avoids the need to update the reference information for each of the segments within the region, thus increasing the efficiency of backing up the new version of the file.
摘要:
A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
摘要:
A computer-implemented method for providing increased scalability in deduplication storage systems may include (1) identifying a database that stores a plurality of reference objects, (2) determining that at least one size-related characteristic of the database has reached a predetermined threshold, (3) partitioning the database into a plurality of sub-databases capable of being updated independent of one another, (4) identifying a request to perform an update operation that updates one or more reference objects stored within at least one sub-database, and then (5) performing the update operation on less than all of the sub-databases to avoid processing costs associated with performing the update operation on all of the sub-databases. Various other systems, methods, and computer-readable media are also disclosed.
摘要:
A computer-implemented method for validating ownership of deduplicated data may include (1) identifying a request from a remote client to store a data object in a data store that already includes an instance of the data object, (2) in response to the request, verifying that the remote client possesses the data object by (i) issuing a randomized challenge to the remote client, the randomized challenge including a random value which, when combined with at least a portion of the data object, produces an authentication token demonstrating possession of the data object and, in response to the randomized challenge, (ii) receiving the authentication token from the remote client; and, in response to receiving the authentication token from the remote client, (3) storing the data object in the data store on behalf of the remote client. Various other methods and systems are also disclosed.
摘要:
A system and method for efficiently reducing latency of accessing an index for a data segment stored on a server. A server both removes duplicate data and prevents duplicate data from being stored in a shared data storage. The file server is coupled to an index storage subsystem holding fingerprint and pointer value pairs corresponding to a data segment stored in the shared data storage. The pairs are stored in a predetermined order. The file server utilizes an ordered binary search tree to identify a particular block of multiple blocks within the index storage subsystem corresponding to a received memory access request. The index storage subsystem determines whether an entry corresponding to the memory access request is located within the identified block. Based on at least this determination, the file server processes the memory access request accordingly. In one embodiment, the index storage subsystem is a solid-state disk (SSD).
摘要:
A system and method for efficiently reducing a number of duplicate blocks of stored data. A file server both removes duplicate data and prevents duplicate data from being stored in the shared storage. A sampling rate may be used to determine which fingerprints, or hash values, are stored in an index. The sampling rate may be modified in response to changes in characteristics of the system, such as a change in the shared storage size, a change in a utilization of the shared storage, a change in the size of the storage unit, and reaching a threshold corresponding to utilization of the index. Also, a small cache may be maintained for holding fingerprint and pointer pair values prefetched from the shared storage. Each prefetched pair may be associated with data corresponding to a previous hit in the index. The association may be related to spatial locality, temporal locality, or otherwise.
摘要:
A de-duplication storage system which uses multiple indices is described. A first group of one or more indices may be stored in random access memory (RAM) or another type of fast storage. A second group of one or more indices may be stored on one or more disk drives or another type of storage where large amounts of data can be stored inexpensively. The first group of indices may be used when adding new files to the de-duplication storage system in order to determine whether the file segments of the new files are already stored. The second group of indices may be used when restoring files in order to lookup the segments of the files.
摘要:
A computer-implemented method for removing unreferenced data segments from deduplicated data systems may include: 1) identifying a deduplicated data system that contains a plurality of data segments, 2) identifying a plurality of containers within the deduplicated data system, with each container containing a subset of the data segments within the deduplicated data system, 3) identifying at least one container within the plurality of containers that is likely to include a large proportion of data segments that are not referenced by data objects within the deduplicated data system, and then, for each identified container, 4) searching for unreferenced data segments within the identified container and 5) removing the unreferenced data segments from the identified container. Various other methods, systems, and computer-readable media are also disclosed.