Abstract:
Provided are a computer program product, system, and method for identifying modified chunks in a data set for storage. Modifications are received to at least one of the chunks in the data set. A determination is made of at least one range of at least one of the chunks including data affected by the modifications. A determination is made as to whether at least one chunk outside of the at least one range has changed. For each determined chunk outside of the at least one range that has changed, a determination is made of at least one new chunk and a new digest of the at least one new chunk, and information on the at least one new chunk, including information to locate the new chunk in the data set, is added.
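As a minimal sketch of the range-determination step, assuming each chunk is located by its starting byte offset and each modification is a byte range with an exclusive end, the hypothetical helper `affected_ranges` below maps modifications to inclusive ranges of chunk indices. Merging overlapping or adjacent ranges into one run of sequential chunks is an assumption of this sketch, not something the abstract specifies.

```python
from bisect import bisect_right
from typing import List, Tuple

def affected_ranges(chunk_offsets: List[int],
                    mods: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Map modified byte ranges to inclusive (first, last) chunk-index ranges.

    chunk_offsets: sorted start offset of every chunk, beginning with 0.
    mods: (start, end) byte ranges of modified data, end exclusive.
    """
    ranges: List[Tuple[int, int]] = []
    for start, end in sorted(mods):
        # Chunk containing the first modified byte.
        first = bisect_right(chunk_offsets, start) - 1
        # Chunk containing the last modified byte (or the insertion point).
        last = bisect_right(chunk_offsets, max(start, end - 1)) - 1
        if ranges and first <= ranges[-1][1] + 1:
            # Overlapping or adjacent: extend the previous run of sequential chunks.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], last))
        else:
            ranges.append((first, last))
    return ranges
```

For example, `affected_ranges([0, 100, 250, 400], [(120, 130), (420, 430)])` returns `[(1, 1), (3, 3)]`.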
Abstract:
Described are embodiments of an invention for identifying chunk boundaries for optimization of fingerprint-based deduplication in a computing environment. Storage objects that are backed up in a computing environment are often compound storage objects which include many individual storage objects. The computing device of the computing environment breaks the storage objects into chunks of data by determining a hash value on a range of data. The computing device creates an artificial chunk boundary when the end of data of the storage object is reached. When an artificial chunk boundary is created for the end of data of a storage object, the computing device stores a pseudo fingerprint for the artificial chunk boundary. If a hash value matches a fingerprint or a pseudo fingerprint, then the computing device determines that the range of data corresponds to a chunk and defines the chunk boundaries.
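A minimal sketch of the pseudo-fingerprint idea follows, under several assumptions the abstract does not settle: a standalone object is backed up before a compound object that contains it, known chunks are looked up by hashing candidate ranges of each previously seen chunk length, SHA-256 serves as the fingerprint, and a simple Adler-32 window rule stands in for the content-defined chunker. All names (`FingerprintIndex`, `chunk_and_index`, the size constants) are hypothetical. A chunk that ends only because the end of data was reached gets an artificial boundary, and its hash is stored as a pseudo fingerprint rather than a regular one.

```python
import hashlib
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical chunking parameters for illustration only.
MIN_SIZE, MAX_SIZE, WINDOW, DIVISOR = 16, 256, 8, 32

def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def content_boundary(data: bytes, start: int) -> int:
    """Content-defined end offset; returns the end of data if no boundary is hit."""
    end = min(len(data), start + MAX_SIZE)
    for i in range(start + MIN_SIZE, end):
        if zlib.adler32(data[i - WINDOW:i]) % DIVISOR == 0:
            return i
    return end

@dataclass
class FingerprintIndex:
    real: Dict[int, Set[str]] = field(default_factory=dict)    # length -> fingerprints
    pseudo: Dict[int, Set[str]] = field(default_factory=dict)  # length -> pseudo fingerprints

    def add(self, table: Dict[int, Set[str]], chunk: bytes) -> None:
        table.setdefault(len(chunk), set()).add(digest(chunk))

    def match(self, data: bytes, pos: int) -> Optional[int]:
        """Length of a known chunk (real or pseudo) starting at pos, else None."""
        for length in sorted(set(self.real) | set(self.pseudo), reverse=True):
            if pos + length <= len(data):
                d = digest(data[pos:pos + length])
                if d in self.real.get(length, ()) or d in self.pseudo.get(length, ()):
                    return length
        return None

def chunk_and_index(data: bytes, index: FingerprintIndex) -> List[bytes]:
    """Prefer ranges whose hash matches a fingerprint or pseudo fingerprint;
    otherwise fall back to a content-defined boundary."""
    chunks, pos = [], 0
    while pos < len(data):
        length = index.match(data, pos)
        if length is not None:
            end = pos + length  # known chunk: reuse its boundaries
        else:
            end = content_boundary(data, pos)
            # End of data reached: artificial boundary, so record a pseudo fingerprint.
            table = index.pseudo if end == len(data) else index.real
            index.add(table, data[pos:end])
        chunks.append(data[pos:end])
        pos = end
    return chunks
```

In this sketch, the pseudo fingerprint is what lets the last chunk of a standalone object be found again when that object appears inside a larger compound object; without it, that tail would typically be merged with the following data into a chunk that does not deduplicate.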
Abstract:
Provided are a computer program product, system, and method for identifying modified chunks in a data set for storage. Information is maintained on a data set of variable length chunks, including a digest of each chunk and information to locate the chunk in the data set. Modifications are received to at least one of the chunks in the data set. A determination is made of at least one range of at least one of the chunks including data affected by the modifications, wherein each range identifies one chunk or sequential chunks having data affected by the modifications. The at least one chunk in each range is processed to determine at least one new chunk in each range and, for each determined new chunk, a digest of the new chunk. A determination is made as to whether at least one chunk outside of the at least one range has changed. For each determined chunk outside of the at least one range that has changed, a determination is made of at least one new chunk and a new digest of the at least one new chunk. Information on each new chunk, including its new digest and information to locate the new chunk in the data set, is added to the maintained information.
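The re-chunking step, including the check for chunks outside the range, can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not name a chunking algorithm, so a simple content-defined chunker (a boundary where the Adler-32 of a trailing window satisfies a divisor, bounded by minimum and maximum sizes) stands in for it, SHA-256 is assumed as the digest, and all names and parameters (`Chunk`, `rechunk`, `MIN_SIZE`, and so on) are hypothetical. Re-chunking starts at the first chunk of the range and stops once a new boundary at or past the end of the range lines up with a shifted old boundary; any old chunk after the range that is consumed before that point is a chunk outside the range that has changed.

```python
import hashlib
import zlib
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical chunking parameters for illustration only.
MIN_SIZE, MAX_SIZE, WINDOW, DIVISOR = 16, 256, 8, 32

@dataclass
class Chunk:
    offset: int   # information to locate the chunk in the data set
    length: int
    digest: str   # SHA-256 digest of the chunk contents

def next_boundary(data: bytes, start: int) -> int:
    """End offset of the content-defined chunk that begins at start."""
    end = min(len(data), start + MAX_SIZE)
    for i in range(start + MIN_SIZE, end):
        if zlib.adler32(data[i - WINDOW:i]) % DIVISOR == 0:
            return i
    return end

def make_chunk(data: bytes, start: int, end: int) -> Chunk:
    return Chunk(start, end - start, hashlib.sha256(data[start:end]).hexdigest())

def chunk_all(data: bytes) -> List[Chunk]:
    chunks, pos = [], 0
    while pos < len(data):
        end = next_boundary(data, pos)
        chunks.append(make_chunk(data, pos, end))
        pos = end
    return chunks

def rechunk(old: List[Chunk], new_data: bytes, first: int, last: int,
            delta: int) -> Tuple[List[Chunk], List[Chunk]]:
    """Re-chunk after old[first..last] (the affected range) was modified.

    delta is the change in data-set length. Returns the updated chunk list
    and the old chunks outside the range that turned out to have changed.
    """
    range_end = old[last].offset + old[last].length + delta
    # Old boundaries at or after the end of the range, shifted by delta.
    old_ends = {c.offset + c.length + delta for c in old[last:]}
    pos, new_chunks = old[first].offset, []
    while pos < len(new_data):
        end = next_boundary(new_data, pos)
        new_chunks.append(make_chunk(new_data, pos, end))
        pos = end
        if pos >= range_end and pos in old_ends:
            break  # boundaries resynchronized; later chunks are unchanged
    after = old[last + 1:]
    changed_outside = [c for c in after if c.offset + delta < pos]
    tail = [Chunk(c.offset + delta, c.length, c.digest)
            for c in after if c.offset + delta >= pos]
    return old[:first] + new_chunks + tail, changed_outside
```

Because the chunker is deterministic from a given start offset, the updated list in this sketch matches what a full re-chunk of the modified data would produce, while only the affected range and any boundary-shifted neighbors are re-hashed.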
Abstract:
Provided are a computer program product, system, and method for restoring deduplicated data objects from sequential backup devices. A server stores data objects, made up of extents having deduplicated data, in at least one sequential backup device. The server receives from a client a request for data objects. The server determines extents stored in the at least one sequential backup device for the requested data objects. The server or client sorts the extents according to an order in which they are stored in the at least one sequential backup device to generate a sort list. The server retrieves the extents from the at least one sequential backup device according to the order in the sort list to access the extents sequentially from the sequential backup device in the order in which they were stored. The server returns the retrieved extents to the client and the client reconstructs the requested data objects from the received extents.
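The restore flow can be sketched as below, as a minimal illustration only. The `Extent` record (an identifier plus a volume name and storage position on the sequential device), the `read_extent` callback, and ordering across multiple devices by volume name are all assumptions; the abstract only requires that extents be sorted into the order in which they are stored and then read in that order. Collecting the extents into a set, so that an extent shared by several requested objects is read once, is likewise a choice of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Extent:
    extent_id: str
    volume: str     # sequential backup device or volume (hypothetical field)
    position: int   # storage position on that volume

def restore(requested: List[str],
            object_extents: Dict[str, List[Extent]],
            read_extent: Callable[[Extent], bytes]) -> Dict[str, bytes]:
    # Server: determine the extents for the requested objects; a deduplicated
    # extent shared by several objects is collected only once.
    needed = {e for obj in requested for e in object_extents[obj]}
    # Sort list: the order in which the extents are stored on the device(s).
    sort_list = sorted(needed, key=lambda e: (e.volume, e.position))
    # Retrieve sequentially, following the sort list.
    retrieved = {e.extent_id: read_extent(e) for e in sort_list}
    # Client: reconstruct each object from its extents in object order.
    return {obj: b"".join(retrieved[e.extent_id] for e in object_extents[obj])
            for obj in requested}
```

Reading in sort-list order keeps the device streaming forward instead of repositioning for every extent of every object.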
Abstract:
Capability test programs are generated that implement the capability test test cases for the components, wherein component developers use the capability test programs to test components during the development and coding of the components. Documentation is generated describing component abstract test cases that incorporate information on the capability test test cases. Component test programs are generated that implement the component abstract test cases for the components and utilize the capability test programs, wherein each component test program tests one component for at least one test case specified in the component abstract test case documentation for that component. The capability test programs and component test programs are stored in a shared repository. A software development program is deployed to enable the developers and testers to execute groups of component test programs in the shared repository to test the components during different phases of the development of the software product.
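One way to picture the shared repository is the small sketch below. It is an assumption-laden illustration, not the described system: the `ProgramEntry` and `SharedRepository` classes, the phase names, and the rule that a component test passes only when the capability tests it utilizes also pass are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ProgramEntry:
    name: str
    component: str
    kind: str                   # "capability" or "component"
    run: Callable[[], bool]     # returns True when the test passes
    uses: List[str] = field(default_factory=list)  # capability tests utilized

class SharedRepository:
    def __init__(self) -> None:
        self.programs: Dict[str, ProgramEntry] = {}
        self.groups: Dict[str, List[str]] = {}  # development phase -> test names

    def store(self, program: ProgramEntry) -> None:
        self.programs[program.name] = program

    def define_group(self, phase: str, names: List[str]) -> None:
        self.groups[phase] = list(names)

    def run_program(self, name: str) -> bool:
        program = self.programs[name]
        # A component test passes only if every capability test it utilizes passes.
        return all(self.run_program(dep) for dep in program.uses) and program.run()

    def run_group(self, phase: str) -> Dict[str, bool]:
        return {name: self.run_program(name) for name in self.groups[phase]}
```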
Abstract:
Provided are a computer program product, system, and method for restoring a restore set of files from backup objects stored in sequential backup devices. Backup objects are stored in at least one sequential backup device. A client initiates a restore request to restore a restore set of data in a volume as of a restore point-in-time. A determination is made of backup objects stored in at least one sequential backup device including the restore set of data for the restore point-in-time, wherein the determined backup objects are determined from a set of backup objects including a full volume backup and delta backups providing data in the volume at different points-in-time, and wherein extents in different backup objects providing data for blocks in the volume at different points-in-time are not stored contiguously in the sequential backup device. A determination is made of extents stored in the at least one sequential backup device for the determined backup objects. The determined extents are sorted according to an order in which they are stored in the at least one sequential backup device to generate a sort list. The extents are retrieved from the at least one sequential backup device according to the order in the sort list to access the extents sequentially from the sequential backup device in the order in which they were stored. The retrieved extents are returned to the client and the client reconstructs the restore data set from the received extents.
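The point-in-time selection and sort can be sketched as follows. Assumed for illustration: each backup object carries a point-in-time and a list of extents, each extent records the volume block it provides and its storage position on a single sequential device, and for each block the newest backup object at or before the restore point-in-time wins. The names `BackupObject`, `plan_restore`, and `read_extent` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

@dataclass(frozen=True)
class Extent:
    block: int      # volume block this extent provides data for
    position: int   # storage position on the sequential device

@dataclass
class BackupObject:
    name: str
    time: int               # point-in-time the backup represents
    extents: List[Extent]

def plan_restore(backups: List[BackupObject], restore_time: int,
                 blocks: Set[int]) -> List[Tuple[str, Extent]]:
    """Pick, per block, the extent from the newest backup object taken at or
    before restore_time, then sort by storage position (the sort list)."""
    chosen: Dict[int, Tuple[str, Extent]] = {}
    for obj in sorted(backups, key=lambda b: b.time):
        if obj.time > restore_time:
            break
        for ext in obj.extents:
            if ext.block in blocks:
                chosen[ext.block] = (obj.name, ext)  # newer backups override older ones
    return sorted(chosen.values(), key=lambda pair: pair[1].position)

def restore(backups: List[BackupObject], restore_time: int, blocks: Set[int],
            read_extent: Callable[[Extent], bytes]) -> Dict[int, bytes]:
    sort_list = plan_restore(backups, restore_time, blocks)
    # Read extents sequentially, in the order they are stored on the device.
    retrieved = {ext.block: read_extent(ext) for _, ext in sort_list}
    # Client side: reassemble the restore set in block order.
    return dict(sorted(retrieved.items()))
```

Because the full backup and the deltas interleave on the device, the sort list in this sketch mixes extents from different backup objects, which is exactly what lets the device be read front to back.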
Abstract:
Provided are a computer program product, system, and method for encrypting data objects to back up to a server. A client private key is intended to be maintained only by the client. A data object of chunks to store at the server is generated. A first portion of the chunks in the data object is encrypted with the client private key, and the encrypted first portion is sent to the server to store. A second portion of the chunks in the data object, not encrypted with the client private key, is sent to the server to store.
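A minimal sketch of the split follows, with loud caveats: the abstract does not say how the first portion is chosen, so a caller-supplied predicate stands in for that rule; the client private key is modeled as a symmetric secret held only by the client; and `toy_encrypt` is an HMAC-SHA256 keystream XOR chosen only because it needs nothing beyond the Python standard library. It is not a secure or authenticated cipher, and a real client would use a vetted scheme such as AES-GCM. The names `prepare_backup` and `toy_encrypt` are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Callable, List, Tuple

def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """ILLUSTRATIVE ONLY, NOT SECURE: XOR data with an HMAC-SHA256 keystream."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        block = nonce + counter.to_bytes(8, "big")
        stream += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))

toy_decrypt = toy_encrypt  # an XOR stream cipher is its own inverse

def prepare_backup(chunks: List[bytes], client_key: bytes,
                   in_first_portion: Callable[[int, bytes], bool]
                   ) -> List[Tuple[bool, bytes, bytes]]:
    """Return (encrypted?, nonce, payload) for each chunk sent to the server."""
    out = []
    for i, chunk in enumerate(chunks):
        if in_first_portion(i, chunk):
            nonce = os.urandom(16)  # fresh nonce per encrypted chunk
            out.append((True, nonce, toy_encrypt(client_key, nonce, chunk)))
        else:
            out.append((False, b"", chunk))  # second portion: sent unencrypted
    return out
```

The server never sees `client_key`, so in this sketch only the client can recover the first portion, while the second portion remains readable at the server.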