Abstract:
A method of deduplicating data in a data storage system includes identifying a plurality of portions of data; comparing the portions to identify duplicate data and identifying a link associated with each instance of duplicate data; determining whether the Hamming link-separation distance between the identified link and every other existing link is greater than twice the Hamming radius of an error correction code used in the data storage system; and then replacing the duplicate data with the identified link.
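A minimal Python sketch of the separation test, assuming links are fixed-width integers drawn from a hypothetical candidate pool and that the error correction code corrects up to `hamming_radius` bit errors. The names `hamming_distance`, `link_is_safe`, and `deduplicate` are illustrative, not from the source. Requiring a distance greater than twice the radius mirrors the usual rule that codewords at least 2t + 1 bits apart cannot be confused after t-bit errors, which is presumably why the threshold is set there.

```python
from typing import Dict, Iterable, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two equal-width link values differ."""
    return bin(a ^ b).count("1")


def link_is_safe(candidate: int, existing: Iterable[int], hamming_radius: int) -> bool:
    """True if the candidate is more than 2 * hamming_radius bits from every existing link."""
    return all(hamming_distance(candidate, link) > 2 * hamming_radius for link in existing)


def deduplicate(
    portions: List[bytes], link_pool: Iterable[int], hamming_radius: int
) -> Tuple[List[int], Dict[int, bytes]]:
    """Represent each portion by a link; duplicate portions share the first copy's link.

    Returns (links, store): one link per input portion, and a map from link to stored data.
    """
    link_for: Dict[bytes, int] = {}   # portion content -> assigned link
    store: Dict[int, bytes] = {}      # assigned link -> single stored copy
    candidates = iter(link_pool)      # hypothetical supply of candidate link values
    links: List[int] = []
    for portion in portions:
        if portion not in link_for:
            # New data: take the next candidate link that satisfies the separation rule.
            for candidate in candidates:
                if link_is_safe(candidate, store, hamming_radius):
                    break
            else:
                raise RuntimeError("link pool exhausted before a safe link was found")
            link_for[portion] = candidate
            store[candidate] = portion
        # Duplicate data is replaced by the link to the stored copy.
        links.append(link_for[portion])
    return links, store
```

For example, `deduplicate([b"a", b"b", b"a", b"c"], range(64), hamming_radius=1)` stores three distinct portions, and every pair of assigned links differs in at least three bits.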
Abstract:
A method to store data is disclosed. The method provides a plurality of data storage media and an automated data library comprising one or more data storage devices, a first plurality of storage cells, and a robotic accessor. The method further provides a storage vault comprising a second plurality of storage cells but no data storage devices. The method selects the (i)th data storage medium and sets the (i)th data state, which is selected from the group consisting of online, offline, and vault. If the method sets the (i)th data state to online, the method mounts the (i)th data storage medium in one of the data storage devices. If the method sets the (i)th data state to offline, the method removably places the (i)th data storage medium in one of the first plurality of storage cells. If the method sets the (i)th data state to vault, the method places the (i)th data storage medium in one of the second plurality of storage cells.
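A minimal Python sketch of the placement rule, assuming each location is a list of slots that holds a medium ID or `None`. `DataState`, `StorageSite`, `set_data_state`, and the volume IDs are hypothetical names; robotic accessor motion and transport to the vault are reduced to list updates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class DataState(Enum):
    ONLINE = "online"
    OFFLINE = "offline"
    VAULT = "vault"


@dataclass
class StorageSite:
    """Each slot holds a medium ID, or None when empty."""
    drives: List[Optional[str]]          # data storage devices in the library
    library_cells: List[Optional[str]]   # first plurality of storage cells
    vault_cells: List[Optional[str]]     # second plurality of cells (vault has no drives)


def set_data_state(site: StorageSite, medium: str, state: DataState) -> Tuple[str, int]:
    """Place the medium in a free slot matching its data state; return (location, index)."""
    targets = {
        DataState.ONLINE: ("drive", site.drives),           # mount in a device
        DataState.OFFLINE: ("library_cell", site.library_cells),
        DataState.VAULT: ("vault_cell", site.vault_cells),
    }
    location, slots = targets[state]
    if None not in slots and medium not in slots:
        raise RuntimeError(f"no free {location} for {medium}")
    # Assumption: a medium occupies one place at a time, so vacate its old slot first.
    for group in (site.drives, site.library_cells, site.vault_cells):
        for i, occupant in enumerate(group):
            if occupant == medium:
                group[i] = None
    index = slots.index(None)
    slots[index] = medium
    return location, index
```

Moving a mounted medium to the vault with this sketch frees its drive as a side effect of the single-location assumption.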
Abstract:
A system, method, and computer program product for managing command ordering and command execution in a host-Disk-to-intermediate-Disk-to-Holographic (D2D2H) data storage system. Specifically, a command ordering and execution (COE) utility selects a command group from a command queue. A determination is made whether the command group includes a write command that writes an entire hologram segment. If it does not, the entire hologram segment is read to an intermediate system disk. Conflicting commands are then separated from non-conflicting commands, and all conflicting write commands are executed before all conflicting read commands. After execution, the entire hologram segment on the intermediate system disk is closed and written to the holographic medium.
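The ordering rule can be sketched in Python as below. The abstract does not define a conflict, so this sketch assumes two commands conflict when their block ranges within the segment overlap and at least one of them is a write. `Command`, `conflicts`, and `plan_group` are hypothetical names, and the plan is returned as step descriptions rather than executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Command:
    op: str       # "read" or "write"
    start: int    # first block addressed within the hologram segment
    length: int   # number of blocks addressed

    @property
    def end(self) -> int:
        return self.start + self.length


def conflicts(a: Command, b: Command) -> bool:
    """Assumed definition: overlapping block ranges where at least one side writes."""
    overlap = a.start < b.end and b.start < a.end
    return overlap and "write" in (a.op, b.op)


def plan_group(group: List[Command], segment_blocks: int) -> List[str]:
    """Return the ordered steps for one command group addressing one hologram segment."""
    steps: List[str] = []
    writes_whole_segment = any(
        c.op == "write" and c.start == 0 and c.length >= segment_blocks for c in group
    )
    if not writes_whole_segment:
        steps.append("read segment to intermediate disk")

    conflicting = {
        i for i, a in enumerate(group)
        for j, b in enumerate(group)
        if i != j and conflicts(a, b)
    }
    independent = [c for i, c in enumerate(group) if i not in conflicting]
    clashing = [c for i, c in enumerate(group) if i in conflicting]

    # The abstract does not order non-conflicting commands; this sketch runs them first.
    ordered = (
        independent
        + [c for c in clashing if c.op == "write"]   # all conflicting writes ...
        + [c for c in clashing if c.op == "read"]    # ... before all conflicting reads
    )
    steps.extend(f"{c.op} {c.start}+{c.length}" for c in ordered)
    steps.append("close segment and write to holographic medium")
    return steps
```

Running conflicting writes first means an overlapping read in the same group observes the newly written blocks.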
Abstract:
A system, method, and computer program product for managing command ordering in a host-Disk-to-intermediate-Disk-to-Holographic (D2D2H) data storage system. Specifically, a command ordering utility detects a command from a host system. The hologram segment associated with the detected command is identified, and a determination is made whether that hologram segment is open or closed. A determination is also made whether the detected command is to be prioritized. If it is, the detected command is added to a prioritized command queue; otherwise, it is added to a normal command queue. Detected commands addressing the same hologram segment are then grouped. Execution of one or more grouped commands (prioritized or normal) is deferred for a predetermined period so that additional commands for the same command group can be received.
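A minimal Python sketch of the queueing and deferral logic, with hypothetical names (`HostCommand`, `CommandGroup`, `CommandOrderer`). It assumes the priority decision arrives as a flag on each command and that the deferral period is measured from the first command of each group; the abstract specifies neither.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class HostCommand:
    op: str
    segment: int
    prioritized: bool = False   # hypothetical: how priority is decided is not specified


@dataclass
class CommandGroup:
    segment: int
    segment_open: bool          # open vs. closed hologram segment, recorded at detection
    first_arrival: float
    commands: List[HostCommand] = field(default_factory=list)


class CommandOrderer:
    def __init__(self, defer_period: float, open_segments: Set[int]) -> None:
        self.defer_period = defer_period
        self.open_segments = open_segments
        # Two queues (prioritized, normal); each maps segment -> group in arrival order.
        self.queues: Dict[bool, Dict[int, CommandGroup]] = {True: {}, False: {}}

    def detect(self, cmd: HostCommand, now: float) -> None:
        """Queue a host command, grouping it with others for the same hologram segment."""
        queue = self.queues[cmd.prioritized]
        group = queue.get(cmd.segment)
        if group is None:
            group = CommandGroup(cmd.segment, cmd.segment in self.open_segments, now)
            queue[cmd.segment] = group
        group.commands.append(cmd)

    def release(self, now: float) -> List[CommandGroup]:
        """Return groups whose deferral period has elapsed, prioritized groups first."""
        ready: List[CommandGroup] = []
        for prioritized in (True, False):
            queue = self.queues[prioritized]
            for segment in list(queue):
                if now - queue[segment].first_arrival >= self.defer_period:
                    ready.append(queue.pop(segment))
        return ready
```

A caller would invoke `release` periodically; commands that arrive for a segment before its group is released simply join that group.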
Abstract:
A method of adaptively selecting an optimum data deduplication chunking method receives a request to deduplicate a file having a file type. The method searches a table of file types that includes, for each file type, a chunking method, a deduplication ratio, and a deduplication ratio threshold. The method selects a chunking method for the file according to the table, chunks the file using the selected chunking method, and deduplicates the chunked file using prior-art deduplication methods. The method then calculates a deduplication ratio for the file type and updates the table with the calculated ratio. If the calculated deduplication ratio for the file type is less than the deduplication ratio threshold for that file type, the method selects a new chunking method for the file type and updates the table of file types with the new chunking method.
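The adaptive loop can be sketched in Python as follows. The candidate chunkers are fixed-size splitters chosen only for illustration (content-defined chunking would be a common alternative), a SHA-256 chunk index stands in for the prior-art deduplication step, and the ratio is computed per file as logical bytes over newly stored bytes. All names, the candidate list, and that ratio definition are assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


def fixed_size_chunker(size: int) -> Callable[[bytes], List[bytes]]:
    """Return a function that splits data into fixed-size chunks."""
    def chunk(data: bytes) -> List[bytes]:
        return [data[i:i + size] for i in range(0, len(data), size)]
    return chunk


# Hypothetical candidate chunking methods, tried in this order when a type underperforms.
CHUNKING_METHODS: Dict[str, Callable[[bytes], List[bytes]]] = {
    "fixed-4096": fixed_size_chunker(4096),
    "fixed-1024": fixed_size_chunker(1024),
    "fixed-256": fixed_size_chunker(256),
}


@dataclass
class FileTypeEntry:
    chunking_method: str
    dedup_ratio: float
    dedup_ratio_threshold: float


def deduplicate_file(data: bytes, file_type: str,
                     table: Dict[str, FileTypeEntry], chunk_index: Set[str]) -> float:
    """Chunk, deduplicate by SHA-256 digest, update the table, and return the ratio."""
    entry = table[file_type]
    new_bytes = 0
    for chunk in CHUNKING_METHODS[entry.chunking_method](data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_index:     # only unseen chunks consume storage
            chunk_index.add(digest)
            new_bytes += len(chunk)
    ratio = len(data) / new_bytes if new_bytes else float("inf")
    entry.dedup_ratio = ratio
    if ratio < entry.dedup_ratio_threshold:
        # Below threshold: switch this file type to the next candidate method.
        names = list(CHUNKING_METHODS)
        entry.chunking_method = names[(names.index(entry.chunking_method) + 1) % len(names)]
    return ratio
```

Because the table is updated in place, the next file of the same type is chunked with whatever method the previous run left behind.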
Abstract:
In a method of and a system for deduplicating backed-up data, backup clients create respective backup tables, each listing the files to be backed up and their file types. A backup server receives the backup tables from the backup clients and merges them to form a merged backup table. The backup server sorts the merged backup table by file type, from the file type yielding the best deduplication ratio to the file type yielding the worst, thereby forming a sorted backup table. The backup server requests the files listed in the sorted backup table, in order, from the backup clients, and deduplicates the files it receives, in order, using deduplication parameters optimized for each file type. The method calculates an updated deduplication ratio for each deduplicated file type. Examples of deduplication parameters include chunking techniques and hashing techniques.
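A short Python sketch of the server-side merge, sort, and ratio update. `BackupEntry`, `merge_backup_tables`, `sort_backup_table`, and `update_ratio` are hypothetical names; the per-type ratios are assumed to come from earlier backups, and the updated ratio is taken as cumulative logical bytes over cumulative stored bytes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BackupEntry:
    client: str
    path: str
    file_type: str


def merge_backup_tables(tables: Dict[str, List[BackupEntry]]) -> List[BackupEntry]:
    """Concatenate per-client backup tables into one merged table (clients in name order)."""
    merged: List[BackupEntry] = []
    for client in sorted(tables):
        merged.extend(tables[client])
    return merged


def sort_backup_table(merged: List[BackupEntry], ratios: Dict[str, float],
                      default_ratio: float = 1.0) -> List[BackupEntry]:
    """Order entries from best to worst known deduplication ratio of their file type."""
    # sorted() stays stable with reverse=True, so ties keep their merged order.
    return sorted(merged, key=lambda e: ratios.get(e.file_type, default_ratio), reverse=True)


def update_ratio(stats: Dict[str, Tuple[int, int]], file_type: str,
                 logical_bytes: int, stored_bytes: int) -> float:
    """Accumulate byte counts for a file type and return its updated ratio."""
    logical, stored = stats.get(file_type, (0, 0))
    logical, stored = logical + logical_bytes, stored + stored_bytes
    stats[file_type] = (logical, stored)
    return logical / stored if stored else float("inf")
```

The server would then request files from the clients in the order returned by `sort_backup_table`.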
Abstract:
Various embodiments for differentiating between data and stubs pointing to a parent copy of deduplicated data are provided. Undeduplicated data is stored with a first cyclic redundancy check (CRC) seed. A stub pointing to the parent copy of the deduplicated data is stored with a second CRC seed. Subsequent to reading the deduplicated data, the first CRC seed is associated with the undeduplicated data, and the second CRC seed is associated with the stub. A CRC check is performed using one of the first and second CRC seeds. If the CRC check is positive, an I/O operation is allowed to proceed. If the CRC check is negative, an additional CRC check is performed using the other CRC seed.
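The seed-based check can be sketched with Python's `zlib.crc32`, whose second argument is the starting value of the checksum. The seed values and function names are hypothetical, and this sketch always tries the data seed first. For a payload of fixed length, different starting values yield different CRC-32 results, so an intact stub never passes the data-seed check.

```python
import zlib
from typing import Tuple

# Hypothetical seed values; any two distinct 32-bit values should do.
DATA_SEED = 0x00000000   # first seed: undeduplicated data
STUB_SEED = 0x5A5A5A5A   # second seed: stub pointing to the parent copy


def store_block(payload: bytes, is_stub: bool) -> Tuple[bytes, int]:
    """Return the payload with a CRC-32 computed from the seed for its kind."""
    seed = STUB_SEED if is_stub else DATA_SEED
    return payload, zlib.crc32(payload, seed)


def check_on_read(payload: bytes, stored_crc: int) -> str:
    """Try the data seed, then the stub seed; let the I/O proceed on the first match."""
    if zlib.crc32(payload, DATA_SEED) == stored_crc:
        return "data"
    if zlib.crc32(payload, STUB_SEED) == stored_crc:   # additional check with the other seed
        return "stub"
    raise OSError("CRC check failed with both seeds")
```

A payload that fails under both seeds is treated as corrupted rather than misread as the other kind of block.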
Abstract:
An apparatus and method to store data are disclosed. The method provides a data storage system comprising a fossilized data management apparatus interconnected with one or more data storage devices. The method provides to the fossilized data management apparatus information and metadata associated with that information, wherein the metadata comprises a format field, a context field, a retention field, a data management field, and a storage management field. The fossilized data management apparatus instructs the one or more data storage devices to write the information based upon the metadata format field.
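The abstract leaves open what values the format field takes and how it changes the write. The Python sketch below assumes a hypothetical table mapping format values to device write modes; `FossilMetadata`, `StorageDevice`, `WRITE_MODE_BY_FORMAT`, and `store_fossilized` are illustrative names, and only the five field names come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FossilMetadata:
    """The five metadata fields named in the abstract; string values are placeholders."""
    format: str
    context: str
    retention: str
    data_management: str
    storage_management: str


class StorageDevice:
    """Hypothetical device stub that records each write request it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.writes: List[Tuple[str, bytes]] = []

    def write(self, information: bytes, mode: str) -> None:
        self.writes.append((mode, information))


# Hypothetical mapping from format-field values to device write modes.
WRITE_MODE_BY_FORMAT: Dict[str, str] = {
    "text": "character-encoded",
    "image": "raw-binary",
}


def store_fossilized(information: bytes, meta: FossilMetadata,
                     devices: List[StorageDevice]) -> None:
    """Instruct every device to write the information in the mode chosen by meta.format."""
    mode = WRITE_MODE_BY_FORMAT.get(meta.format)
    if mode is None:
        raise ValueError(f"no write mode for format field {meta.format!r}")
    for device in devices:
        device.write(information, mode)
```

Only the format field drives the write here; the other four fields are carried along, since the abstract does not say how they are used.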
Abstract:
A system, apparatus, method, and computer product that allow multiple host systems to read and write, in parallel, to a single media and/or tape drive unit, without conflict.
Abstract:
A method for deduplicating and managing data blocks within a file system includes: adding a deduplication identifier to each pointer that points to a data block, the identifier indicating whether that data block is deduplicated; detecting duplicate data blocks; when duplicates are detected, determining whether one of the duplicate data blocks has already been deduplicated; if so, determining that this deduplicated block is the master copy; if not, selecting one of the duplicate data blocks to be the master copy and setting its deduplication identifier to indicate deduplication; and determining that the other duplicate data block is a new duplicate data block, setting its deduplication identifier to indicate deduplication, and directing its pointer to the master copy.
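A minimal Python sketch of the flag-and-redirect procedure. `BlockPointer` and `deduplicate_blocks` are hypothetical names; duplicates are found by comparing full block contents (a real file system would more likely compare hashes first), and blocks left unreferenced are not reclaimed here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockPointer:
    target: int                 # address of the data block pointed to
    deduplicated: bool = False  # the deduplication identifier added to each pointer


def deduplicate_blocks(pointers: List[BlockPointer], blocks: Dict[int, bytes]) -> None:
    """Point every pointer at one master copy per distinct block content."""
    addresses_by_content: Dict[bytes, List[int]] = {}
    for address, data in blocks.items():
        addresses_by_content.setdefault(data, []).append(address)

    for addresses in addresses_by_content.values():
        if len(addresses) < 2:
            continue  # no duplicates for this content
        # A block whose pointer is already flagged was deduplicated earlier: it is the master.
        already = [a for a in addresses
                   if any(p.target == a and p.deduplicated for p in pointers)]
        master = already[0] if already else addresses[0]  # otherwise select one as master
        for p in pointers:
            if p.target in addresses:
                p.deduplicated = True   # mark as deduplicated ...
                p.target = master       # ... and direct the pointer to the master copy
```

Reusing an already-flagged block as the master keeps earlier deduplication decisions stable when new duplicates appear.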