Abstract:
A cost function is determined for assigning first deduplicating storage units of a first storage system for replication onto second deduplicating storage units of a second storage system. One or more of the first storage units in the first storage system are assigned to one or more of the second storage units in the second storage system based on a minimized cost resulting from the cost function.
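The assignment step can be sketched as a small search over possible pairings. This is a minimal illustration, not the claimed method: the cost definition (the number of segment fingerprints the target unit lacks), the one-source-per-target restriction, and the names `replication_cost` and `assign_units` are all assumptions made for the example.

```python
from itertools import permutations

def replication_cost(source_fps, target_fps):
    """Hypothetical cost: fingerprints in the source unit that the target lacks."""
    return len(source_fps - target_fps)

def assign_units(sources, targets):
    """Brute-force the pairing with the lowest summed cost.

    Assumes at most as many source units as target units and
    one source unit per target unit.
    """
    src_names = sorted(sources)
    best_map, best_cost = None, None
    for perm in permutations(sorted(targets), len(src_names)):
        cost = sum(replication_cost(sources[s], targets[t])
                   for s, t in zip(src_names, perm))
        if best_cost is None or cost < best_cost:
            best_map, best_cost = dict(zip(src_names, perm)), cost
    return best_map, best_cost
```

Brute force is only practical for a handful of units; a larger system would presumably need an assignment algorithm or a heuristic.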
Abstract:
A system for directing for storage includes a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criterion. The memory is coupled to the processor and configured to provide the processor with instructions.
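A minimal sketch of overlap-based node selection follows. The abstract does not state the selection criterion, so this example assumes one: prefer the node with the largest overlap and break ties in favor of the node holding fewer segments. The names `segment_overlap` and `select_node` are hypothetical.

```python
def segment_overlap(incoming_fps, stored_fps):
    """Count incoming segment fingerprints the node already stores."""
    return len(incoming_fps & stored_fps)

def select_node(incoming_fps, nodes):
    """Pick the node with the highest overlap.

    Assumed selection criterion: on equal overlap, prefer the node
    that currently stores fewer segments.
    """
    return max(
        sorted(nodes),
        key=lambda name: (segment_overlap(incoming_fps, nodes[name]),
                          -len(nodes[name])),
    )
```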
Abstract:
Techniques for providing access to data objects within another data object are described herein. In one embodiment, a compound object including multiple data objects is received and metadata is extracted for a data object from the compound object, where the metadata includes a layout of the data object in view of the compound object. Subsequently, access to one or more of the data objects within the compound object is provided based on the extracted metadata without using an application associated with the compound object. Other methods and apparatuses are also described.
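The idea of extracting a layout once and then reading inner objects directly can be sketched with a tar archive standing in for the compound object. This is an assumption for illustration only; the abstract does not name a format. Python's `tarfile` module is used to build the archive and to extract each member's data offset and size, after which `read_object` (a hypothetical helper) reads by plain byte slicing.

```python
import io
import tarfile

def build_compound(files):
    """Build an uncompressed tar archive in memory as a stand-in compound object."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def extract_layout(compound):
    """Extract metadata: where each inner object's data sits in the compound object."""
    with tarfile.open(fileobj=io.BytesIO(compound), mode="r") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar.getmembers()}

def read_object(compound, layout, name):
    """Read an inner object using only the extracted layout."""
    offset, size = layout[name]
    return compound[offset:offset + size]
```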
Abstract:
A request for allocating a storage unit of a storage system is received to back up data of one or more clients. The storage system includes multiple storage units, and each storage unit stores data that is deduplicated within that storage unit. In response to the request, one or more of the storage units are selected based on an amount of deduplicated data that would be stored in each of the storage units after storing the data of the one or more clients. The selected one or more storage units are allocated to the one or more clients to back up the data of the one or more clients.
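One way to read the selection step is to project, for each storage unit, how much deduplicated data it would hold after the backup, then rank the units. The sketch below assumes data is represented as sets of segment fingerprints and that the smallest projected amount wins; both are assumptions, as are the names `projected_size` and `allocate_units`.

```python
def projected_size(unit_fps, client_fps):
    """Deduplicated segment count the unit would hold after storing the client data."""
    return len(unit_fps | client_fps)

def allocate_units(units, client_fps, count=1):
    """Select `count` units with the smallest projected deduplicated size.

    Ties are broken by unit name so the result is deterministic.
    """
    ranked = sorted(
        units,
        key=lambda name: (projected_size(units[name], client_fps), name),
    )
    return ranked[:count]
```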
Abstract:
A method and apparatus for incremental garbage collection of data in a secondary storage are described, in different embodiments. In one embodiment, a method comprises locating blocks of data in a log that are referenced and within a range at a tail of the log. The method also includes copying the blocks of data that are referenced and within the range to an unallocated segment of the log.
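The two steps can be sketched on a toy log. The representation is an assumption: the log is a list of block identifiers ordered from tail to head, the range is the first `range_len` entries, and appending to the list stands in for writing to an unallocated segment. The name `collect_tail` is hypothetical.

```python
def collect_tail(log, referenced, range_len):
    """Incrementally collect the tail of a log.

    Blocks in the tail range that are still referenced are copied
    forward; the rest of the range is dropped and can be reclaimed.
    Returns the new log and the list of blocks that were copied.
    """
    tail, rest = log[:range_len], log[range_len:]
    survivors = [block for block in tail if block in referenced]
    return rest + survivors, survivors
```

Because only a bounded range is examined per pass, the work per collection stays proportional to `range_len` rather than to the size of the whole log.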
Abstract:
A method of storing data is disclosed. A set of data blocks, including a plurality of proper subsets of data blocks, is stored. A plurality of first-level parity blocks is generated, wherein each first-level parity block is generated from a corresponding proper subset of data blocks within the plurality of proper subsets of data blocks without reference to other data blocks not in the corresponding proper subset. A second-level parity block is generated, wherein the second-level parity block is generated from a plurality of data blocks included in at least two of the plurality of proper subsets of data blocks, and wherein recovery of a lost block in a given proper subset of data blocks is possible without reference to any data blocks not in the given proper subset.
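A minimal sketch of the two parity levels, using XOR as the parity function, is shown here. The abstract does not specify the code used, so XOR is an assumption, as is the choice to build the second-level parity from the first block of each subset. All blocks are assumed to have equal length, and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def build_parity(subsets):
    """Return (first-level parities, second-level parity).

    First level: one parity per subset, computed from that subset only.
    Second level (assumed layout): XOR of the first block of every subset,
    so it spans blocks from at least two subsets.
    """
    first_level = [xor_blocks(subset) for subset in subsets]
    second_level = xor_blocks([subset[0] for subset in subsets])
    return first_level, second_level

def recover_in_subset(subset, lost_index, subset_parity):
    """Rebuild one lost block using only its own subset and that subset's parity."""
    survivors = [b for i, b in enumerate(subset) if i != lost_index]
    return xor_blocks(survivors + [subset_parity])
```

The local recovery path touches only the blocks of one subset, which is the property the abstract highlights; the second-level parity adds redundancy that crosses subsets.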
Abstract:
A method of determining whether a data segment is a duplicate using cooperating deduplicators is disclosed. The data segment is received. A first deduplicator is operated to determine whether the incoming data segment is a duplicate based on first information available to the first deduplicator regarding stored data segments that are stored in a memory. A second deduplicator is selectively operated to determine whether the incoming data segment is a duplicate based on second information available to the second deduplicator, wherein the selective operation of the second deduplicator depends on the determination made by the first deduplicator.
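The cooperation between the two deduplicators can be sketched as a fast check followed by a conditional slow check. In this illustration the first deduplicator is an in-memory set of fingerprints and the second is a stand-in for a larger, slower index; both choices, the SHA-256 fingerprint, and the class name `CooperatingDedup` are assumptions rather than details from the abstract.

```python
import hashlib

def fingerprint(segment: bytes) -> str:
    """Fingerprint a segment (SHA-256 is an assumed choice)."""
    return hashlib.sha256(segment).hexdigest()

class CooperatingDedup:
    def __init__(self, cache_fps, index_fps):
        self.cache = set(cache_fps)  # first deduplicator: in-memory information
        self.index = set(index_fps)  # second deduplicator: stand-in for a full index
        self.second_lookups = 0      # how often the second deduplicator ran

    def is_duplicate(self, segment: bytes) -> bool:
        fp = fingerprint(segment)
        if fp in self.cache:
            return True              # first deduplicator decided; skip the second
        self.second_lookups += 1     # second runs only when the first found nothing
        found = fp in self.index
        if found:
            self.cache.add(fp)       # remember the hit for later segments
        return found
```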
Abstract:
Storage of data segments is disclosed. For each segment, a similar segment to the segment is identified, wherein the similar segment is already managed by a cluster node. In the event the similar segment is identified, a reference to the similar segment and a delta between the similar segment and the segment are caused to be stored instead of the segment.
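Storing a reference plus a delta can be sketched with Python's `difflib.SequenceMatcher`. This is an illustration under stated assumptions: similarity is measured with `SequenceMatcher.ratio()` against an assumed threshold of 0.5, the delta is a list of copy and literal operations, and the helper names are hypothetical. A production system would more likely use compact similarity sketches than pairwise comparison.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, new: bytes):
    """Encode `new` as copies from `base` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, base, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            ops.append(("data", new[j1:j2]))
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the segment from the similar segment and the delta."""
    out = bytearray()
    for op in ops:
        out += base[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)

def find_similar(segment, stored, threshold=0.5):
    """Return the id of the most similar stored segment, or None."""
    best_id, best_ratio = None, threshold
    for seg_id, data in stored.items():
        ratio = SequenceMatcher(None, data, segment, autojunk=False).ratio()
        if ratio >= best_ratio:
            best_id, best_ratio = seg_id, ratio
    return best_id

def store_segment(segment, stored, new_id):
    """Store a reference and delta if a similar segment exists, else the full segment."""
    similar = find_similar(segment, stored)
    if similar is None:
        stored[new_id] = segment
        return ("full", new_id)
    return ("delta", similar, make_delta(stored[similar], segment))
```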
Abstract:
Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and in the event that a similar segment to the segment is identified that is already managed by the selected cluster node, a reference to the similar segment and a delta between the similar segment and the segment are caused to be stored on the selected cluster node.
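The abstract does not say how the cluster node is selected. One plausible approach, sketched here purely as an assumption, is to route on a content-derived feature so that similar segments tend to reach the same node: take the smallest hash over the segment's overlapping byte windows and reduce it modulo the node count. The names `shingles` and `select_cluster_node` and the window width are hypothetical.

```python
import hashlib

def shingles(segment: bytes, width: int = 4):
    """Overlapping byte windows of the segment (the whole segment if it is short)."""
    if len(segment) <= width:
        return {segment}
    return {segment[i:i + width] for i in range(len(segment) - width + 1)}

def select_cluster_node(segment: bytes, node_count: int, width: int = 4) -> int:
    """Route on the minimum shingle hash.

    Segments that share most of their content often share the
    minimum-hash shingle and therefore map to the same node.
    """
    smallest = min(hashlib.sha256(s).digest() for s in shingles(segment, width))
    return int.from_bytes(smallest[:8], "big") % node_count
```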
Abstract:
A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether requested data is stored in the performance storage unit. In the event that the requested data is not stored in the performance storage unit, the determiner determines whether the requested data is stored in the performance segment storage unit.
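The determiner's two-step check can be sketched as an ordered lookup. Here both storage units are modeled as dictionaries keyed by a request identifier, which is an assumption for illustration, and `locate` is a hypothetical name.

```python
def locate(requested, performance_unit, performance_segment_unit):
    """Check the performance storage unit first, then the performance segment storage unit.

    Returns (where the data was found, the data), or (None, None) on a miss.
    """
    if requested in performance_unit:
        return "performance storage unit", performance_unit[requested]
    if requested in performance_segment_unit:
        return ("performance segment storage unit",
                performance_segment_unit[requested])
    return None, None
```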