Abstract:
Techniques are provided for deduplicating encrypted data. For example, a device has data to store in an encrypted state within a remote data store. A key is used to encrypt the data to create encrypted data. The data is hashed to create hashed data and the encrypted data is hashed to create hashed encrypted data. A probabilistic data structure of the data is generated. The key is encrypted based upon the data to create an encrypted key. The encrypted data is transmitted to the remote data store, along with metadata comprising the hashed data, the hashed encrypted data, the probabilistic data structure, and the encrypted key. The metadata may be used to implement deduplication for subsequent requests, to store data within the remote data store, with respect to the encrypted data.
Abstract:
A distributed storage system can use a high rate MSR erasure code to repair multiple nodes when multiple node failures occur. An encoder constructs m r-ary trees to determine the symbol arrays for the parity nodes. These symbol arrays are used to generate the parity data according to parity definitions or parity equations. The m r-ary trees are also used to identify a set of recovery rows across helper nodes for repairing a systematic node. When failed systematic nodes correspond to different ones of the m r-ary trees, a decoder may select additional recovery rows. The decoder selects additional recovery rows when the parity definitions do not provide a sufficient number of independent linear equations to solve the unknown symbols of the failed nodes. The decoder can select recovery rows contiguous to the already identified recovery rows for access efficiency.
Abstract:
Embodiments described herein provide an object store that efficiently manages and services objects for use by clients of a distributed data processing system. Illustratively, the object store may be embodied as a quasi-shared storage system that interacts with nodes of the distributed data processing system to service the objects as blocks of data stored on a plurality of storage devices, such as disks, of the storage system. To that end, an architecture of the object store may include an on-disk layout, e.g., of the storage system, and an incore layout, e.g., of the nodes, that cooperate to illustratively convert the blocks to objects for access by the clients.
Abstract:
Storage providers can securely store data and avoid data duplication with secure derivative data and offload the responsibility of generating the secure derivative data to the data owners. Initially, a data source will provide an encrypted version of data and the secure derivative data to a remote storage provider. The secure derivative data can include a hash of the data, a hash of the encrypted version of the data, a hash tree generated from the data, and an encrypted version of the key used to encrypt the data. When the remote storage provider later receives a request to store the same data, the remote storage provider uses the secure derivative data for secure proofs of storage and for proof of data possession.
Abstract:
Technology is disclosed for managing data in a distributed file system (“the technology”). The technology can gather metadata information associated with the data stored within a first file system, store the metadata information in association with a data identifier within a second file system, retrieve the stored metadata information using the data identifier from within the second file system and locate and retrieve the data associated with the metadata information from within first file system.