Abstract:
Techniques are provided for deduplicating encrypted data. For example, a device has data to store in an encrypted state within a remote data store. A key is used to encrypt the data to create encrypted data. The data is hashed to create hashed data and the encrypted data is hashed to create hashed encrypted data. A probabilistic data structure of the data is generated. The key is encrypted based upon the data to create an encrypted key. The encrypted data is transmitted to the remote data store, along with metadata comprising the hashed data, the hashed encrypted data, the probabilistic data structure, and the encrypted key. The metadata may be used to implement deduplication for subsequent requests, to store data within the remote data store, with respect to the encrypted data.
Abstract:
A distributed storage system can use a high rate MSR erasure code to repair multiple nodes when multiple node failures occur. An encoder constructs m r-ary trees to determine the symbol arrays for the parity nodes. These symbol arrays are used to generate the parity data according to parity definitions or parity equations. The m r-ary trees are also used to identify a set of recovery rows across helper nodes for repairing a systematic node. When failed systematic nodes correspond to different ones of the m r-ary trees, a decoder may select additional recovery rows. The decoder selects additional recovery rows when the parity definitions do not provide a sufficient number of independent linear equations to solve the unknown symbols of the failed nodes. The decoder can select recovery rows contiguous to the already identified recovery rows for access efficiency.
Abstract:
Storage providers can securely store data and avoid data duplication with secure derivative data and offload the responsibility of generating the secure derivative data to the data owners. Initially, a data source will provide an encrypted version of data and the secure derivative data to a remote storage provider. The secure derivative data can include a hash of the data, a hash of the encrypted version of the data, a hash tree generated from the data, and an encrypted version of the key used to encrypt the data. When the remote storage provider later receives a request to store the same data, the remote storage provider uses the secure derivative data for secure proofs of storage and for proof of data possession.
Abstract:
Techniques are provided for deduplicating encrypted data. For example, a device has data to store in an encrypted state within a remote data store. A key is used to encrypt the data to create encrypted data. The data is hashed to create hashed data and the encrypted data is hashed to create hashed encrypted data. A probabilistic data structure of the data is generated. The key is encrypted based upon the data to create an encrypted key. The encrypted data is transmitted to the remote data store, along with metadata comprising the hashed data, the hashed encrypted data, the probabilistic data structure, and the encrypted key. The metadata may be used to implement deduplication for subsequent requests, to store data within the remote data store, with respect to the encrypted data.
Abstract:
Technology is disclosed for managing data in a distributed file system (“the technology”). The technology can gather metadata information associated with the data stored within the distributed file system, create a secondary namespace within a local file system of a local host using the gathered metadata information and store the gathered metadata information as files within the secondary namespace. Further, when a request to create a PPI of the distributed file system is received, the technology can create a PPI of the secondary namespace using a PPI creation feature of the local file system.
Abstract:
Technology is disclosed for managing data in a distributed file system (“the technology”). The technology can gather metadata information associated with the data stored within the distributed file system, create a secondary namespace within a local file system of a local host using the gathered metadata information and store the gathered metadata information as files within the secondary namespace. Further, when a request to create a PPI of the distributed file system is received, the technology can create a PPI of the secondary namespace using a PPI creation feature of the local file system.
Abstract:
Embodiments described herein provide an object store that efficiently manages and services objects for use by clients of a distributed data processing system. Illustratively, the object store may be embodied as a quasi-shared storage system that interacts with nodes of the distributed data processing system to service the objects as blocks of data stored on a plurality of storage devices, such as disks, of the storage system. To that end, an architecture of the object store may include an on-disk layout, e.g., of the storage system, and an incore layout, e.g., of the nodes, that cooperate to illustratively convert the blocks to objects for access by the clients.
Abstract:
Techniques are provided for deduplicating encrypted data. For example, a device has data to store in an encrypted state within a remote data store. A key is used to encrypt the data to create encrypted data. The data is hashed to create hashed data and the encrypted data is hashed to create hashed encrypted data. A probabilistic data structure of the data is generated. The key is encrypted based upon the data to create an encrypted key. The encrypted data is transmitted to the remote data store, along with metadata comprising the hashed data, the hashed encrypted data, the probabilistic data structure, and the encrypted key. The metadata may be used to implement deduplication for subsequent requests, to store data within the remote data store, with respect to the encrypted data.
Abstract:
Techniques are provided for deduplicating encrypted data. For example, a device has data to store in an encrypted state within a remote data store. A key is used to encrypt the data to create encrypted data. The data is hashed to create hashed data and the encrypted data is hashed to create hashed encrypted data. A probabilistic data structure of the data is generated. The key is encrypted based upon the data to create an encrypted key. The encrypted data is transmitted to the remote data store, along with metadata comprising the hashed data, the hashed encrypted data, the probabilistic data structure, and the encrypted key. The metadata may be used to implement deduplication for subsequent requests, to store data within the remote data store, with respect to the encrypted data.
Abstract:
Techniques are provided for aggregate inline deduplication and volume granularity encryption. For example, data that is exclusive to a volume of a tenant is encrypted using an exclusive encryption key accessible to the tenant. The exclusive encryption key of that tenant is inaccessible to other tenants. Shared data that has been deduplicated and shared between the volume and another volume of a different tenant is encrypted using a shared encryption key of the volume. The shared encryption key is made available to other tenants. In this way, data can be deduplicated across multiple volumes of different tenants of a storage environment, while maintaining security and data privacy at a volume level.