Abstract:
A technique described herein performs peer-to-peer network write deduplication. A host system generates a fingerprint for data associated with a write request. The host system may then determine whether the generated fingerprint matches a local fingerprint stored in a local data structure or a global fingerprint associated with a global data structure. The local fingerprint is associated with data previously written to the storage system by the host, and the global fingerprint is associated with data previously written to the storage system by a different host. If a match is found, the host system constructs a deduplication command using a logical address corresponding to the storage location that stores the data. If no match is found, the host system constructs a write command for the data of the write request and sends it to the storage system.
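The decision flow in this abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `Host` class, the use of SHA-256 as the fingerprint, the two dictionaries standing in for the local and global data structures, the tuple command format, and the host-side address counter. None of these come from the abstract itself.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical structures: each maps a fingerprint to a logical address.
    local_fingerprints: dict = field(default_factory=dict)   # data written by this host
    global_fingerprints: dict = field(default_factory=dict)  # data written by other hosts
    next_address: int = 0  # assumed address allocator, for illustration only

    def build_command(self, data: bytes) -> tuple:
        """Return a deduplication command on a match, else a write command."""
        fingerprint = hashlib.sha256(data).hexdigest()

        # Check the local structure first, then the global one.
        address = self.local_fingerprints.get(fingerprint)
        if address is None:
            address = self.global_fingerprints.get(fingerprint)

        if address is not None:
            # Match: reference the existing storage location instead of resending data.
            return ("DEDUP", address)

        # No match: record the fingerprint locally and build a full write command.
        address = self.next_address
        self.next_address += 1
        self.local_fingerprints[fingerprint] = address
        return ("WRITE", address, data)
```

In this sketch, the first write of a payload produces a `WRITE` command and any later write of the same payload produces a `DEDUP` command that carries only the logical address.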
Abstract:
A data management services architecture includes architectural components that run in both a storage domain and a compute domain. The architectural components redirect storage requests from the storage domain to the compute domain, manage resources allocated from the compute domain, ensure compliance with a policy that governs resource consumption, deploy program code for data management services, dispatch service requests to deployed services, and monitor deployed services. The architectural components also include a service map to locate program code for data management services, and service instance information for monitoring deployed services and dispatching requests to them. Since deployed services can be stateless or stateful, the services architecture also includes state data for the stateful services, with supporting resources that can expand or contract based on policy and/or service demand. The architectural components also include containers for the deployed services.
Abstract:
A deduplication service can be provided to a storage domain from a services framework that expands and contracts both to meet service demand and to conform to resource management of a compute domain. The deduplication service maintains a fingerprint database and reference count data in compute domain resources, but persists these into the storage domain for use in the case of a failure or interruption of the deduplication service in the compute domain. The deduplication service responds to service requests from the storage domain with indications of paths in a user namespace and of whether a piece of data had a fingerprint match in the fingerprint database. The indication of a match guides the storage domain either to store the piece of data in the storage backend or to reference another piece of data. The deduplication service uses the fingerprints to define paths for corresponding pieces of data.
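The request handling described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the service's actual design: the `DedupService` class, SHA-256 as the fingerprint function, the `/dedup/<prefix>/<fingerprint>` path layout, and the in-memory dictionaries are all hypothetical. Persisting the database and reference counts into the storage domain is not shown.

```python
import hashlib


class DedupService:
    """Hypothetical in-memory stand-in for the deduplication service."""

    def __init__(self):
        self.fingerprint_db = {}  # fingerprint -> path in the user namespace
        self.ref_counts = {}      # fingerprint -> number of references

    def handle(self, piece: bytes) -> tuple:
        """Return (path, matched) for one piece of data."""
        fingerprint = hashlib.sha256(piece).hexdigest()
        matched = fingerprint in self.fingerprint_db
        if not matched:
            # Assumed layout: the fingerprint itself defines the path.
            self.fingerprint_db[fingerprint] = f"/dedup/{fingerprint[:2]}/{fingerprint}"
        self.ref_counts[fingerprint] = self.ref_counts.get(fingerprint, 0) + 1
        return self.fingerprint_db[fingerprint], matched
```

A caller in the storage domain would store the piece at the returned path when `matched` is false and would only add a reference to that path when `matched` is true.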
Abstract:
To reduce write tail latency, a storage system generates redundant write requests when performing a storage operation for an object. The storage operation is determined to be effectively complete when a minimum number of write requests have completed. For example, the storage system may generate twelve write requests and four redundant write requests, for a total of sixteen write requests. The storage system considers the object successfully stored once twelve of the sixteen writes complete successfully. To generate the redundant writes, the storage system may use replication or erasure coding. For replication, the storage system may issue a redundant write request for each of n chunks being written. For erasure coding, the storage system may use rateless codes, which can generate an unlimited number of parity chunks, or use an n+k+k' erasure code, which generates an additional k' encoded chunks, in place of an n+k erasure code.
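The completion rule in this abstract (twelve successes out of sixteen issued writes) can be sketched in Python. The function name `store_with_redundancy`, the representation of each write as a callable that returns `True` on success, and the use of a thread pool are assumptions for illustration; how chunks are encoded or replicated is not shown.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def store_with_redundancy(write_fns, required):
    """Issue every write concurrently; return True once `required` have succeeded.

    `write_fns` is a hypothetical list of callables, one per write request,
    each returning True on success. A raised exception counts as a failure.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(write_fns)))
    futures = [pool.submit(fn) for fn in write_fns]
    succeeded = 0
    try:
        for future in as_completed(futures):
            if future.exception() is None and future.result():
                succeeded += 1
            if succeeded >= required:
                # Enough writes finished; do not wait for the stragglers.
                return True
        return False
    finally:
        # Let any remaining writes finish in the background.
        pool.shutdown(wait=False)
```

With twelve required writes and four redundant ones, the call returns as soon as any twelve of the sixteen succeed, so up to four slow or failed writes do not delay completion.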