Abstract:
One or more techniques and/or systems are provided for performing host side deduplication. Host side deduplication may be performed upon writeable data within a write request received at a host computing device configured to access data stored by a storage server. The host side deduplication may be performed at the host computing device to determine whether the writeable data is already stored by the storage server based upon querying a host side cache comprising data stored by a storage server and/or a data structure comprising unique signatures of data stored by the storage server. If the writeable data is stored by the storage server, then a deduplication notification excluding the writeable data may be sent to the storage server, otherwise a write command comprising the writeable data may be sent. Accordingly, unnecessary network traffic of redundant data already stored by the storage server may be reduced.
Abstract:
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to maintain cache coherency among multiple storage nodes. It can also be employed to avoid sending the data to a network node over a network if it already has the data.
Abstract:
The techniques introduced herein provide for systems and methods for estimating the effectiveness of utilizing a data deduplication process. More specifically, a content-based sampling approach for data deduplication estimation is described in which a subset of the scanned fingerprints of a dataset are included in a content-based sample that is used to determine an accurate deduplication estimate for a dataset (or volume).
Abstract:
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to determine whether a given storage server already has the data, and to avoid sending the data to that storage server over a network if it already has the data. It can also be employed to maintain cache coherency among multiple storage nodes.
Abstract:
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to determine whether a given storage server already has the data, and to avoid sending the data to that storage server over a network if it already has the data. It can also be employed to maintain cache coherency among multiple storage nodes.
Abstract:
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to determine whether a given storage server already has the data, and to avoid sending the data to that storage server over a network if it already has the data. It can also be employed to maintain cache coherency among multiple storage nodes.
Abstract:
One or more techniques and/or systems are provided for performing host side deduplication. Host side deduplication may be performed upon writeable data within a write request received at a host computing device configured to access data stored by a storage server. The host side deduplication may be performed at the host computing device to determine whether the writeable data is already stored by the storage server based upon querying a host side cache comprising data stored by a storage server and/or a data structure comprising unique signatures of data stored by the storage server. If the writeable data is stored by the storage server, then a deduplication notification excluding the writeable data may be sent to the storage server, otherwise a write command comprising the writeable data may be sent. Accordingly, unnecessary network traffic of redundant data already stored by the storage server may be reduced.
Abstract:
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to determine whether a given storage server already has the data, and to avoid sending the data to that storage server over a network if it already has the data. It can also be employed to maintain cache coherency among multiple storage nodes.
Abstract:
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to maintain cache coherency among multiple storage nodes. It can also be employed to avoid sending the data to a network node over a network if it already has the data.
Abstract:
One or more techniques and/or systems are provided for performing host side deduplication. Host side deduplication may be performed upon writeable data within a write request received at a host computing device configured to access data stored by a storage server. The host side deduplication may be performed at the host computing device to determine whether the writeable data is already stored by the storage server based upon querying a host side cache comprising data stored by a storage server and/or a data structure comprising unique signatures of data stored by the storage server. If the writeable data is stored by the storage server, then a deduplication notification excluding the writeable data may be sent to the storage server, otherwise a write command comprising the writeable data may be sent. Accordingly, unnecessary network traffic of redundant data already stored by the storage server may be reduced.