Abstract:
To reduce write tail latency, a storage system generates redundant write requests when performing a storage operation for an object. The storage operation is determined to be effectively complete when a minimum number of write requests have completed. For example, the storage system may generate twelve write requests and also generate four redundant write requests for a total of sixteen write requests. The storage system considers the object successfully stored once twelve of the sixteen writes complete successfully. To generate the redundant writes, the storage system may use replication or erasure coding. For replication, the storage system may issue a redundant write request for each of n chunks being written. For erasure coding, the storage system may use rateless codes, which can generate an unlimited number of parity chunks, or use an n+k+k′ erasure code, which generates an additional k′ encoded chunks, in place of an n+k erasure code.
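The completion rule described above (issue n + k′ writes, declare success once n of them finish) can be sketched as follows. This is a minimal illustration, not the patented implementation: `store_with_redundancy` and `make_write` are hypothetical names, the writes are simulated with sleeps, and a thread pool stands in for whatever I/O mechanism a real storage system would use.

```python
import concurrent.futures as cf
import time


def make_write(delay, ok=True):
    """Hypothetical stand-in for one chunk write; sleeps, then succeeds or fails."""
    def write():
        time.sleep(delay)
        if not ok:
            raise IOError("simulated write failure")
        return True
    return write


def store_with_redundancy(write_fns, required):
    """Issue all writes at once; return True as soon as `required` have succeeded."""
    pool = cf.ThreadPoolExecutor(max_workers=len(write_fns))
    futures = [pool.submit(fn) for fn in write_fns]
    succeeded = 0
    try:
        for fut in cf.as_completed(futures):
            if fut.exception() is None and fut.result():
                succeeded += 1
            if succeeded >= required:
                return True  # stragglers are no longer waited on
        return False  # every write finished, but too few succeeded
    finally:
        pool.shutdown(wait=False)  # do not block on slow writes


# Twelve fast writes plus four slow (redundant) ones: done after the first twelve.
writes = [make_write(0.0) for _ in range(12)] + [make_write(1.0) for _ in range(4)]
stored = store_with_redundancy(writes, required=12)
```

With these simulated delays, the call returns without waiting for the four slow writes, which is the tail-latency benefit the abstract describes.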
Abstract:
Methods, non-transitory machine readable media, and computing devices are disclosed that compare a hash value to a predefined value for sliding windows, in parallel, for segments partitioned from an input data stream. A bit array is populated based on the results of the comparisons. The bit array is then parsed according to minimum and maximum chunk sizes to identify chunk boundaries for the input data stream, with portions of the bit array parsed in parallel. Unique chunks of the input data stream defined by the chunk boundaries are stored in a storage device. Accordingly, this technology utilizes parallel processing in two stages. In a first stage, rolling-window-based hashing is performed concurrently to identify potential chunk boundaries. In a second stage, actual chunk boundaries are selected based on minimum and maximum chunk size constraints. This technology advantageously facilitates significant deduplication ratio improvement as well as improved parallel chunking performance.
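The two stages described above can be sketched as follows, under several assumptions that are not in the abstract: CRC32 of each window stands in for a true rolling hash, the window size, mask, target value, and size limits are arbitrary, and the second stage is shown sequentially for brevity even though the abstract describes parsing portions of the bit array in parallel. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib
import zlib


def candidate_bits(data, start, end, window=16, mask=0x3F, target=0):
    """Stage 1 for one segment: bit i is set when the window ending at byte i matches."""
    bits = bytearray(end - start)
    for i in range(max(start, window - 1), end):
        if zlib.crc32(data[i - window + 1:i + 1]) & mask == target:
            bits[i - start] = 1
    return bits


def stage_one(data, segment_size=4096, **kw):
    """Hash the windows of every segment concurrently and join the per-segment bits."""
    starts = range(0, len(data), segment_size)
    with ThreadPoolExecutor() as pool:
        parts = pool.map(
            lambda s: candidate_bits(data, s, min(s + segment_size, len(data)), **kw),
            starts,
        )
        return bytearray().join(parts)


def stage_two(bits, min_size=64, max_size=1024):
    """Stage 2: pick actual boundaries (exclusive end offsets) within the size limits."""
    boundaries, chunk_start, n = [], 0, len(bits)
    while chunk_start < n:
        lo = chunk_start + min_size - 1          # earliest allowed last byte
        hi = min(chunk_start + max_size, n) - 1  # latest allowed last byte
        pos = bits.find(1, lo, hi + 1) if lo <= hi else -1
        cut = pos if pos != -1 else hi           # no candidate: cut at the maximum
        boundaries.append(cut + 1)
        chunk_start = cut + 1
    return boundaries


def unique_chunks(data, boundaries):
    """Keep one copy of each distinct chunk, keyed by its SHA-256 digest."""
    store, start = {}, 0
    for end in boundaries:
        store.setdefault(hashlib.sha256(data[start:end]).digest(), data[start:end])
        start = end
    return store
```

Because each window's hash depends only on the bytes inside it, the bit array is the same however the stream is partitioned into segments, which is what makes the first stage safe to run in parallel.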
Abstract:
Methods, non-transitory machine readable media, and computing devices that provide improved dictionary-based compression are disclosed. With this technology, a first portion of an input data stream is compressed using a first dictionary. A second dictionary is trained when the first dictionary is determined to be stale. The first dictionary can be determined to be stale based on, for example, the size of the input data stream compressed using the first dictionary or the compression ratio decreasing by a threshold. The first dictionary can be stored with metadata associated with the compressed first portion of the input data stream. Accordingly, this technology improves compression ratios, eliminates the need for reference counting, and facilitates improved reclamation of orphan dictionaries, among other advantages.
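The staleness check and the pairing of each dictionary with its compressed portion can be sketched as follows. This is an illustration under stated assumptions: zlib preset dictionaries (`zdict`) stand in for the compression scheme, `train_dictionary` is a hypothetical placeholder that simply reuses recent data rather than performing real training, and the thresholds are arbitrary.

```python
import zlib


def train_dictionary(sample):
    """Hypothetical stand-in for training: reuse up to 32 KiB of recent data."""
    return sample[-32768:]


def compress_block(block, zdict):
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return c.compress(block) + c.flush()


def decompress_block(payload, zdict):
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(payload) + d.flush()


def compress_stream(blocks, first_dictionary=b"", max_bytes_per_dict=1 << 20,
                    ratio_drop=0.2):
    """Compress blocks, starting a new portion whenever the dictionary is stale.

    Each portion keeps its own dictionary next to its payloads, so a portion
    and its dictionary can be discarded together without reference counting.
    """
    portions = [{"dictionary": first_dictionary, "payloads": []}]
    bytes_seen, best_ratio, stale, recent = 0, 0.0, False, b""
    for block in blocks:
        if stale:
            portions.append({"dictionary": train_dictionary(recent), "payloads": []})
            bytes_seen, best_ratio, stale = 0, 0.0, False
        current = portions[-1]
        payload = compress_block(block, current["dictionary"])
        current["payloads"].append(payload)
        bytes_seen += len(block)
        ratio = len(block) / len(payload)
        best_ratio = max(best_ratio, ratio)
        # Stale by size, or by the ratio falling a set fraction below its best.
        stale = (bytes_seen >= max_bytes_per_dict
                 or ratio < best_ratio * (1 - ratio_drop))
        recent = block
    return portions
```

Reading a portion back only needs the dictionary stored in that portion's own metadata: `decompress_block(payload, portion["dictionary"])` for each payload.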
Abstract:
Systems, devices, and methods are described for performing content-aware task assignment. A resource manager in a distributed computing system can identify tasks associated with a file. Each task can involve processing multiple data blocks of the file (e.g., in parallel with other processing by other tasks). The resource manager can provide block identifiers for the blocks to each of multiple computing nodes. Each computing node can store a respective subset of the blocks in a respective cache storage medium. Each subset of blocks stored at a node can be identified from the block identifiers. The resource manager can assign a task to a selected one of the computing nodes. The task can be assigned based on the selected computing node having a larger subset of the blocks than one or more other computing nodes in the distributed computing system. In some embodiments, computing nodes can de-duplicate cached data using block identifiers.
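The assignment rule described above can be sketched as follows. This is a minimal illustration with assumptions that are not in the abstract: block identifiers are taken to be SHA-256 digests of block content, and `NodeCache` and `assign_task` are hypothetical names for the node-side cache and the resource manager's decision.

```python
import hashlib


def block_id(block):
    """Assumed identifier scheme: the SHA-256 hex digest of the block content."""
    return hashlib.sha256(block).hexdigest()


class NodeCache:
    """Per-node cache keyed by block identifier, so identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}

    def put(self, block):
        self.blocks.setdefault(block_id(block), block)

    def cached_subset(self, block_ids):
        """Report which of the offered identifiers this node already holds."""
        return {b for b in block_ids if b in self.blocks}


def assign_task(task_block_ids, nodes):
    """Pick the node holding the most of the task's blocks (ties: first listed)."""
    counts = {name: len(cache.cached_subset(task_block_ids))
              for name, cache in nodes.items()}
    return max(counts, key=counts.get), counts


file_blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
task_ids = [block_id(b) for b in file_blocks]

nodes = {"a": NodeCache(), "b": NodeCache(), "c": NodeCache()}
nodes["a"].put(file_blocks[0])
for blk in (file_blocks[1], file_blocks[2], file_blocks[2]):  # duplicate put
    nodes["b"].put(blk)

chosen, counts = assign_task(task_ids, nodes)
```

Here node "b" already caches two of the task's four blocks, more than any other node, so it is selected; the repeated `put` of the same block leaves only one cached copy.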