Abstract:
A computer-implemented method includes receiving a set of basis fingerprints corresponding to image chunks within a basis set of image regions wherein each image region within the basis set of image regions comprises one or more image chunks, and generating a fingerprint for each image chunk of a plurality of selected image chunks within an unprocessed region of a machine image to produce a plurality of sampled fingerprints. The method also includes determining a similarity metric for the unprocessed region from the sampled fingerprints and the basis fingerprints, comparing the similarity metric for the unprocessed region with a selected threshold, and including the unprocessed region within the basis set of image regions in response to determining that the similarity metric is less than the selected threshold. A corresponding computer program product and computer system are also disclosed herein.
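The threshold test described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the SHA-1 fingerprint, the tiny chunk size, the "fraction of sampled fingerprints found in the basis" metric, the `step` sampling parameter, and all function names are assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size in bytes, kept tiny for illustration


def fingerprint(chunk: bytes) -> str:
    """Fingerprint one image chunk (SHA-1 is an assumed choice)."""
    return hashlib.sha1(chunk).hexdigest()


def split_chunks(region: bytes, size: int = CHUNK_SIZE) -> list:
    """Split a region into fixed-size chunks."""
    return [region[i:i + size] for i in range(0, len(region), size)]


def similarity(region: bytes, basis: set, step: int = 1) -> float:
    """Fraction of sampled chunk fingerprints already present in the basis.

    Every `step`-th chunk is sampled; this metric is an assumption.
    """
    sampled = [fingerprint(c) for c in split_chunks(region)[::step]]
    if not sampled:
        return 1.0  # an empty region adds nothing, so treat it as fully similar
    return sum(fp in basis for fp in sampled) / len(sampled)


def maybe_add_to_basis(region: bytes, basis: set,
                       threshold: float = 0.5, step: int = 1) -> bool:
    """Add the region to the basis when its similarity is below the threshold."""
    if similarity(region, basis, step) < threshold:
        # Assumption: joining the basis means indexing all of its chunks.
        basis.update(fingerprint(c) for c in split_chunks(region))
        return True
    return False
```

With an empty basis, the first region always joins it; a later region that shares two of its three chunks with the basis scores about 0.67 and is left out at the default threshold.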
Abstract:
Embodiments are provided for optimizing dual-layered data compression in a storage environment. In a data storage system having a primary compressor and a secondary compressor, the primary compressor is selectively used to perform a first one of a plurality of actions on Input/Output (I/O) data while a second one of the plurality of actions is performed on the I/O data by the secondary compressor, thereby reducing latency and improving overall compression performance while processing the I/O data.
Abstract:
Identification of data candidates for data processing is performed in real time by a processor device in a computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate. The heuristic is computed by, for each one of the data classes, calculating an expected number of characters to be in the data class, calculating an expected number of characters that will not belong to a predefined set of the data classes, and calculating an actual number of the characters for each of the data classes and for the non-classifiable data.
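The expected-versus-actual counting can be illustrated with a short sketch. Everything specific here is an assumption for the example: the three data classes, the uniform-byte model used for the expected counts, the function names, and the final decision rule (fewer non-classifiable bytes than expected).

```python
import random
import string

# Hypothetical data classes; the abstract does not name any.
CLASSES = {
    "digits": set(string.digits.encode()),
    "lower": set(string.ascii_lowercase.encode()),
    "upper": set(string.ascii_uppercase.encode()),
}


def class_counts(data: bytes, sample_size: int, seed: int = 0):
    """Return (expected, actual) per-class byte counts for a random sample.

    Expected counts assume bytes are uniformly distributed over 0..255,
    which is an assumed baseline model.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(data))
    sample = rng.sample(list(data), n)  # random sample of byte values
    classified = set().union(*CLASSES.values())

    expected = {name: n * len(members) / 256
                for name, members in CLASSES.items()}
    expected["other"] = n * (256 - len(classified)) / 256

    actual = {name: sum(b in members for b in sample)
              for name, members in CLASSES.items()}
    actual["other"] = sum(b not in classified for b in sample)
    return expected, actual


def looks_classifiable(data: bytes, sample_size: int = 64, seed: int = 0) -> bool:
    """Assumed decision rule: fewer non-classifiable bytes than expected."""
    expected, actual = class_counts(data, sample_size, seed)
    return actual["other"] < expected["other"]
```

Under this rule, alphanumeric text is flagged as a candidate, while a buffer made only of bytes outside the three classes is not.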
Abstract:
A computer-implemented method includes dividing a data set into a plurality of regions and dividing the plurality of regions into a plurality of chunks of fixed size. The computer-implemented method further includes determining a sample size of the plurality of chunks to be sampled for each region, wherein the sample size is determined based, at least in part, on an acceptance of a likelihood of identifying at least one collision between two regions corresponding to logical entities of a first cluster of logical entities. The computer-implemented method further includes sampling the plurality of chunks for each region based on the determined sample size. The computer-implemented method further includes generating a hash value for each chunk sampled and storing each hash value in an index. The computer-implemented method further includes identifying one or more collisions between the plurality of regions. A corresponding computer system and computer program product are also disclosed.
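The divide, sample, hash, index, and collision-detection steps above can be sketched as follows. This is a rough illustration under stated assumptions: the sample size is passed in rather than derived from a collision likelihood, SHA-256 is an assumed hash, and the function names and the seeded random sampler are hypothetical.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations


def split_fixed(data: bytes, size: int) -> list:
    """Split data into consecutive pieces of a fixed size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def find_collisions(data: bytes, region_size: int, chunk_size: int,
                    sample_size: int, seed: int = 0) -> set:
    """Return pairs of region ids that share at least one sampled chunk hash."""
    rng = random.Random(seed)
    index = defaultdict(set)  # hash value -> ids of regions containing it

    for region_id, region in enumerate(split_fixed(data, region_size)):
        region_chunks = split_fixed(region, chunk_size)
        k = min(sample_size, len(region_chunks))
        for chunk in rng.sample(region_chunks, k):
            index[hashlib.sha256(chunk).hexdigest()].add(region_id)

    pairs = set()
    for region_ids in index.values():
        # Any hash seen in two or more regions is a collision between them.
        pairs.update(combinations(sorted(region_ids), 2))
    return pairs
```

A smaller sample size shrinks the index but makes it more likely that a shared chunk goes unnoticed, which is the trade-off the sample-size determination is meant to control.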
Abstract:
Identification of data candidates for data processing is performed in real time by a processor device in a computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate for determining if the data candidate may benefit from the classification-based compression. A decision is provided for approving the classification-based compression on the data candidates according to the heuristic.
Abstract:
Embodiments are provided for optimizing dual-layered data compression in a storage environment. In a data storage system having a primary compressor implemented in a storage controller and a secondary compressor implemented within a drive enclosure, the primary compressor is selectively used to perform a first one of a plurality of actions on input/output (I/O) data while a second one of the plurality of actions is performed on the I/O data by the secondary compressor, thereby reducing latency and improving overall compression performance while processing the I/O data.
Abstract:
A computer-implemented method according to one embodiment includes determining resource usage of at least a first module in a grid storage system having multiple modules and approximately equal resource usage across the multiple modules of the grid storage system. The computer-implemented method further includes determining a garbage collection cost in the grid storage system by stopping garbage collection in a second of the modules of the grid storage system, determining a resource usage in the second module upon stopping the garbage collection, and comparing the resource usage in the second module to the resource usage of the at least first module. The method further includes adjusting an amount of garbage collection based on both the garbage collection cost and the resource usage.
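The cost measurement and adjustment can be sketched in a few lines. This assumes resource usage and garbage collection rate are expressed as integer percentages; the function names, the high-water mark, the step size, and the adjustment policy itself are hypothetical choices for illustration.

```python
def gc_cost(usage_with_gc: int, usage_without_gc: int) -> int:
    """Estimate garbage collection cost as the usage drop seen when GC is stopped.

    usage_with_gc: usage (percent) of a module still running GC.
    usage_without_gc: usage (percent) of the module where GC was stopped.
    """
    return max(0, usage_with_gc - usage_without_gc)


def adjust_gc_rate(rate: int, usage: int, cost: int,
                   high_water: int = 80, step: int = 10) -> int:
    """Return a new GC rate (percent) from the current usage and GC cost.

    Assumed policy: throttle GC when the system is busy and GC measurably
    contributes to the load; speed GC up when there is headroom.
    """
    if usage > high_water and cost > 0:
        return max(0, rate - step)
    if usage < high_water:
        return min(100, rate + step)
    return rate
```

Because the modules are assumed to carry roughly equal load, the difference between a normal module and the one with garbage collection stopped can be attributed to garbage collection.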
Abstract:
In a compression processing storage system, using a pool of encryption processing cores, the encryption processing cores are assigned to process encryption operations, decryption operations, or both decryption and encryption operations that are scheduled for processing. A maximum number of the encryption processing cores is set for processing only the decryption operations, thereby lowering a decryption latency. A minimal number of the encryption processing cores is allocated for processing the encryption operations, thereby increasing encryption latency. Upon reaching a throughput limit for the encryption operations that causes the minimal number of the plurality of encryption processing cores to reach a busy status, the minimal number of the plurality of encryption processing cores for processing the encryption operations is increased.
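The allocation policy can be sketched as a small state holder. This is an illustrative model only: the `CorePool` class, its field names, the one-core growth step, and the rule that at least one core stays reserved for decryption are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CorePool:
    """Hypothetical pool that favors decryption over encryption."""
    total: int
    encrypt_cores: int = 1  # minimal allocation for encryption operations

    @property
    def decrypt_cores(self) -> int:
        """Cores reserved for decryption only (everything not given to encryption)."""
        return self.total - self.encrypt_cores

    def rebalance(self, busy_encrypt_cores: int) -> None:
        """Grow the encryption allocation once all of its cores are busy.

        Assumption: one core is moved at a time, and at least one core
        always remains reserved for decryption.
        """
        if busy_encrypt_cores >= self.encrypt_cores and self.decrypt_cores > 1:
            self.encrypt_cores += 1
```

Starting with one encryption core out of eight, a call to `rebalance(1)` moves a second core to encryption, while a later call reporting only one busy core leaves the split unchanged.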
Abstract:
While discharging a data chunk, the chunk is compressed into a storage block. If the chunk is found to be too large to be compressed completely into the storage block, certain characteristics of the data chunk are examined to determine whether or not the data chunk should be split. If the data chunk should be split, a remaining portion of the data chunk is compressed into the storage block that is next in chronological order after the original storage block. If the data chunk should not be split, all of the data chunk is moved to the next chronological storage block, leaving any remaining space in the original storage block unused.
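The fit, split, or move decision can be sketched as below. The abstract does not say which chunk characteristics drive the split decision, so this example substitutes an assumed rule (split only when the current block still has at least `min_split_space` free bytes). The `BlockWriter` class, the halving search for a prefix that fits, and the use of zlib are likewise hypothetical, and the sketch assumes any compressed piece fits in an empty block.

```python
import zlib


class BlockWriter:
    """Hypothetical writer that packs compressed chunks into fixed-size blocks."""

    def __init__(self, block_size: int):
        self.block_size = block_size
        self.blocks = [[]]       # each block is a list of compressed payloads
        self.free = block_size   # free bytes left in the current (last) block

    def _new_block(self) -> None:
        self.blocks.append([])
        self.free = self.block_size

    def _store(self, payload: bytes) -> None:
        self.blocks[-1].append(payload)
        self.free -= len(payload)

    def write(self, chunk: bytes, min_split_space: int = 16) -> str:
        """Store a chunk and report whether it 'fit', was 'split', or 'moved'."""
        packed = zlib.compress(chunk)
        if len(packed) <= self.free:
            self._store(packed)
            return "fit"
        if self.free >= min_split_space:  # assumed split criterion
            # Halve the prefix until its compressed form fits the free space.
            cut = len(chunk) // 2
            while cut > 0 and len(zlib.compress(chunk[:cut])) > self.free:
                cut //= 2
            if cut > 0:
                self._store(zlib.compress(chunk[:cut]))
                self._new_block()
                self._store(zlib.compress(chunk[cut:]))  # remainder, next block
                return "split"
        # Not split: leave the leftover space unused and move the whole chunk.
        self._new_block()
        self._store(packed)
        return "moved"
```

Splitting reclaims space that would otherwise be wasted, at the price of a read that spans two blocks; moving the whole chunk keeps reads simple but leaves the tail of the earlier block empty.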