Abstract:
Identification of data candidates for data processing is performed in real time by a processor device in a computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate by calculating, for each of the data classes, an expected number of characters in that data class; calculating an expected number of characters that will not belong to a predefined set of the data classes; and calculating an actual number of characters for each of the data classes and for the non-classifiable data.
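The expected-versus-actual counting described above can be sketched as follows. This is only an illustration under stated assumptions: the class definitions (`DATA_CLASSES`), the uniform-byte model used for the expected counts, and the helper names `sample_bytes` and `class_counts` are hypothetical and are not taken from the abstract.

```python
import random
import string

# Hypothetical data classes; the abstract does not name the classes.
DATA_CLASSES = {
    "digits": set(string.digits.encode()),
    "letters": set(string.ascii_letters.encode()),
    "whitespace": set(b" \t\r\n"),
}
ALPHABET_SIZE = 256  # assumption: expected counts model uniformly random bytes


def sample_bytes(data: bytes, sample_size: int, rng: random.Random) -> bytes:
    """Pick sample_size bytes at random positions (the whole input if it is short)."""
    if len(data) <= sample_size:
        return data
    positions = rng.sample(range(len(data)), sample_size)
    return bytes(data[i] for i in positions)


def class_counts(sample: bytes):
    """Return (expected, actual) character counts per class plus 'other'."""
    n = len(sample)
    expected = {name: n * len(members) / ALPHABET_SIZE
                for name, members in DATA_CLASSES.items()}
    # Expected number of characters outside the predefined set of classes.
    expected["other"] = n - sum(expected.values())

    actual = {name: 0 for name in DATA_CLASSES}
    actual["other"] = 0
    for byte in sample:
        for name, members in DATA_CLASSES.items():
            if byte in members:
                actual[name] += 1
                break
        else:
            actual["other"] += 1  # non-classifiable data
    return expected, actual
```

A caller would typically compare the two dictionaries: a class whose actual count is far above its expected count suggests the data is dominated by that class.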
Abstract:
B-Tree data is serialized to existing data for all types of workloads. The serialized B-Tree data, which has been split, sorted, and classified into identified data ranges, is then compressed.
Abstract:
Identification of data candidates for data processing is performed in real time by a processor device in a computing environment. Data candidates are sampled for performing a classification-based compression upon the data candidates. A heuristic is computed on a randomly selected data sample from the data candidate for determining if the data candidate may benefit from the classification-based compression. A decision is provided for approving the classification-based compression on the data candidates according to the heuristic.
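A minimal sketch of the approve/reject decision, assuming a very simple heuristic: the fraction of sampled bytes that fall into a classifiable set. The set `CLASSIFIABLE`, the function name `approve_compression`, and the default `threshold` are hypothetical choices for illustration; the abstract does not specify them.

```python
import random

# Hypothetical classifiable set: ASCII digits and letters.
CLASSIFIABLE = set(
    b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
)


def approve_compression(data: bytes, sample_size: int = 256,
                        threshold: float = 0.8, seed=None) -> bool:
    """Approve classification-based compression if enough sampled bytes are classifiable."""
    if not data:
        return False
    rng = random.Random(seed)
    k = min(sample_size, len(data))
    positions = rng.sample(range(len(data)), k)  # randomly selected sample
    hits = sum(1 for i in positions if data[i] in CLASSIFIABLE)
    return hits / k >= threshold  # assumed decision rule
```

Sampling keeps the cost of the decision independent of the candidate's size, which is what makes a real-time decision plausible.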
Abstract:
A detection learning module is used for enabling and/or disabling real-time compression detection by maintaining a history of real-time compression detection success for sampled data. The enabling or disabling of the real-time compression detection is based on a detection benefit function derived from a set of calculated heuristics indicating the real-time compression detection success on input streams.
Abstract:
Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information.
Abstract:
A detection learning module is used for enabling and/or disabling real-time compression detection by maintaining a history of real-time compression detection success for sampled data. The enabling or disabling of the real-time compression detection is based on a detection benefit function derived from a set of calculated heuristics indicating the real-time compression detection success on input streams. The detection benefit function is calculated based on at least one heuristic score.
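One way to picture such a learning module is a sliding window of past outcomes with a benefit function computed over it. The sketch below is an assumption throughout: the class name `DetectionLearner`, the window size, the score-weighted success rate used as the benefit function, and the `min_benefit` threshold are all hypothetical and not taken from the abstract.

```python
from collections import deque


class DetectionLearner:
    """Tracks recent detection outcomes and toggles detection on or off."""

    def __init__(self, window: int = 8, min_benefit: float = 0.5):
        self.history = deque(maxlen=window)  # (heuristic score, succeeded) pairs
        self.min_benefit = min_benefit
        self.enabled = True

    def benefit(self) -> float:
        """Assumed benefit function: score-weighted success rate over the window."""
        if not self.history:
            return 1.0
        total = sum(score for score, _ in self.history)
        if total == 0:
            return 0.0
        return sum(score for score, ok in self.history if ok) / total

    def record(self, heuristic_score: float, succeeded: bool) -> None:
        """Store one outcome and re-evaluate whether detection stays enabled."""
        self.history.append((heuristic_score, succeeded))
        self.enabled = self.benefit() >= self.min_benefit
```

In this sketch detection can only be re-enabled if outcomes keep being recorded, so a practical design would probably still probe occasionally while detection is disabled.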
Abstract:
For real-time classification of data into data compression domains, a decision is made as to which of the data compression domains each write operation should be forwarded to. Randomly selected data of the write operations is read to compute a set of classifying heuristics, thereby creating a fingerprint for each of the write operations. The write operations having a similar fingerprint are compressed together in a similar compression stream.
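The fingerprint-and-route idea can be sketched as below. The two heuristics (fraction of printable bytes, fraction of zero bytes), the bucketing into coarse levels, and the names `fingerprint` and `route` are hypothetical; the abstract does not say which classifying heuristics are used or how similarity is measured. Here "similar" is approximated as "falls into the same buckets".

```python
import random
from collections import defaultdict


def fingerprint(payload: bytes, rng: random.Random,
                sample_size: int = 64, buckets: int = 4) -> tuple:
    """Build a coarse fingerprint from randomly selected bytes of one write."""
    if not payload:
        return (0, 0)
    k = min(sample_size, len(payload))
    sample = [payload[i] for i in rng.sample(range(len(payload)), k)]
    printable = sum(32 <= b < 127 for b in sample) / k  # assumed heuristic 1
    zeros = sum(b == 0 for b in sample) / k             # assumed heuristic 2

    def bucket(fraction: float) -> int:
        return min(int(fraction * buckets), buckets - 1)

    return (bucket(printable), bucket(zeros))


def route(writes, seed: int = 0) -> dict:
    """Group write payloads by fingerprint; each group models one compression stream."""
    rng = random.Random(seed)
    streams = defaultdict(list)
    for payload in writes:
        streams[fingerprint(payload, rng)].append(payload)
    return dict(streams)
```

With this grouping, two text-like writes end up in one stream and a zero-filled write in another, so each stream's compressor sees more homogeneous input.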
Abstract:
B-Tree data is serialized to existing data for all types of workloads by converting a B-Tree data structure into a format capable of being stored and resurrected while containing all data stored in the B-Tree data structure and information relating to the B-Tree data structure.
Abstract:
A Huffman cache is used to hold Huffman dictionaries that are changeable for dynamically selecting literal frequencies that are similar, wherein the Huffman cache is a data storage cache.
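A rough sketch of a cache that reuses a Huffman dictionary when the literal frequencies of new data are close to those of a cached entry. The similarity measure (L1 distance between normalized frequency tables), the `max_distance` threshold, and the names `HuffmanCache`, `huffman_code_lengths`, `normalized`, and `distance` are assumptions for illustration only; the dictionary is represented here simply as a table of code lengths.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs) -> dict:
    """Return {symbol: code length} for a non-empty frequency table."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so dicts are never compared
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def normalized(freqs) -> dict:
    total = sum(freqs.values())
    return {s: f / total for s, f in freqs.items()}


def distance(p, q) -> float:
    """L1 distance between two normalized frequency tables."""
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in set(p) | set(q))


class HuffmanCache:
    def __init__(self, max_distance: float = 0.2):
        self.entries = []  # (normalized frequencies, code lengths) pairs
        self.max_distance = max_distance

    def lookup(self, data: bytes):
        """Return (code lengths, cache_hit) for the given literals."""
        if not data:
            raise ValueError("data must be non-empty")
        freqs = Counter(data)
        p = normalized(freqs)
        for q, lengths in self.entries:
            # A cached dictionary is usable only if it covers every symbol.
            if set(p) <= set(lengths) and distance(p, q) <= self.max_distance:
                return lengths, True
        lengths = huffman_code_lengths(freqs)
        self.entries.append((p, lengths))
        return lengths, False
```

Reusing a cached dictionary trades a slightly suboptimal code for skipping both tree construction and the cost of transmitting a new dictionary.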
Abstract:
B-Tree data is serialized to existing data for all types of workloads by converting a B-Tree data structure into a format capable of being stored and resurrected while containing all data stored in the B-Tree data structure and information relating to the B-Tree data structure. The serialized B-Tree data is divided into a plurality of sections. The serialized B-Tree data is stored into a plurality of buffers: the B-Tree information section in a first binary buffer, the B-Tree key section in a second binary buffer, and the B-Tree data section in a third binary buffer. The B-Tree data section holds the B-Tree data elements stored in the B-Tree data structure, and its size is equal to the total number of the B-Tree data elements in the B-Tree data structure multiplied by the size of each of the B-Tree data elements.
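The three-buffer layout can be sketched as follows. The node representation (`Node`), the 8-byte integer key format, the 16-byte fixed element size, and the pre-order encoding of the tree shape in the information section are all assumptions made for this example; the abstract only states that information, keys, and data go into separate binary buffers.

```python
import struct
from dataclasses import dataclass, field

KEY_FMT = "<q"                       # assumed: 8-byte signed integer keys
KEY_SIZE = struct.calcsize(KEY_FMT)
ELEM_SIZE = 16                       # assumed: fixed-size data elements


@dataclass
class Node:
    keys: list
    values: list                     # one ELEM_SIZE-byte value per key
    children: list = field(default_factory=list)


def serialize(root: Node):
    """Return (info, keys, data) buffers for the tree in pre-order."""
    info, keys, data = bytearray(), bytearray(), bytearray()

    def visit(node: Node) -> None:
        # Information section: per-node key count and child count.
        info.extend(struct.pack("<HH", len(node.keys), len(node.children)))
        for key, value in zip(node.keys, node.values):
            if len(value) != ELEM_SIZE:
                raise ValueError("data elements must be ELEM_SIZE bytes")
            keys.extend(struct.pack(KEY_FMT, key))
            data.extend(value)
        for child in node.children:
            visit(child)

    visit(root)
    return bytes(info), bytes(keys), bytes(data)


def deserialize(info: bytes, keys: bytes, data: bytes) -> Node:
    """Rebuild the tree from the three buffers."""
    pos = {"info": 0, "elem": 0}

    def build() -> Node:
        nkeys, nchildren = struct.unpack_from("<HH", info, pos["info"])
        pos["info"] += 4
        ks, vs = [], []
        for _ in range(nkeys):
            i = pos["elem"]
            ks.append(struct.unpack_from(KEY_FMT, keys, i * KEY_SIZE)[0])
            vs.append(data[i * ELEM_SIZE:(i + 1) * ELEM_SIZE])
            pos["elem"] += 1
        return Node(ks, vs, [build() for _ in range(nchildren)])

    return build()
```

Because every element has the same size, the data buffer's length is exactly the element count times `ELEM_SIZE`, matching the size relation stated in the abstract, and keeping keys and data in separate buffers groups similar bytes together.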