Abstract:
A method for operating a data storage system that is comprised of at least one disk. The method includes a step of partitioning a data compression unit into n basic compression units, where n is greater than one. Each data compression unit is comprised of a plurality of disk sectors, and each of the n basic compression units begins with a different initial sector. A next step partitions the data compression unit into X intervals, where X is less than n. Each of the X intervals begins with a different initial sector and within one of the n basic compression units. Further steps of the method include storing, external to the at least one disk, a plurality of first pointers each of which points to the initial sector of one of the n basic compression units wherein the 1/X, 2/X, . . . , (X-1)/X intervals begin; storing, within each of the initial sectors of the individual ones of the n basic compression units wherein the 1/X, 2/X, . . . , (X-1)/X intervals begin, a second pointer to the initial sector of the interval that begins within the basic compression unit; and, in response to a disk read operation that reads a compressed data unit that begins with one of the sectors that is located within one of the X intervals, accessing the beginning sector of the compressed data unit in accordance with one of the first pointers and one of the second pointers. The step of allocating includes a step of determining an actual compression ratio for the record, determining a value of a longest run of identical characters within the record, and adjusting the actual compression ratio based on the value of the longest run of identical characters. The step of allocating also includes a step of adding at least one additional sector to the estimated number to enable the compressed record to be subsequently updated in place.
Abstract:
Aspects for caching storage data include partitioning a storage cache to include a compressed data partition and an uncompressed data partition, and adjusting a size of the compressed data partition and the uncompressed data partition for chosen performance characteristics. A data caching system aspect in a data processing system having a host system in communication with a storage system includes at least one storage device and at least one partially compressed cache. The at least one partially compressed cache further includes an uncompressed partition and a compressed partition, where the compressed partition stores at least a victim data unit from the uncompressed partition.
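The partitioned cache described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name `PartiallyCompressedCache`, the slot-count sizing, the LRU victim policy, and the use of zlib are all assumptions made for the example.

```python
import zlib
from collections import OrderedDict


class PartiallyCompressedCache:
    """Hypothetical two-partition cache: victims evicted from the
    uncompressed partition are compressed into the compressed partition."""

    def __init__(self, uncompressed_slots, compressed_slots):
        self.uncompressed_slots = uncompressed_slots
        self.compressed_slots = compressed_slots
        self.uncompressed = OrderedDict()  # key -> raw bytes, oldest first
        self.compressed = OrderedDict()    # key -> zlib-compressed bytes

    def put(self, key, data):
        self.compressed.pop(key, None)     # drop any stale compressed copy
        self.uncompressed[key] = data
        self.uncompressed.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.uncompressed:
            self.uncompressed.move_to_end(key)
            return self.uncompressed[key]
        if key in self.compressed:
            data = zlib.decompress(self.compressed.pop(key))
            self.put(key, data)            # promote back to the uncompressed partition
            return data
        return None                        # miss: the caller would read from storage

    def resize(self, uncompressed_slots, compressed_slots):
        """Adjust both partition sizes (assumed here to be slot counts)."""
        self.uncompressed_slots = uncompressed_slots
        self.compressed_slots = compressed_slots
        self._evict()

    def _evict(self):
        # Assumed policy: the least recently used entry is the victim.
        while len(self.uncompressed) > self.uncompressed_slots:
            victim_key, victim = self.uncompressed.popitem(last=False)
            self.compressed[victim_key] = zlib.compress(victim)
        while len(self.compressed) > self.compressed_slots:
            self.compressed.popitem(last=False)
```

Calling `resize` with a larger compressed partition trades hit latency (decompression on access) for a higher effective capacity, which is one way to read "adjusting a size ... for chosen performance characteristics".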
Abstract:
A desired cache size in a disk drive is established, and no reordering algorithm is performed on commands in the cache until the desired size is reached. An optimal subset size is also established. Then, an optimization algorithm is performed on all commands in the cache, with only the commands in the optimal subset being output for execution. The cache is refilled to the desired size, and the process is repeated.
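The fill, optimize, release, and refill cycle can be sketched as below. The abstract does not say which optimization algorithm is used, so a greedy nearest-address ordering stands in for it; the function names, the use of block addresses as commands, and the way the tail of the input is drained are all assumptions.

```python
def nearest_first(commands, head):
    """Stand-in optimization: greedy ordering by distance from the head."""
    remaining = list(commands)
    ordered = []
    pos = head
    while remaining:
        nxt = min(remaining, key=lambda lba: abs(lba - pos))
        remaining.remove(nxt)
        ordered.append(nxt)
        pos = nxt
    return ordered


def schedule(incoming, desired_size, subset_size, head=0):
    """Yield commands (hypothetically, target block addresses) in execution order."""
    incoming = iter(incoming)
    cache = []
    exhausted = False
    while True:
        # Fill or refill the cache; nothing is reordered while filling.
        while not exhausted and len(cache) < desired_size:
            try:
                cache.append(next(incoming))
            except StopIteration:
                exhausted = True
        if not cache:
            return
        # Optimize over ALL cached commands ...
        ordered = nearest_first(cache, head)
        # ... but release only the first subset_size of them for execution.
        for cmd in ordered[:subset_size]:
            cache.remove(cmd)
            head = cmd
            yield cmd
        # Assumption: once the input ends, leftovers are drained the same way.
```

For example, `list(schedule([90, 10, 50, 20, 80, 30], desired_size=4, subset_size=2))` releases two commands per round and refills the cache to four before optimizing again.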
Abstract:
A method and means for detecting and correcting anomalies in a RAM-based FPGA by comparing CRC residues over portions of the RAM-stored connection bitmap with prestored residues derived from uncorrupted copies of the same bitmap portions. A mismatch selectively invokes either error reporting to the chip only, error reporting and immediate verification testing of counterpart FPGA chip functions, or error reporting, parity-based correction of the words in error, reprogramming of the chip functions with the corrected words, and verification testing.
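The residue comparison can be illustrated with a short sketch. It assumes the bitmap is available as a byte string, that portions are fixed-size slices, and that CRC-32 (via `zlib.crc32`) stands in for whatever CRC the method actually uses; the function names are hypothetical. The three response modes (reporting, verification testing, parity-based correction) are not modeled.

```python
import zlib


def residues(bitmap, portion_size):
    """CRC residue of each fixed-size portion of the connection bitmap."""
    return [zlib.crc32(bitmap[i:i + portion_size])
            for i in range(0, len(bitmap), portion_size)]


def find_corrupted_portions(bitmap, portion_size, prestored):
    """Indices of portions whose residue differs from the prestored one."""
    current = residues(bitmap, portion_size)
    return [idx for idx, (now, ref) in enumerate(zip(current, prestored))
            if now != ref]


# Residues are derived once from an uncorrupted copy of the bitmap.
golden = bytes(range(256)) * 4
prestored = residues(golden, 64)

# A single flipped bit in the RAM-stored copy is confined to one portion.
in_ram = bytearray(golden)
in_ram[130] ^= 0x04
suspect = find_corrupted_portions(bytes(in_ram), 64, prestored)
```

Only the portions listed in `suspect` would then need correction and reprogramming, rather than the whole bitmap.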
Abstract:
A method and apparatus in which on-chip functions are checked and any detected anomalies are located within a nested time interval. An on-chip function is tested by (1) applying a predetermined data pattern to the function, (2) computing a linear block error detection code residue from any output from the function being tested, and (3) comparing the residue to an error code residue (signature) derived from the output of a copy of the same function with the same data pattern. In one embodiment, the code signature has been previously derived from an error-free copy of the function. Where the signature is supplied contemporaneously by another copy of the same function also being tested, the function copy is not presumed error free. In both cases, any mismatch between the on-chip code residue and the signature indicates error, erasure, or fault. By either recursive reprocessing or shortening the intervals between comparisons, the mismatch can be located within a nested time or sequence interval.
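Locating a mismatch by shortening the comparison interval can be sketched as a bisection over the output sequence. The sketch assumes each output is one byte value (0 to 255), uses CRC-32 as a stand-in for the linear block code, and recomputes the reference residue on demand where a real system would compare against stored or contemporaneously supplied signatures; both function names are hypothetical.

```python
import zlib


def residue(outputs, lo, hi):
    """Residue over the outputs produced in the interval [lo, hi)."""
    return zlib.crc32(bytes(outputs[lo:hi]))


def locate_first_mismatch(observed, reference):
    """Return the index of a mismatching output, or None if the residues agree."""
    lo, hi = 0, len(reference)
    if residue(observed, lo, hi) == residue(reference, lo, hi):
        return None
    # Invariant: the residues over [lo, hi) differ. Halve the interval each pass.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if residue(observed, lo, mid) != residue(reference, lo, mid):
            hi = mid
        else:
            lo = mid
    return lo
```

Each pass reprocesses a nested sub-interval, so a fault in a sequence of N outputs is pinned down after about log2(N) residue comparisons.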
Abstract:
A method and apparatus for compressing and decompressing data is described. The most frequent symbols (A-Group) are encoded using an Arithmetic Code, then the remainder of the symbols (H-Group) are first encoded using Huffman's algorithm (or any Prefix code) and then combined with the Arithmetic code resulting in a hybrid Arithmetic/Huffman code. After being encoded into a Huffman code, the H-Group symbols are made into a "super-symbol" which fits into an Arithmetic subinterval allocated to the symbols in the H-Group. The Arithmetic subintervals for the symbols in the H-Group preferably are a negative power of 2 (e.g., 1/2, 1/4, 1/16, 1/32, etc.) of the code space. Each such H-Group subinterval has its own associated subset of H-Group symbols comprising one Huffman code table that fits into that respective interval. Decoding in an AMSAC system first treats the code stream as Arithmetically encoded. Standard prior art Arithmetic decoding is performed until an interval assigned to the super-symbol(s) is encountered. The Arithmetic super-symbol for this interval is then processed to obtain the Prefix code by reversing the scaling and offsetting, if any, that was needed to fit the super-symbol into the assigned Arithmetic subinterval. The Prefix code is then decoded into the original symbol using standard prior art Prefix techniques.
Abstract:
A system for compressing digital data at one byte-per-cycle throughput by removing redundancy before storage or transmission. The system includes an improved Ziv-Lempel LZ1 process that uses a history buffer to save the most recent source string symbols for use in encoding the source symbols as "match-length" and "match-offset" tokens. The match-length code symbols are selected from two groups of buckets that are assigned variable-length prefixes for the shorter, more probable match-lengths and a fixed-length prefix code for the longer, less probable match-lengths. This exploits a transition from Laplacian match-length probability distribution to Uniform match-length probability distribution for longer match-lengths. The offset code field length is reduced during start-up to improve start-up compression efficiency during filling of the history buffer. The match-length code book is limited to a maximum value T
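The history-buffer tokenization underlying an LZ1-style coder can be sketched as follows. This is a plain greedy sliding-window search, not the improved process described: the bucket-based match-length prefixes and the start-up reduction of the offset field are not modeled, and the function names, token tuples, and parameter defaults are assumptions.

```python
def lz1_encode(data, history_size=4096, min_match=3, max_match=255):
    """Emit ("literal", byte) and ("match", offset, length) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - history_size)   # oldest byte still in the history buffer
        best_len, best_off = 0, 0
        for j in range(start, i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append(("match", best_off, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens


def lz1_decode(tokens):
    out = bytearray()
    for tok in tokens:
        if tok[0] == "literal":
            out.append(tok[1])
        else:
            _, offset, length = tok
            for _ in range(length):
                out.append(out[-offset])   # byte-wise copy allows overlapping matches
    return bytes(out)
```

A real encoder would then map each match length and offset to a variable-length or fixed-length code word; this sketch stops at the token level.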
Abstract:
A constant-size storage can be managed to preserve locality of reference when it is partitioned into a linearly addressable storage space for compressed symbol strings and a linked-list addressable space for the overflowing portion of each compressed string, with a token to the overflow embedded in the linear address. The linear space is readjusted periodically in a direction that keeps the amount of available overflow within a certain range of current usage. Changes in compression statistics change overflow usage, requiring readjustment to minimize internal fragmentation, etc.
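The split between a linear space and a linked-list overflow space can be sketched as below. The class name, the fixed slot and block sizes, and the dictionary-backed storage are assumptions; the overflow token is simply kept next to the slot contents as a stand-in for embedding it in the linear address, and the periodic readjustment of the linear space is not modeled.

```python
class CompressedStore:
    """Hypothetical store: fixed-size linear slots plus chained overflow blocks."""

    def __init__(self, slot_size, overflow_block_size):
        self.slot_size = slot_size
        self.block_size = overflow_block_size
        self.slots = {}        # linear address -> (head bytes, overflow token or None)
        self.overflow = {}     # token -> (chunk bytes, next token or None)
        self._next_token = 0

    def _free_chain(self, token):
        while token is not None:
            _, token = self.overflow.pop(token)

    def write(self, address, compressed):
        if address in self.slots:
            self._free_chain(self.slots[address][1])   # release the old overflow chain
        head = compressed[:self.slot_size]
        rest = compressed[self.slot_size:]
        chunks = [rest[i:i + self.block_size]
                  for i in range(0, len(rest), self.block_size)]
        token = None
        for chunk in reversed(chunks):                 # build the chain back to front
            self.overflow[self._next_token] = (chunk, token)
            token = self._next_token
            self._next_token += 1
        self.slots[address] = (head, token)

    def read(self, address):
        head, token = self.slots[address]
        parts = [head]
        while token is not None:                       # follow the overflow chain
            chunk, token = self.overflow[token]
            parts.append(chunk)
        return b"".join(parts)
```

Strings that compress well fit entirely in their slot and are read with a single linear access; only poorly compressing strings pay for following the overflow chain.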
Abstract:
A method and means for ascertaining maximal-length pattern matches of K characters per cycle between character strings in a reduced amount of time using a pipeline-like concurrent dataflow model in which a recursive, exhaustive, greedy comparison matching between the strings in a consistent direction yields a parsing of the longest matches, the recursion being constrained by relations among K, the match length L, and a tracking variable J, said constraints governing further recursions ascertaining prefix extensions from one string to another and any intra-string pattern matches. Embodiments processing K equal to one, two, or three characters at a time are disclosed.