Abstract:
A system and method for maximizing data compression by optimizing model selection during coding of an input stream of data symbols, wherein at least two models are run and compared, and the model with the best coding performance for a given-size segment or block of compressed data is selected such that only its block is used in an output data stream. The best performance is determined by 1) respectively producing comparable-size blocks of compressed data from the input stream with the use of the two, or more, models and 2) selecting the model which compresses the most input data. In the preferred embodiment, respective strings of data are produced with each model from the symbol data and are coded with an adaptive arithmetic coder into the compressed data. Each block of compressed data is started by coding the decision to use the model currently being run and all models start with the arithmetic coder parameters established at the end of the preceding block. Only the compressed code stream of the best model is used in the output and that code stream has in it the overhead for selection of that model. Since the decision as to which model to run is made in the compressed data domain, i.e., the best model is chosen on the basis of which model coded the most input symbols for a given-size compressed block, rather than after coding a given number of input symbols, the model selection decision overhead scales with the compressed data. Successively selected compressed blocks are combined as an output code stream to produce an optimum output of compressed data, from input symbols, for storage or transmission.
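The block-by-block selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each model is represented by a hypothetical stateless cost function returning an idealized code length in bits, where the abstract uses an adaptive arithmetic coder whose parameters carry over from the preceding block. The names `symbols_coded`, `select_models`, `static_model`, and `repeat_model` are invented for this example.

```python
import math

def symbols_coded(symbols, start, cost_fn, budget_bits):
    """Count how many symbols from `start` fit into `budget_bits` under one model."""
    used = 0.0
    i = start
    while i < len(symbols):
        cost = cost_fn(symbols, i)
        if used + cost > budget_bits:
            break
        used += cost
        i += 1
    return i - start

def select_models(symbols, models, block_bits):
    """For each fixed-size compressed block, keep the model that codes the most input.

    Returns a list of (model_index, symbols_consumed) pairs, one per block.
    """
    # Assumption: signalling the chosen model costs log2(#models) bits per block.
    overhead = math.log2(len(models))
    choices = []
    pos = 0
    while pos < len(symbols):
        counts = [symbols_coded(symbols, pos, m, block_bits - overhead)
                  for m in models]
        best = max(range(len(models)), key=lambda k: counts[k])
        if counts[best] == 0:
            raise ValueError("block size too small to code a single symbol")
        choices.append((best, counts[best]))
        pos += counts[best]
    return choices

# Two hypothetical binary models, expressed as ideal code lengths (-log2 p).
def static_model(symbols, i):
    """Assumes a 0 occurs with probability 0.9 regardless of context."""
    return -math.log2(0.9 if symbols[i] == 0 else 0.1)

def repeat_model(symbols, i):
    """Assumes each symbol repeats its predecessor with probability 0.9."""
    prev = symbols[i - 1] if i > 0 else 0
    return -math.log2(0.9 if symbols[i] == prev else 0.1)

if __name__ == "__main__":
    data = [0] * 20 + [1] * 20 + [0, 1] * 5
    for model_index, count in select_models(data, [static_model, repeat_model], 16):
        print(f"model {model_index} coded {count} symbols in this block")
```

Because every block has the same compressed size, the per-block selection overhead is a fixed fraction of the output, which is the scaling property the abstract points out.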
Abstract:
A method and means of arithmetic coding of conditional binary sources that permit instantaneous decoding and minimize the number of encoding operations per iteration. A single shift-and-subtract operation per encoding cycle can be achieved if an integer-valued parameter, representative of a probability interval embracing each source symbol's relative frequency, is used for string encoding and control. If the symbol being encoded is the most probable, nothing is added to the arithmetic code string; however, an internal variable is updated by replacing it with an augend amount. If the updated internal variable has a leading zero, both it and the code string are shifted left by one position. If the symbol being encoded is the least probable, a computed augend is added to the code string and the code string is shifted by an amount equal to the integer-valued parameter.
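The encoding rules above can be illustrated with a short sketch. It makes several assumptions that are not in the abstract: the less probable symbol has probability 2^-k for a fixed integer `k` with 2 <= k <= W, the internal variable is a fixed-point integer with `W` fractional bits, the augend is computed as that variable minus 2^-k, and the code string is held in an unbounded Python integer rather than in a finite register. The decoder here also needs the symbol count and the total shift count, so it does not demonstrate instantaneous decoding. All names are hypothetical.

```python
W = 16            # fractional bits of the internal variable (assumption)
ONE = 1 << W      # fixed-point 1.0
HALF = ONE >> 1   # fixed-point 0.5

def encode(bits, k, mps=0):
    """Encode `bits`; return (code_string, total_shift_count)."""
    if not 2 <= k <= W:
        raise ValueError("this sketch needs 2 <= k <= W")
    code, a, shifts = 0, ONE, 0
    for b in bits:
        augend = a - (ONE >> k)          # one shift and one subtract
        if b == mps:
            a = augend                   # nothing is added to the code string
            if a < HALF:                 # leading zero: shift both left by one
                a <<= 1
                code <<= 1
                shifts += 1
        else:
            code = (code + augend) << k  # add the augend, then shift by k
            a = ONE                      # 2**-k shifted left by k is 1.0
            shifts += k
    return code, shifts

def decode(code, shifts, n, k, mps=0):
    """Recover `n` bits by mirroring the encoder's shifts and subtractions."""
    a, done, rest = ONE, 0, code
    out = []
    for _ in range(n):
        augend = a - (ONE >> k)
        align = shifts - done            # shifts the encoder applied after this step
        if (rest >> align) >= augend:    # code lies in the less probable subinterval
            rest -= augend << align
            a = ONE
            done += k
            out.append(1 - mps)
        else:
            a = augend
            if a < HALF:
                a <<= 1
                done += 1
            out.append(mps)
    return out

if __name__ == "__main__":
    message = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    code, shifts = encode(message, k=3)
    assert decode(code, shifts, len(message), k=3) == message
    print(f"{len(message)} symbols coded with {shifts} shifts")
```

Restricting `k` to at least 2 keeps the updated internal variable at or above one quarter, so a single left shift always renormalizes it, matching the one-position shift in the description.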
Abstract:
A transmitter (11) is coupled via a transmission or storage medium (5) to a receiver (19). The transmitter encodes a sequence of source symbols k in accordance with an arithmetic coding technique, whereby roughly, digital numbers representing the successive source symbols are successively added at the low order end of the previously developed sum to develop a data string. The possibility of rippling of carries to the higher orders of the data string, which would prevent any part of the data string from being transferred until all the source symbols had been received, is prevented by the insertion of a control character in the data string, where n consecutive characters of the same kind occur. The data string can thus be transferred in high to low order sequence, in sections before the data string has been completely formed. The receiver decodes the data string by correcting appropriately for the control characters.
Abstract:
Method and apparatus which cyclically generate a compressed, arithmetically-coded binary stream in response to binary occurrence counts of symbols in an uncoded string. The symbols in the uncoded string are drawn from a multi-character alphabet which is not necessarily a binary one. Coding operations and hardware are simplified by deriving from the binary occurrence counts an estimate of the probability of each unencoded symbol at its precise lexical location. The probability estimation eliminates any requirement for division or multiplication by employing magnitude-shifting of the binary occurrence counts. The encoded stream is augmented by the estimated symbol probability at the same time that an internal variable is updated with an estimate of the portion of a probability interval remaining after coding the current symbol, the interval estimate being obtained from the left-shifted occurrence counts. Decoding is the dual of encoding. The unencoded symbol stream is extracted, symbol-by-symbol, by subtracting the estimated symbol probability that comes closest to, but does not exceed, the magnitude of the compressed stream, re-estimating the symbol probabilities based upon the decoding, and testing the difference of the subtraction against the re-estimated probability.
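One way to estimate a probability from occurrence counts without dividing or multiplying is to compare the magnitudes (bit lengths) of the counts. The sketch below is an assumption about how such magnitude-shifting could work, not the estimator claimed in the patent; `shift_estimate` and `scale_interval` are hypothetical names.

```python
def shift_estimate(count, total):
    """Return k such that 2**-k approximates count / total.

    Only the bit lengths of the two counts are compared, so no division or
    multiplication is needed. The estimate is within a factor of two of the
    true ratio.
    """
    if not 0 < count <= total:
        raise ValueError("need 0 < count <= total")
    return total.bit_length() - count.bit_length()

def scale_interval(interval, count, total):
    """Approximate interval * count / total with a single right shift."""
    return interval >> shift_estimate(count, total)

if __name__ == "__main__":
    for count, total in [(1, 8), (3, 8), (5, 8), (8, 8)]:
        k = shift_estimate(count, total)
        print(f"{count}/{total} = {count / total:.3f}, estimated as 2**-{k}")
```

The trade is precision for hardware simplicity: the power-of-two estimate wastes some code space when the true ratio is not near a power of two, but it replaces a divider with a shifter.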
Abstract:
Disclosed is a method for assigning features to nodes of a tree structured classifier and for determining terminal nodes in response to a training set of objects, each of such objects being determined by a plurality of features. The method comprises the steps at each node of the tree of: (1) determining a selected characteristic, such as a cost function based on the minimum description length, of the plurality of features unused at prior nodes along the path from the root to the present node; (2) assigning a feature to the node having a preferred value for the selected characteristic relative to the other features; (3) creating child nodes in response to the assigned feature; (4) for each child node, determining the selected characteristic for the plurality of features unused at prior nodes and assigning a feature to the child node having a preferred value for the selected characteristic relative to the other features; (5) generating a combination of the values for the selected characteristics of the assigned features for the child nodes of the node; and (6) classifying the node as a terminal node in response to a comparison of the combination of values for the features assigned to the child nodes and the value for the feature assigned to the node.
Abstract:
Data translation apparatus incorporates a two-stage adaptive single modeling approach using a state-generating model structure unit (1 or 83) feeding a parameter-generating unit which is associated with an encoder or decoder. Both units are adaptive, the model structure unit developing a reference context from a base state by up to a predetermined number of additional directed context states from inputs presented for translation. A count/state table (35) is accessed for each symbol presented, the count being incremented and the first so many counts to reach a threshold value having their associated states incremented. Runs of more than a preset number of identical symbols are separately detected and so signalled, the raw state superseding the table-generated state.