Abstract:
The present invention comprises a distributed data processing system including a plurality of data processing elements for expeditiously performing an encoding or prediction function pursuant to a context-based model in an adaptive, optimal and time-progressive manner. The distributed data processing system, having access to each symbol of an input data string at each clock cycle, adaptively generates context-relevant data sets which provide the best model for coding or prediction based on the input symbols. Each symbol and its best model for encoding or prediction emerge concurrently from the system, resulting in a favorable time complexity of O(n) for an n-symbol input data string.
Abstract:
A method, system, and manufacture are provided, for use in connection with data processing and compression, for quantizing a string of data values, such as image data pixel values. The quantization is achieved by grouping the data values, based on their values, into a predetermined number of categories, each category containing the same total number of values. For each category, a value, preferably the mean of the values in the category, is selected as a quantization value. All of the data values in the category are then represented by the selected quantization value. For data strings having a dependency (that is, the values of one or more of the data values provide information about the values of others), the dependency is modeled by a method in which a modeling algorithm defines contexts in terms of a tree structure, and the basic method of grouping into categories and selecting a quantization value for each category is performed on a per-node (i.e., per-context) basis.
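The basic grouping step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `equal_frequency_quantize`, the requirement that the number of values divide evenly by the number of categories, and the choice of the mean as the representative are assumptions made for illustration, and the per-context tree modeling is omitted.

```python
def equal_frequency_quantize(values, num_categories):
    """Replace each value by the mean of its equal-count category.

    Hypothetical helper for illustration only. Assumes len(values) is an
    exact multiple of num_categories so every category holds the same count.
    """
    n = len(values)
    if num_categories <= 0 or n % num_categories != 0:
        raise ValueError("number of values must be a multiple of num_categories")

    # Rank the positions by value so that categories are formed "based on
    # their values" while remembering where each value came from.
    order = sorted(range(n), key=lambda i: values[i])
    size = n // num_categories

    quantized = [0.0] * n
    for c in range(num_categories):
        members = order[c * size:(c + 1) * size]
        # The category mean serves as the quantization value.
        q = sum(values[i] for i in members) / len(members)
        for i in members:
            quantized[i] = q
    return quantized


pixels = [10, 200, 12, 198, 90, 110]
print(equal_frequency_quantize(pixels, 3))
# [11.0, 199.0, 11.0, 199.0, 100.0, 100.0]
```

Sorting positions rather than values keeps the output aligned with the input string, so each original value can be replaced in place by its category's quantization value.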
Abstract:
A method and apparatus are disclosed for generating a decision tree classifier from a training set of records. The method comprises the steps of: pre-sorting the records based on each numeric record attribute, creating a decision tree breadth-first, and pruning the tree based on the MDL principle. Preferably, the pre-sorting includes generating a class list and attribute lists, and independently sorting the numeric attribute lists. The growing of the tree includes evaluating possible splitting criteria and selecting a splitting test for each leaf node, based on a splitting index, and updating the class list to reflect new leaf nodes. In a preferred embodiment, the splitting index is a Gini index. The pruning preferably includes encoding the decision tree and splitting tests in an MDL-based code, and determining whether to convert a node into a leaf node, prune its child nodes, or leave the node intact, based on the code length of the node.
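As a rough illustration of evaluating splitting tests on a sorted numeric attribute with a Gini index, the sketch below scans one attribute list once and keeps the split with the lowest weighted Gini value. It covers a single attribute at a single node only. The names `gini` and `best_numeric_split`, the use of the midpoint between adjacent values as the threshold, and the weighted-sum form of the index are assumptions for illustration, not details taken from the abstract.

```python
from collections import Counter


def gini(class_counts, total):
    """Gini index 1 - sum(p_j^2) for a set described by its class counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts.values())


def best_numeric_split(records):
    """Return (threshold, weighted_gini) for the best test 'value <= threshold'.

    records: list of (numeric_value, class_label) pairs (hypothetical layout).
    """
    ordered = sorted(records, key=lambda r: r[0])  # the pre-sorting step
    n = len(ordered)
    left = Counter()                               # classes at or below the split
    right = Counter(label for _, label in ordered) # classes above the split
    best_threshold, best_score = None, float("inf")

    for i in range(n - 1):
        value, label = ordered[i]
        left[label] += 1
        right[label] -= 1
        next_value = ordered[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate equal values
        n_left, n_right = i + 1, n - i - 1
        score = (n_left / n) * gini(left, n_left) + (n_right / n) * gini(right, n_right)
        if score < best_score:
            # Assumption: place the threshold midway between adjacent values.
            best_threshold, best_score = (value + next_value) / 2, score
    return best_threshold, best_score


data = [(23, "high"), (17, "high"), (43, "high"), (68, "low"), (32, "low"), (20, "high")]
print(best_numeric_split(data))
```

Because the list is sorted once, the class counts on each side of a candidate split are updated incrementally, so all candidate thresholds for the attribute are scored in a single pass.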