Data mining method and system for generating a decision tree classifier
for data records based on a minimum description length (MDL) and
presorting of records
    1.
    Granted invention patent (status: expired)

    Publication number: US5787274A

    Publication date: 1998-07-28

    Application number: US564694

    Filing date: 1995-11-29

    IPC classification: G06F17/30

    Abstract: A method and apparatus are disclosed for generating a decision tree classifier from a training set of records. The method comprises the steps of: pre-sorting the records based on each numeric record attribute, creating a decision tree breadth-first, and pruning the tree based on the MDL principle. Preferably, the pre-sorting includes generating a class list and attribute lists, and independently sorting the numeric attribute lists. The growing of the tree includes evaluating possible splitting criteria and selecting a splitting test for each leaf node, based on a splitting index, and updating the class list to reflect new leaf nodes. In a preferred embodiment, the splitting index is a gini index. The pruning preferably includes encoding the decision tree and splitting tests in an MDL-based code, and determining whether to convert a node into a leaf node, prune its child nodes, or leave the node intact, based on the code length of the node.
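The split-evaluation step named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard gini index, 1 minus the sum of squared class proportions, and scores a candidate test "value <= threshold" by the size-weighted gini of the two partitions. The function names, the midpoint threshold rule, and the sample records are illustrative assumptions, not taken from the patent text.

```python
from collections import Counter


def gini(counts, total):
    """Gini index of a class histogram: 1 - sum(p_j ** 2)."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(attribute_list):
    """Scan one presorted attribute list of (value, class_label) pairs.

    Returns (threshold, weighted_gini) for the best test 'value <= threshold'.
    Using the midpoint between neighbours as the threshold is an assumption.
    """
    n = len(attribute_list)
    right = Counter(label for _, label in attribute_list)  # records not yet scanned
    left = Counter()                                       # records already scanned
    best_threshold, best_score = None, float("inf")
    for i in range(n - 1):
        value, label = attribute_list[i]
        left[label] += 1
        right[label] -= 1
        next_value = attribute_list[i + 1][0]
        if value == next_value:
            continue  # no valid threshold between equal values
        n_left, n_right = i + 1, n - (i + 1)
        score = (n_left / n) * gini(left, n_left) + (n_right / n) * gini(right, n_right)
        if score < best_score:
            best_threshold, best_score = (value + next_value) / 2, score
    return best_threshold, best_score


# Hypothetical training records: (numeric attribute value, class label).
records = [(23, "high"), (17, "high"), (43, "high"), (68, "low"), (32, "low"), (20, "high")]
presorted = sorted(records)  # sorted once, before tree growth
threshold, score = best_numeric_split(presorted)
print(threshold, round(score, 4))
```

Because the list is sorted once up front, each candidate threshold is scored by updating two class histograms incrementally rather than re-sorting the records at every node.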

    Distributed coding and prediction by use of contexts
    2.
    Granted invention patent (status: expired)

    Publication number: US5652581A

    Publication date: 1997-07-29

    Application number: US691903

    Filing date: 1996-07-31

    CPC classification: H03M7/30

    Abstract: The present invention comprises a distributed data processing system including a plurality of data processing elements for expeditiously performing an encoding or prediction function pursuant to a context-based model in an adaptive, optimal and time-progressive manner. The distributed data processing system, having access to each symbol of an input data string at each clock cycle, adaptively generates context-relevant data sets which provide the best model for coding or prediction based on the input symbols. Each symbol and its best model for encoding or prediction emerge concurrently from the system, resulting in a favorable time complexity of O(n) for an n-symbol input data string.

    Quantization method for image data compression employing context
modeling algorithm
    3.
    Granted invention patent (status: expired)

    Publication number: US5640159A

    Publication date: 1997-06-17

    Application number: US643201

    Filing date: 1996-05-06

    Abstract: A method, system, and manufacture are provided, for use in connection with data processing and compression, for quantizing a string of data values, such as image data pixel values. The quantization is achieved by grouping the data values, based on their values, into a predetermined number of categories, each category containing the same total number of values. For each category, a value, preferably a mean value of those in the category, is selected as a quantization value. All of the data values in the category are then represented by the selected quantization value. For data strings having a dependency (that is, the values of one or more of the data values provide information about the values of others), the dependency is modeled by a method in which a modeling algorithm defines contexts in terms of a tree structure, and the basic method of grouping into categories and selecting a quantization value for each category is performed on a per node (i.e., per context) basis.
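The basic grouping step described in the abstract can be sketched in Python as follows. The sketch sorts the values, cuts the sorted order into a predetermined number of equal-count categories, and replaces every value with the mean of its category. The function name, the handling of counts that do not divide evenly, and the sample pixel values are assumptions made for illustration. The per-context (tree node) variant is not shown.

```python
def equal_count_quantize(values, num_categories):
    """Quantize values into equal-population categories, each represented by its mean.

    Returns a list the same length as `values`, where each entry is the
    quantization value of the category the original value fell into.
    """
    n = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    quantized = [0.0] * n
    for k in range(num_categories):
        # Integer arithmetic spreads any remainder across categories (an assumption).
        start = k * n // num_categories
        end = (k + 1) * n // num_categories
        group = order[start:end]
        if not group:
            continue  # more categories than values
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            quantized[i] = mean
    return quantized


# Hypothetical 8-pixel string quantized to 4 levels (2 pixels per category).
pixels = [12, 200, 15, 190, 90, 100, 14, 210]
result = equal_count_quantize(pixels, 4)
print(result)
```

One consequence of equal-count grouping is that category boundaries follow the data distribution rather than fixed value intervals, so densely populated value ranges receive more quantization levels. In this simple sketch, identical values that straddle a boundary may land in different categories.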
