Granted invention patent
- Patent title: Method to reduce I/O for hierarchical data partitioning methods
- Application number: US884080
- Filing date: 1997-06-27
- Publication number: US6055539A
- Publication date: 2000-04-25
- Inventors: Vineet Singh, Anurag Srivastava
- Applicants: Vineet Singh, Anurag Srivastava
- Applicant address: Armonk, NY
- Assignee: International Business Machines Corporation
- Current assignee: International Business Machines Corporation
- Current assignee address: Armonk, NY
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A method and system for generating a decision-tree classifier from a training set of records, independent of the system memory size. The method includes the steps of: generating an attribute list for each attribute of the records, sorting the attribute lists for numeric attributes, and generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, split points are evaluated to determine the best split test for partitioning the records at the node. Preferably, a gini index and class histograms are used in determining the best splits. The gini index indicates how well a split point separates the records while the class histograms reflect the class distribution of the records at the node. Also, a hash table is built as the attribute list of the split attribute is divided among the child nodes, which is then used for splitting the remaining attribute lists of the node. The method reduces I/O read time by combining the read for partitioning the records at a node with the read required for determining the best split test for the child nodes. Further, it requires writes of the records only at one out of n levels of the decision tree, where n ≥ 2. Finally, a novel data layout on disk minimizes disk seek time. The I/O optimizations work in a general environment for hierarchical data partitioning. They also work in a multi-processor environment. After the generation of the decision tree, any prior-art pruning method may be used for pruning the tree.
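The split evaluation and hash-table partitioning described in the abstract can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the function names (`gini`, `best_numeric_split`, `split_lists`), the `(value, class_label, record_id)` entry layout, the midpoint threshold, and the sample data are all assumptions made for illustration, and the disk layout and I/O optimizations are not modeled.

```python
from collections import Counter

def gini(hist, total):
    """Gini index of a class histogram: 1 minus the sum of squared class frequencies."""
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in hist.values())

def best_numeric_split(attr_list):
    """Scan a value-sorted attribute list of (value, class_label, record_id).

    Returns (weighted_gini, threshold) for the best test `value <= threshold`,
    maintaining two class histograms: records below and above the candidate.
    """
    n = len(attr_list)
    below = Counter()
    above = Counter(label for _, label, _ in attr_list)
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label, _ = attr_list[i]
        below[label] += 1
        above[label] -= 1
        next_value = attr_list[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        n_above = n - n_below
        score = (n_below / n) * gini(below, n_below) + (n_above / n) * gini(above, n_above)
        if score < best[0]:
            best = (score, (value + next_value) / 2)  # midpoint is an assumption
    return best

def split_lists(split_list, threshold, other_list):
    """Divide the split attribute's list between two children and record each
    record id's child in a hash table, then use that table to divide another
    attribute list of the same node."""
    side = {}
    left, right = [], []
    for entry in split_list:
        child = "L" if entry[0] <= threshold else "R"
        side[entry[2]] = child
        (left if child == "L" else right).append(entry)
    other_left = [e for e in other_list if side[e[2]] == "L"]
    other_right = [e for e in other_list if side[e[2]] == "R"]
    return (left, right), (other_left, other_right), side

# Hypothetical training data: a sorted numeric list and a categorical list.
ages = [(20, "high", 0), (25, "high", 1), (30, "high", 2), (40, "low", 3), (50, "low", 4)]
cars = [("sports", "high", 0), ("sports", "high", 1), ("family", "high", 2),
        ("truck", "low", 3), ("family", "low", 4)]

score, threshold = best_numeric_split(ages)
(age_left, age_right), (car_left, car_right), side = split_lists(ages, threshold, cars)
```

On this sample data the scan selects the threshold between 30 and 40, which separates the two classes completely, and the hash table then routes the categorical entries to the same children without re-evaluating the split test.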