-
Publication No.: US06563952B1
Publication date: 2003-05-13
Application No.: US09420252
Filing date: 1999-10-18
Applicant: Anurag Srivastava, G. D. Ramkumar, Vineet Singh, Sanjay Ranka
Inventor: Anurag Srivastava, G. D. Ramkumar, Vineet Singh, Sanjay Ranka
IPC classification: G06K9/62
CPC classification: G06K9/6276
Abstract: The present invention is an apparatus and method for classifying high-dimensional sparse datasets. A raw training set is flattened by converting it from a categorical representation to a boolean representation. The flattened data is then used to build a class model with which new data not in the training set can be classified. In one embodiment, the class model takes the form of a decision tree, and large itemsets and cluster information are used as attributes for classification. In another embodiment, the class model is based on the nearest neighbors of the data to be classified. One advantage of the invention is that flattening the data eliminates the artificial ordering induced on the attributes, which increases classification accuracy. Another advantage is that the use of large itemsets and clustering increases classification accuracy.
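The flattening step and the nearest-neighbor embodiment described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `flatten`, `jaccard`, and `classify_knn`, the use of Jaccard similarity, and the majority vote over `k` neighbors are all assumptions made for illustration, since the abstract does not specify a similarity measure or voting rule.

```python
from collections import Counter


def flatten(record):
    """Convert a categorical record {attribute: value} into a set of
    boolean items of the form 'attribute=value'. Attributes missing from
    a sparse record simply contribute no item (hypothetical helper)."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())


def jaccard(a, b):
    """Jaccard similarity between two item sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_knn(training, record, k=3):
    """Label `record` by majority vote among its k most similar
    flattened training records. `training` is a list of (record, label)."""
    query = flatten(record)
    scored = sorted(
        ((jaccard(flatten(r), query), label) for r, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


training = [
    ({"color": "red", "shape": "round"}, "apple"),
    ({"color": "red", "shape": "round", "size": "small"}, "apple"),
    ({"color": "yellow", "shape": "long"}, "banana"),
    ({"color": "yellow", "shape": "long", "size": "large"}, "banana"),
]
label = classify_knn(training, {"color": "red", "shape": "round"}, k=3)
```

Because each category becomes its own boolean item, no numeric order is imposed on values such as "red" and "yellow", which is the effect the abstract attributes to flattening.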
-
Publication No.: US06185559B2
Publication date: 2001-02-06
Application No.: US08853757
Filing date: 1997-05-09
Applicant: Sergey Brin, G D Ramkumar, Shalom Tsur
Inventor: Sergey Brin, G D Ramkumar, Shalom Tsur
IPC classification: G06F17/30
CPC classification: G06F17/30539, G06F2216/03, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99943, Y10S707/99945
Abstract: The present invention is directed to a data mining method and apparatus that dynamically initiates the counting of sets of items (itemsets) at any time during a pass over the records of a database and terminates the counting at the same location in the next pass. In this manner, the invention begins counting itemsets early and finishes counting early, while keeping the number of different itemsets being counted in any pass relatively low.
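The counting scheme in the abstract can be sketched as follows. This is a simplified illustration, not the patented method: the function name `dynamic_itemset_counts`, the `interval` parameter (how many records are read between points where new itemsets may start), the absolute `min_count` threshold, and the candidate rule (start a superset once all of its immediate subsets have reached the threshold) are assumptions. Each itemset is counted over exactly one full circular pass, so it stops at the record position where it started.

```python
def dynamic_itemset_counts(transactions, min_count, interval=2):
    """Return {itemset: count} for itemsets found in >= min_count records.
    `interval` must be at least 1."""
    data = [frozenset(t) for t in transactions]
    n = len(data)
    if n == 0:
        return {}
    items = sorted({item for t in data for item in t})

    counts = {frozenset([item]): 0 for item in items}  # occurrences so far
    seen = {s: 0 for s in counts}                      # records examined so far
    active = set(counts)                               # itemsets still counting

    def start_new_candidates():
        # Itemsets whose running count has already reached the threshold.
        large = {s for s, c in counts.items() if c >= min_count}
        for s in large:
            for item in items:
                cand = s | {item}
                if item in s or cand in counts:
                    continue
                # Start counting only if every immediate subset is large.
                if all(cand - {j} in large for j in cand):
                    counts[cand] = 0
                    seen[cand] = 0
                    active.add(cand)

    pos = 0
    while active:
        for _ in range(interval):
            record = data[pos]
            for s in list(active):
                if s <= record:
                    counts[s] += 1
                seen[s] += 1
                if seen[s] == n:       # back at its starting position: done
                    active.discard(s)
            pos = (pos + 1) % n        # wrap around into the next pass
        start_new_candidates()         # itemsets may begin mid-pass

    return {s: c for s, c in counts.items() if c >= min_count}


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = dynamic_itemset_counts(baskets, min_count=3)
```

In this run the pairs start being counted partway through the first pass rather than waiting for it to end, which is the early start the abstract describes.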
-
Publication No.: US06173280B2
Publication date: 2001-01-09
Application No.: US09065837
Filing date: 1998-04-24
Applicant: G D Ramkumar, Sanjay Ranka, Shalom Tsur
Inventor: G D Ramkumar, Sanjay Ranka, Shalom Tsur
IPC classification: G06F17/30
CPC classification: G06F17/30539, G06F2216/03, Y10S707/99936, Y10S707/99943
Abstract: The present invention discloses a data mining method and apparatus that assigns weight values to items and/or transactions based on their value to the user, thereby producing association rules of greater importance. A conservative method, an aggressive method, or a combination of the two can be used when generating supersets.
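One way to picture weighting is a weighted support measure, sketched below. The abstract does not give the formulas or define the conservative and aggressive superset methods, so everything here is an assumption for illustration: the function name `weighted_support`, the definition of support as the weight fraction of transactions containing the itemset, and the scaling by the mean item weight.

```python
def weighted_support(itemset, transactions, item_weight=None):
    """Weighted support of `itemset` (an assumed definition).

    transactions: list of (items, weight) pairs, where weight reflects
                  the transaction's value to the user.
    item_weight:  optional {item: weight}; items not listed default to 1.0.
    """
    itemset = frozenset(itemset)
    total = sum(weight for _, weight in transactions)
    if total == 0:
        return 0.0
    # Weight of the transactions that contain every item of the itemset.
    covered = sum(w for items, w in transactions if itemset <= set(items))
    scale = 1.0
    if item_weight and itemset:
        # Assumed rule: scale by the mean weight of the itemset's items.
        scale = sum(item_weight.get(i, 1.0) for i in itemset) / len(itemset)
    return scale * covered / total


sales = [
    ({"milk", "bread"}, 3.0),
    ({"milk"}, 1.0),
    ({"bread", "caviar"}, 6.0),
]
plain = weighted_support({"caviar"}, sales)
boosted = weighted_support({"caviar"}, sales, item_weight={"caviar": 2.0})
```

Under these assumptions an item that appears in only one transaction can still rank highly if that transaction, or the item itself, carries a large weight.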
-
-