Data Mining Unlearnable Data Sets
    Patent application
    Status: under examination (published)

    Publication number: US20080027886A1

    Publication date: 2008-01-31

    Application number: US11572193

    Filing date: 2005-07-18

    IPC classification: G06G7/00

    Abstract: This invention concerns data mining, that is, the extraction of information from “unlearnable” data sets. In particular, it concerns an apparatus and a method for this purpose. The invention involves creating a finite training sample from the data set (14); training (50) a learning device (32) with a supervised learning algorithm to predict a label for each item of the training sample; processing other data from the data set with the trained learning device to predict labels, and determining whether the predicted labels are better (learnable) or worse (anti-learnable) than random guessing (52); and, if they are worse (anti-learnable), using a reverser (34) to apply negative weighting to the predicted labels (54).
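
    As a rough illustration of the steps in the abstract, here is a minimal Python sketch. It rests on assumptions that do not come from the patent: labels are binary (+1/-1), so the reverser's "negative weighting" becomes multiplication by -1; the learning device is a hypothetical 1-nearest-neighbour classifier (`train_1nn`); "other data" is simulated by leave-one-out held-out prediction; and "better or worse than random guessing" is a bare comparison against 0.5 accuracy. All function names and the toy data are illustrative only, not the patented apparatus.

```python
from collections.abc import Callable, Sequence

# A predictor maps one 1-D input to a label in {-1, +1}.
Predictor = Callable[[float], int]
Trainer = Callable[[Sequence[float], Sequence[int]], Predictor]


def train_1nn(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
    """Hypothetical toy learner: copy the label of the nearest training point."""
    pairs = list(zip(xs, ys))

    def predict(x: float) -> int:
        _, label = min(pairs, key=lambda p: abs(p[0] - x))
        return label

    return predict


def held_out_accuracy(xs: Sequence[float], ys: Sequence[int], train: Trainer) -> float:
    """Leave-one-out: train on every item but one, then predict the held-out item."""
    xs, ys = list(xs), list(ys)
    correct = 0
    for i in range(len(xs)):
        model = train(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += int(model(xs[i]) == ys[i])
    return correct / len(xs)


def reverser_weight(accuracy: float, chance: float = 0.5) -> int:
    """+1 if learnable (at or above chance), -1 if anti-learnable (below chance)."""
    # Assumption: a bare comparison with chance; a real system would test significance.
    return 1 if accuracy >= chance else -1


def with_reverser(train: Trainer, weight: int) -> Trainer:
    """Wrap a trainer so every +1/-1 prediction is multiplied by the weight."""
    def wrapped(xs: Sequence[float], ys: Sequence[int]) -> Predictor:
        model = train(xs, ys)
        return lambda x: weight * model(x)
    return wrapped


# Toy anti-learnable data (made up for illustration): tight pairs with opposite labels.
xs = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
ys = [-1, 1, 1, -1, -1, 1]

acc = held_out_accuracy(xs, ys, train_1nn)                      # 0.0
w = reverser_weight(acc)                                        # -1: anti-learnable
fixed = held_out_accuracy(xs, ys, with_reverser(train_1nn, w))  # 1.0
print(acc, w, fixed)  # prints: 0.0 -1 1.0
```

    On this toy data every point's nearest neighbour carries the opposite label, so held-out accuracy is 0.0; reversing the predictions turns that systematic error into 1.0 accuracy. In practice one would likely want the gap below chance to be statistically significant before applying the reverser.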

    摘要翻译: 本发明涉及数据挖掘,即从“不可理解”的数据集中提取信息。 特别地,它涉及用于此目的的装置和方法。 本发明涉及从数据集(14)创建有限训练样本。 然后使用监督学习算法训练(50)学习装置(32)来预测训练样本的每个项目的标签。 然后利用训练有素的学习装置处理来自数据集的其他数据,以预测标签,并确定预测标签是否比随机猜测更好(可学习)或更差(可反学习)(52)。 并且,如果反转(34)更糟(反学习),则使用反向器(34)对预测标签应用负权重(54)。