Publication number: US20160267168A1
Publication date: 2016-09-15
Application number: US15033181
Application date: 2013-12-19
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: George H. Forman, Renato Keshet
CPC classification number: G06F16/285, G06F16/24578, G06F16/90, G06N20/00
Abstract: A technique for residual data identification can include receiving a plurality of data instances in a multi-class training data set that are labeled as belonging to recognized categories, receiving a plurality of data instances in a first unlabeled data set, and receiving a plurality of data instances in a second unlabeled data set. A technique for residual data identification can include labeling the plurality of data instances in the multi-class training data set as negative data instances. A technique for residual data identification can include labeling the plurality of data instances in the first unlabeled data set as positive data instances. A technique for residual data identification can include training a classifier with the labeled negative data instances and the labeled positive data instances. A technique for residual data identification can include applying the classifier to identify residual data instances in the second unlabeled data set.
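The steps in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the synthetic 2-D clusters, the plain logistic regression used as the classifier, the helper names (`make_cluster`, `train_logistic`, `score`), and the choice to rank the second unlabeled set by score and treat the top-ranked instances as residual candidates.

```python
import math
import random

def make_cluster(rng, center, n, spread=0.5):
    """Hypothetical helper: n Gaussian 2-D points around a center."""
    return [[rng.gauss(center[0], spread), rng.gauss(center[1], spread)]
            for _ in range(n)]

def score(w, b, x):
    """Logistic score: estimated probability that x is a 'positive' instance."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=1000, lr=0.1):
    """Fit logistic regression by batch gradient descent (illustrative classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

rng = random.Random(0)
cat_a, cat_b, residual = (0.0, 0.0), (4.0, 0.0), (2.0, 4.0)  # assumed cluster centers

# Multi-class training set: instances labeled as belonging to recognized categories.
training = make_cluster(rng, cat_a, 40) + make_cluster(rng, cat_b, 40)
# First unlabeled set: a mix of recognized-category and residual instances.
unlabeled_1 = (make_cluster(rng, cat_a, 15) + make_cluster(rng, cat_b, 15)
               + make_cluster(rng, residual, 30))
# Second unlabeled set, with ground truth kept only to check the sketch.
unlabeled_2 = (make_cluster(rng, cat_a, 10) + make_cluster(rng, cat_b, 10)
               + make_cluster(rng, residual, 10))
is_residual = [False] * 20 + [True] * 10

# Training-set instances become negatives; first-unlabeled-set instances become positives.
X = training + unlabeled_1
y = [0] * len(training) + [1] * len(unlabeled_1)
w, b = train_logistic(X, y)

# Apply the classifier to the second unlabeled set and rank by score.
scores = [score(w, b, x) for x in unlabeled_2]
ranked = sorted(range(len(unlabeled_2)), key=lambda i: scores[i], reverse=True)
top10 = ranked[:10]
print("residual instances among the 10 highest scores:",
      sum(is_residual[i] for i in top10))
```

In this setup the recognized-category instances inside the first unlabeled set act as noisy positives that resemble the negatives, so the classifier can only separate the two sets by what is new in the unlabeled data. That is why instances unlike any recognized category tend to receive the highest scores.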