-
Publication No.: US08990145B2
Publication Date: 2015-03-24
Application No.: US13214105
Filing Date: 2011-08-19
Abstract: A first data mining model and a second data mining model are compared. The first data mining model M1 represents the results of a first data mining task on a first data set D1 and provides a set of first prediction values. The second data mining model M2 represents the results of a second data mining task on a second data set D2 and provides a set of second prediction values. A relation R is determined between said sets of prediction values. For at least a first record of an input data set, a first and a second probability distribution are created by applying the first and second data mining models, respectively, to the first record. A distance measure d is calculated for said first record using the first and second probability distributions and the relation R. At least one region of interest is determined based on said distance measure d.
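The comparison described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not taken from the abstract: each model is a function that returns a probability distribution (a dict) over its own prediction values, the relation R is a plain mapping from M1's values to M2's values, the distance d is the total variation distance after pushing M1's distribution through R, and a "region of interest" is simplified to the indices of records whose distance exceeds a threshold. All function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Mapping, Sequence

Distribution = Dict[str, float]           # prediction value -> probability
Record = Mapping[str, float]              # one input record (field -> value)
Model = Callable[[Record], Distribution]  # applying a model yields a distribution


def map_through_relation(p1: Distribution, relation: Mapping[str, str]) -> Distribution:
    """Push M1's distribution onto M2's prediction values via R (assumed to be a function)."""
    mapped: Distribution = {}
    for value, prob in p1.items():
        target = relation[value]
        mapped[target] = mapped.get(target, 0.0) + prob
    return mapped


def distance(p1: Distribution, p2: Distribution, relation: Mapping[str, str]) -> float:
    """Total variation distance between the mapped p1 and p2 (one possible choice of d)."""
    q1 = map_through_relation(p1, relation)
    support = set(q1) | set(p2)
    return 0.5 * sum(abs(q1.get(v, 0.0) - p2.get(v, 0.0)) for v in support)


def regions_of_interest(records: Sequence[Record], m1: Model, m2: Model,
                        relation: Mapping[str, str], threshold: float) -> List[int]:
    """Indices of records where the two models disagree by more than `threshold`."""
    return [i for i, rec in enumerate(records)
            if distance(m1(rec), m2(rec), relation) > threshold]


# Toy usage: two churn models whose prediction values have different names.
def m1(rec: Record) -> Distribution:
    p = 0.9 if rec["age"] < 30 else 0.2
    return {"churn": p, "stay": 1 - p}


def m2(rec: Record) -> Distribution:
    p = 0.8 if rec["age"] < 30 else 0.7
    return {"yes": p, "no": 1 - p}


R = {"churn": "yes", "stay": "no"}
records = [{"age": 25}, {"age": 50}]
print(regions_of_interest(records, m1, m2, R, threshold=0.3))  # [1]
```

Here the models roughly agree on the younger record (d is about 0.1) and disagree on the older one (d is about 0.5), so only the second record is flagged. A many-to-many relation R or a different distance (for example, Kullback-Leibler divergence) would require changing `map_through_relation` and `distance`, respectively.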
-
Publication No.: US08738549B2
Publication Date: 2014-05-27
Application No.: US13214097
Filing Date: 2011-08-19
IPC Classification: G06N5/00
CPC Classification: G06N7/005, G06F17/18, G06K9/6256, G06K9/6277
Abstract: A predictive analysis generates a predictive model (Padj(Y|X)) based on two separate pieces of information: a set of original training data (Dorig) and a “true” distribution of indicators (Ptrue(X)). The predictive analysis begins by generating a base model distribution (Pgen(Y|X)) from the original training data set (Dorig), which contains tuples (x,y) of indicators (x) and corresponding labels (y). Using the “true” distribution (Ptrue(X)) of indicators, a random data set (D′) of indicator records (x) is generated that reflects this “true” distribution. Subsequently, the base model (Pgen(Y|X)) is applied to said random data set (D′), thus assigning a label (y) or a distribution of labels to each indicator record (x) in said random data set (D′) and generating an adjusted training set (Dadj). Finally, an adjusted predictive model (Padj(Y|X)) is trained based on said adjusted training set (Dadj).
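The four steps in this abstract (base model, random data set D′, labelled set Dadj, adjusted model) can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the abstract: indicators are discrete hashable values, Ptrue(X) is given as a dict of probabilities, both Pgen(Y|X) and Padj(Y|X) are fitted with a hypothetical Laplace-smoothed frequency-table learner, and each record in D′ receives a single label sampled from Pgen (the hard-label option, rather than a full label distribution). All names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Label = str
Dist = Dict[Label, float]
CondModel = Callable[[Hashable], Dist]  # x -> P(Y | X = x)


def fit_frequency_model(data: Sequence[Tuple[Hashable, Label]],
                        labels: Sequence[Label], alpha: float = 1.0) -> CondModel:
    """Placeholder learner: Laplace-smoothed label counts per indicator value."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in data:
        counts[x][y] += 1

    def predict(x: Hashable) -> Dist:
        c = counts.get(x, Counter())  # an unseen x falls back to a uniform distribution
        total = sum(c.values()) + alpha * len(labels)
        return {y: (c[y] + alpha) / total for y in labels}

    return predict


def adjusted_model(d_orig: Sequence[Tuple[Hashable, Label]], p_true: Dict[Hashable, float],
                   labels: Sequence[Label], n_samples: int,
                   seed: int = 0) -> Tuple[CondModel, List[Tuple[Hashable, Label]]]:
    rng = random.Random(seed)
    # Step 1: base model Pgen(Y|X) from the original training data Dorig.
    p_gen = fit_frequency_model(d_orig, labels)
    # Step 2: random data set D' of indicator records drawn from Ptrue(X).
    xs = list(p_true)
    d_prime = rng.choices(xs, weights=[p_true[x] for x in xs], k=n_samples)
    # Step 3: label each record in D' by sampling from Pgen(Y|x), giving Dadj.
    d_adj: List[Tuple[Hashable, Label]] = []
    for x in d_prime:
        dist = p_gen(x)
        y = rng.choices(list(labels), weights=[dist[lab] for lab in labels], k=1)[0]
        d_adj.append((x, y))
    # Step 4: train the adjusted model Padj(Y|X) on Dadj.
    return fit_frequency_model(d_adj, labels), d_adj


# Toy usage: Dorig over-represents "young", while Ptrue(X) makes "old" the majority.
d_orig = ([("young", "yes")] * 80 + [("young", "no")] * 20
          + [("old", "yes")] * 2 + [("old", "no")] * 8)
p_true = {"young": 0.3, "old": 0.7}
p_adj, d_adj = adjusted_model(d_orig, p_true, ["yes", "no"], n_samples=10_000, seed=42)
print(p_adj("old"))
```

The frequency-table learner only stands in for whatever model family is actually used. With such a learner, Padj mostly reproduces Pgen on the indicator values in D′. Resampling the inputs from Ptrue(X) matters more when Padj is a model class whose fit depends on the input distribution, such as a regularized parametric model.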
-