Improved LR-Bagging algorithm based on characteristic selection

Invention Publication

CN106251241A Improved LR-Bagging algorithm based on characteristic selection (Invalid - Rejected)

Patent Title: 一种基于特征选择改进的LR-Bagging算法
Patent Title (English): Improved LR-Bagging algorithm based on characteristic selection
Application No.: CN201610623647.9

Application Date: 2016-08-02
Publication No.: CN106251241A

Publication Date: 2016-12-21
Inventor: 吴漾, 朱州, 谭驰, 曾路, 王鹏宇, 王玮, 罗念华, 吴忠, 张克贤, 郭仁超, 杨箴, 方继宇, 龙娜, 钱俊凤, 王倩冰, 陆岫昶
Applicant: 贵州电网有限责任公司信息中心
Applicant Address: 贵州省贵阳市瑞金南路38号
Assignee: 贵州电网有限责任公司信息中心
Current Assignee: 贵州电网有限责任公司信息中心
Current Assignee Address: 贵州省贵阳市瑞金南路38号
Agency: 贵阳中新专利商标事务所
Agent: 李亮
Main IPC: G06Q50/06
IPC: G06Q50/06

Abstract:

The invention discloses an improved LR-Bagging algorithm based on feature selection, comprising the following steps. First, an initial data set is determined from the original data, with the requirement that the correlation between the independent variables and the dependent variable must not be too low. Next, the discrete independent variables in the initial data set are given WEO encoding (presumably WOE, weight of evidence). Then a certain number of records and feature fields are drawn by random sampling to form a training sample, an LR (Logistic Regression) model is trained on that sample, and a normal significance test is applied to its coefficients: if the coefficients are not significant, the model is discarded; otherwise, it is added to the combined model. These steps are iterated until the combined model is relatively optimal. Finally, the relatively optimal combined model is used for prediction and segmentation. The algorithm can increase the diversity of classification results, the extent to which variable information is extracted, and the accuracy of the predictions, and it can effectively reduce the likelihood of multicollinearity and "overfitting" that arises when a base LR model has too many variables.
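The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The abstract does not give the encoding formula, the significance threshold, the stopping rule, or how base models are combined. The sketch therefore assumes a smoothed weight-of-evidence mapping, a two-sided z-test at |z| > 1.96 on every slope, a fixed target number of base models in place of "until the combined model is relatively optimal", and simple averaging of predicted probabilities. All function and parameter names (`woe_encode`, `fit_lr`, `lr_bagging`, `predict_proba`, `z_crit`, and so on) are hypothetical.

```python
import math
import random

def woe_encode(values, labels):
    """Assumed WOE mapping: ln(share of positives / share of negatives) per category, smoothed by 0.5."""
    pos = sum(labels)
    neg = len(labels) - pos
    table = {}
    for cat in set(values):
        p = sum(1 for v, t in zip(values, labels) if v == cat and t == 1)
        n = sum(1 for v, t in zip(values, labels) if v == cat and t == 0)
        table[cat] = math.log(((p + 0.5) / (pos + 0.5)) / ((n + 0.5) / (neg + 0.5)))
    return table

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamp to avoid overflow

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("singular matrix")
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def fit_lr(X, y, iters=25):
    """Newton-Raphson logistic regression; returns coefficients and their z-statistics."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(bj * xj for bj, xj in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        # Information matrix X'WX, with a tiny ridge purely for numerical stability.
        info = [[sum(pi * (1 - pi) * row[r] * row[c] for row, pi in zip(X, p))
                 + (1e-6 if r == c else 0.0) for c in range(k)] for r in range(k)]
        cov = invert(info)
        beta = [bj + sum(cov[j][m] * grad[m] for m in range(k)) for j, bj in enumerate(beta)]
    # cov comes from the last Newton step, which matches the final estimate once converged.
    z = [beta[j] / math.sqrt(cov[j][j]) for j in range(k)]
    return beta, z

def lr_bagging(X, y, n_models=20, n_rows=200, n_feats=2, z_crit=1.96, max_tries=200, seed=0):
    """Collect base LR models fitted on random rows and random feature subsets."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(max_tries):
        if len(ensemble) == n_models:  # assumed stopping rule: fixed ensemble size
            break
        rows = [rng.randrange(len(X)) for _ in range(n_rows)]   # bootstrap sample of records
        feats = sorted(rng.sample(range(len(X[0])), n_feats))   # random subset of feature fields
        Xs = [[1.0] + [X[i][j] for j in feats] for i in rows]   # prepend intercept column
        ys = [y[i] for i in rows]
        beta, z = fit_lr(Xs, ys)
        if all(abs(zj) > z_crit for zj in z[1:]):  # keep the model only if every slope is significant
            ensemble.append((feats, beta))
    return ensemble

def predict_proba(ensemble, row):
    """Assumed combination rule: average the base models' predicted probabilities."""
    probs = [sigmoid(b[0] + sum(bj * row[j] for bj, j in zip(b[1:], feats)))
             for feats, b in ensemble]
    return sum(probs) / len(probs)
```

In use, each discrete column would first be mapped through the table returned by `woe_encode`, after which `lr_bagging(X, y)` builds the ensemble and `predict_proba(ensemble, row)` scores a new record. Drawing a small random feature subset for each base model is what limits the number of variables per model, which is how the abstract's claim about reduced multicollinearity and overfitting could come about.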

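IPC Classification:

G	Physics
G06	Computing; calculating or counting
G06Q	Data processing systems or methods, specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes; systems or methods specially adapted for administrative, commercial, financial, managerial, supervisory or forecasting purposes, not otherwise provided for
G06Q50/00	Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism (healthcare informatics: see G16H)
G06Q50/06	.Electricity, gas or water supply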