Granted invention patent
US08417783B1 System and method for improving feature selection for a spam filtering model (in force)

Abstract:
A system and method for removing ineffective features from a spam feature set. In one embodiment of the invention, an entropy value is calculated for the feature set based on how effectively the feature set differentiates between ham and spam. Features are then removed one at a time, and the entropy is recalculated after each removal. Features that increase the overall entropy are removed, and features that decrease the overall entropy are retained. In another embodiment, the value of certain types of time-consuming features (e.g., rules) is determined from both the information gain associated with each feature and the time consumed in implementing it. Features that have relatively low information gain and that consume a significant amount of time to implement are removed from the feature set.
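The abstract does not say how the entropy of a feature set is computed or how rule cost is measured, so the following is only a minimal Python sketch of the two embodiments under stated assumptions. It assumes each message is represented as the set of feature (rule) names it triggers, and it uses the mean cross-entropy of a Bernoulli naive Bayes classifier on a held-out labeled set as the "entropy" of a feature set. Removal is greedy in a fixed order, with the baseline updated after each accepted removal. The second embodiment is approximated by dropping features whose single-feature information gain is below `min_gain` and whose measured cost exceeds `max_cost_ms`. All names (`train_nb`, `set_entropy`, `prune_by_entropy`, `info_gain`, `prune_by_cost`, `cost_ms`, `min_gain`, `max_cost_ms`) are hypothetical and not taken from the patent.

```python
import math
from typing import Dict, List, Set, Tuple

# Assumed representation: a message is the set of feature (rule) names it triggers.
Message = Set[str]
Corpus = List[Tuple[Message, int]]  # label 1 = spam, 0 = ham


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_nb(train: Corpus, features: Set[str]) -> Tuple[float, Dict[str, Tuple[float, float]]]:
    """Bernoulli naive Bayes with Laplace smoothing, restricted to `features`."""
    n_spam = sum(y for _, y in train)
    n_ham = len(train) - n_spam
    prior = (n_spam + 1) / (len(train) + 2)
    probs = {}
    for f in features:
        spam_hits = sum(1 for m, y in train if y == 1 and f in m)
        ham_hits = sum(1 for m, y in train if y == 0 and f in m)
        # (P(f present | spam), P(f present | ham))
        probs[f] = ((spam_hits + 1) / (n_spam + 2), (ham_hits + 1) / (n_ham + 2))
    return prior, probs


def set_entropy(train: Corpus, valid: Corpus, features: Set[str]) -> float:
    """Assumed entropy measure: mean cross-entropy (bits) of P(spam | msg) on `valid`."""
    prior, probs = train_nb(train, features)
    total = 0.0
    for msg, y in valid:
        log_odds = math.log(prior / (1 - prior))
        for f, (ps, ph) in probs.items():
            if f in msg:
                log_odds += math.log(ps / ph)
            else:
                log_odds += math.log((1 - ps) / (1 - ph))
        p_spam = _sigmoid(log_odds)
        p_true = p_spam if y == 1 else 1.0 - p_spam
        total -= math.log2(max(p_true, 1e-12))
    return total / len(valid)


def prune_by_entropy(train: Corpus, valid: Corpus, features: Set[str]) -> Set[str]:
    """Try removing features one at a time; drop any whose removal lowers the entropy."""
    kept = set(features)
    baseline = set_entropy(train, valid, kept)
    for f in sorted(features):
        trial = kept - {f}
        if not trial:
            continue  # never empty the feature set
        h = set_entropy(train, valid, trial)
        if h < baseline:  # f was increasing the overall entropy
            kept, baseline = trial, h
    return kept


def _h2(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(data: Corpus, f: str) -> float:
    """IG(label; f) = H(label) - H(label | f present or absent)."""
    n = len(data)
    h_cond = 0.0
    for present in (True, False):
        ys = [y for m, y in data if (f in m) == present]
        if ys:
            h_cond += len(ys) / n * _h2(sum(ys) / len(ys))
    return _h2(sum(y for _, y in data) / n) - h_cond


def prune_by_cost(data: Corpus, features: Set[str], cost_ms: Dict[str, float],
                  min_gain: float, max_cost_ms: float) -> Set[str]:
    """Drop features that are both low-gain and expensive (thresholds are hypothetical)."""
    return {f for f in features
            if not (info_gain(data, f) < min_gain and cost_ms[f] > max_cost_ms)}


# Toy data: "noise" co-occurs with spam in training but with ham in validation.
train = [({"viagra", "noise"}, 1), ({"viagra"}, 1), (set(), 0), (set(), 0)]
valid = [({"viagra"}, 1), ({"noise"}, 0)]
print(prune_by_entropy(train, valid, {"viagra", "noise"}))  # {'viagra'}
```

Measuring entropy on held-out data is a deliberate choice in this sketch: a feature that looks predictive on the training messages but does not generalize, like `noise` in the toy data, then shows up as an increase in overall entropy and is removed.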