SELF-LEARNING BASED CRAWLING AND RULE-BASED DATA MINING FOR AUTOMATIC INFORMATION EXTRACTION
    Invention application
    Status: under examination (published)

    Publication No.: US20160371603A1

    Publication date: 2016-12-22

    Application No.: US15077563

    Application date: 2016-03-22

    CPC classification number: G06N20/00 G06F16/95 G06F16/951 G06N5/045

    Abstract: Methods and systems for automatic information extraction by self-learning crawling and rule-based data mining are provided. The method determines whether a crawl policy exists within the input information and performs at least one of front-end crawling, assisted crawling, and recursive crawling. The downloaded data set is pre-processed to remove noisy data and then subjected to classification rules and decision-tree-based data mining to extract meaningful information. These crawling techniques yield smaller, relevant datasets pertaining to a specific domain from the multi-dimensional datasets available in online and offline sources.
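The flow described in the abstract (check for a crawl policy, pick a crawling mode, clean the downloaded data, then apply classification rules) might be sketched roughly as below. This is only an illustration under stated assumptions, not the method claimed in the application: the names `CrawlRequest`, `choose_crawl_mode`, `preprocess`, `classify` and `extract` are hypothetical, the mapping from policy to crawling mode is invented for the example, and the ordered rule list stands in for the decision-tree step as a simple decision list. No network access is performed; the "downloaded" records are passed in directly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CrawlRequest:
    """Hypothetical input: a seed location plus an optional crawl policy."""
    seed: str
    crawl_policy: Optional[dict] = None  # e.g. {"max_depth": 2} (assumed shape)


def choose_crawl_mode(request: CrawlRequest) -> str:
    """Pick a crawling mode. The mapping below is an assumption for
    illustration; the abstract only says that at least one mode is used."""
    if request.crawl_policy is None:
        return "assisted"
    if request.crawl_policy.get("max_depth", 0) > 0:
        return "recursive"
    return "front-end"


def preprocess(records: List[str]) -> List[str]:
    """Remove noisy data: blank records and case-insensitive duplicates."""
    seen = set()
    clean = []
    for text in records:
        normalized = " ".join(text.split())  # collapse runs of whitespace
        key = normalized.lower()
        if not normalized or key in seen:
            continue
        seen.add(key)
        clean.append(normalized)
    return clean


# Ordered classification rules; the first matching rule wins (a decision list).
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("contact", lambda t: "@" in t),
    ("price", lambda t: "$" in t),
]


def classify(text: str) -> str:
    """Return the label of the first rule that matches, else 'other'."""
    for label, predicate in RULES:
        if predicate(text):
            return label
    return "other"


def extract(request: CrawlRequest, downloaded: List[str]):
    """Run mode selection, pre-processing and rule-based labelling."""
    mode = choose_crawl_mode(request)
    labelled = [(classify(t), t) for t in preprocess(downloaded)]
    return mode, labelled
```

Calling `extract(CrawlRequest("http://example.com", {"max_depth": 2}), records)` returns the chosen mode together with a list of `(label, text)` pairs. In a fuller system the hand-written rules could be replaced by a trained decision tree, but that choice is outside what the abstract specifies.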

