Patent search ap:("TATA CONSULTANCY SERVICES LIMITED") AND inv:"Hemant Kumar RATH" Page 3

21.

Invention application
SELF-LEARNING BASED CRAWLING AND RULE-BASED DATA MINING FOR AUTOMATIC INFORMATION EXTRACTION (Status: under examination, published)

Publication (announcement) number: US20160371603A1

Publication (announcement) date: 2016-12-22

Application number: US15077563

Application date: 2016-03-22

Applicant: Tata Consultancy Services Limited

Inventor: Arun Kumar A V, Hemant Kumar RATH, Shameemraj M. NADAF, Anantha SIMHA

IPC: G06N99/00, G06N5/04, G06F17/30

CPC classification number: G06N20/00, G06F16/95, G06F16/951, G06N5/045

Abstract: Methods and systems for automatic information extraction by performing self-learning crawling and rule-based data mining are provided. The method determines the existence of a crawl policy within the input information and performs at least one of front-end crawling, assisted crawling, and recursive crawling. The downloaded data set is pre-processed to remove noisy data and subjected to classification rules and decision-tree-based data mining to extract meaningful information. Performing the crawling techniques leads to smaller, relevant datasets pertaining to a specific domain from the multi-dimensional datasets available in online and offline sources.
