- Patent title: Training set construction for taxonomic classification
- Patent title (Chinese): 分类分类培训班
- Application number: US12604025
- Filing date: 2009-10-22
- Publication number: US08122005B1
- Publication date: 2012-02-21
- Inventors: Philo Juang, Christopher Testa, Nicolaus Mote
- Applicants: Philo Juang, Christopher Testa, Nicolaus Mote
- Applicant address: Mountain View, CA, US
- Assignee: Google Inc.
- Current assignee: Google Inc.
- Current assignee address: Mountain View, CA, US
- Attorney, agent, or firm: Brake Hughes Bellermann LLP
- Primary classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data. The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with categories of the hierarchy of categories and obtain categorized data of the training set.
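The pipeline in the abstract (crawl each top-level site, store the top-level and lower-level sites as crawl data, then apply a per-site template that maps portions of the site onto taxonomy categories) can be sketched roughly as follows. This is a minimal offline illustration under stated assumptions, not the patented implementation: the in-memory `FAKE_WEB` link graph stands in for HTTP fetching, the taxonomy is modeled as a set of slash-separated category paths, and each site-specific template maps URL path prefixes (a hypothetical choice of "portion") to categories. All names (`crawl`, `apply_template`, `build_training_set`, the `*.example` URLs) are hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# HYPOTHETICAL in-memory link graph (URL -> outgoing links) standing in for
# real HTTP fetching, so the sketch runs offline.
FAKE_WEB = {
    "http://shop.example/": ["http://shop.example/books/", "http://shop.example/music/"],
    "http://shop.example/books/": ["http://shop.example/books/sci-fi/dune"],
    "http://shop.example/music/": [
        "http://shop.example/music/jazz/kind-of-blue",
        "http://other.example/",  # off-site link; the crawler ignores it
    ],
    "http://shop.example/books/sci-fi/dune": [],
    "http://shop.example/music/jazz/kind-of-blue": [],
    "http://other.example/": [],
}


def crawl(top_level_site, links=FAKE_WEB):
    """Breadth-first crawl restricted to the top-level site's host.

    Returns the top-level site followed by every lower-level site reached."""
    host = urlparse(top_level_site).netloc
    seen = {top_level_site}
    queue = deque([top_level_site])
    pages = [top_level_site]
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, []):
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                pages.append(nxt)
                queue.append(nxt)
    return pages


def apply_template(template, pages, taxonomy):
    """Label each page with the category of the longest matching path prefix.

    ASSUMPTION: a template maps URL path prefixes (the 'portion' of the site)
    to taxonomy categories; pages with no match are left out."""
    labeled = []
    for url in pages:
        path = urlparse(url).path
        matches = [prefix for prefix in template if path.startswith(prefix)]
        if not matches:
            continue
        category = template[max(matches, key=len)]
        if category in taxonomy:  # ignore categories outside the hierarchy
            labeled.append((url, category))
    return labeled


def build_training_set(taxonomy, top_level_sites, templates):
    """Crawl every top-level site, then apply its site-specific template."""
    crawl_data = {site: crawl(site) for site in top_level_sites}
    training_set = []
    for site, pages in crawl_data.items():
        training_set.extend(apply_template(templates[site], pages, taxonomy))
    return training_set


# ASSUMPTION: the hierarchy is modeled as slash-separated category paths.
TAXONOMY = {"Books", "Books/Science Fiction", "Music", "Music/Jazz"}
TEMPLATES = {
    "http://shop.example/": {
        "/books/": "Books",
        "/books/sci-fi/": "Books/Science Fiction",
        "/music/jazz/": "Music/Jazz",
    }
}

if __name__ == "__main__":
    for url, category in build_training_set(TAXONOMY, ["http://shop.example/"], TEMPLATES):
        print(category, url)
```

Run as a script, this labels `/books/` as `Books`, `/books/sci-fi/dune` as `Books/Science Fiction`, and `/music/jazz/kind-of-blue` as `Music/Jazz`; the root page and `/music/` match no template prefix and are left out of the training set. Choosing the longest matching prefix lets a template assign a page to the most specific category in the hierarchy when several entries apply.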