Granted Invention Patent
- Patent title: System and method for automatically identifying classified websites
- Application number: US13228337
- Application date: 2011-09-08
- Publication number: US08380693B1
- Publication date: 2013-02-19
- Inventors: Cheng Xu, Gang Feng, Xin Li
- Applicants: Cheng Xu, Gang Feng, Xin Li
- Applicant address: Mountain View, CA, US
- Patentee: Google Inc.
- Current patentee: Google Inc.
- Current patentee address: Mountain View, CA, US
- Agent: Morgan, Lewis & Bockius LLP
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
Systems, methods, and computer-readable storage media are provided for automatically identifying a classified website. A website is determined to be a candidate site based on a set of heuristics. From among the pages constituting the candidate site, one or more pages are determined to be listing page candidates and one or more pages are determined to be detail page candidates. A listing page score is then determined using a listing page classifier, and a detail page score is determined using a detail page classifier. The listing page and detail page scores each indicate the likelihood that the pages are part of a classified website. A candidate site score is determined based in part on a combination of the listing page score and the detail page score. When the candidate site score is above a threshold, the candidate site is determined to be a classified website.
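The flow in the abstract (heuristic candidate selection, splitting pages into listing and detail candidates, scoring each group with its own classifier, combining the scores, and thresholding) can be illustrated with a minimal Python sketch. This is not the patented implementation. The URL-keyword heuristic, the link-count rule for separating listing from detail pages, the stand-in classifier callables, the weighted-average combination, and the 0.7 threshold are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Tuple


@dataclass
class Page:
    """Hypothetical page representation; the abstract does not specify one."""
    url: str
    html: str


# A classifier maps a page to a score in [0, 1]: the likelihood that the page
# belongs to a classified website. Real systems would use trained models.
PageClassifier = Callable[[Page], float]


def is_candidate_site(pages: List[Page]) -> bool:
    """Hypothetical heuristic: at least two pages and a listing-like URL keyword."""
    keywords = ("classified", "listing", "forsale", "ads")
    return len(pages) >= 2 and any(k in p.url.lower() for p in pages for k in keywords)


def split_candidates(pages: List[Page]) -> Tuple[List[Page], List[Page]]:
    """Hypothetical split: link-heavy pages are listing candidates, the rest detail candidates."""
    listing = [p for p in pages if p.html.count("<a ") >= 10]
    detail = [p for p in pages if p.html.count("<a ") < 10]
    return listing, detail


def site_score(pages: List[Page], listing_clf: PageClassifier,
               detail_clf: PageClassifier, weight: float = 0.5) -> Optional[float]:
    """Combine mean listing and mean detail scores with an assumed weighted average."""
    listing, detail = split_candidates(pages)
    if not listing or not detail:
        return None  # cannot score a site lacking either page type
    listing_score = mean(listing_clf(p) for p in listing)
    detail_score = mean(detail_clf(p) for p in detail)
    return weight * listing_score + (1 - weight) * detail_score


def is_classified_site(pages: List[Page], listing_clf: PageClassifier,
                       detail_clf: PageClassifier, threshold: float = 0.7) -> bool:
    """A candidate site is classified when its combined score exceeds the threshold."""
    if not is_candidate_site(pages):
        return False
    score = site_score(pages, listing_clf, detail_clf)
    return score is not None and score > threshold
```

For example, a site with one link-heavy listing page and one detail page, scored 0.9 and 0.8 by the stand-in classifiers, gets a combined score of about 0.85 and is accepted at the assumed 0.7 threshold. Averaging per group before combining keeps a site with many listing pages from drowning out its detail-page evidence; the abstract only says the site score is "based in part on a combination", so other combination rules are equally plausible.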