Granted invention patent
- Patent title: Discrepancy detection for web crawling
- Patent title (Chinese): 网页爬网差异检测
- Application number: US12817797
- Filing date: 2010-06-17
- Publication number: US08639773B2
- Publication date: 2014-01-28
- Inventors: Balaji B. Shyamkumar, Puneet Sahni, Harsh Verma
- Applicants: Balaji B. Shyamkumar, Puneet Sahni, Harsh Verma
- Applicant address: US WA Redmond
- Assignee: Microsoft Corporation
- Current assignee: Microsoft Corporation
- Current assignee address: US WA Redmond
- Agency: Microsoft Corporation
- Main classification: G06F15/16
- IPC classification: G06F15/16
Abstract:
Search engines may utilize web crawlers to discover desirable content that may be provided to users as search results. Unfortunately, document providers, such as websites, may return junk web pages and/or maintenance web pages as document results, which may be undesirable for a search engine to provide as search results. Accordingly, document providers may be grouped into provider clusters. Profiles may be assigned to provider clusters, where a profile may comprise parameters representing “expected” parameters historically returned from normal document fetch operations to document providers within the provider cluster. Parameters of a profile for a provider cluster comprising a document provider may be compared with current document fetch parameters of a current document fetch operation. If the parameters of the profile and the current document fetch parameters do not match, then an alert may be generated.
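The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the abstract does not say which parameters a profile holds, so the fields used here (HTTP status, content type, a content-length range) and the names `Profile`, `FetchResult`, `find_discrepancies`, and `check_fetch` are all hypothetical, and an "alert" is simply a returned string.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Profile:
    """Expected parameters for one provider cluster (hypothetical fields)."""
    expected_status: int
    expected_content_type: str
    min_length: int
    max_length: int


@dataclass(frozen=True)
class FetchResult:
    """Parameters observed in one document fetch (hypothetical fields)."""
    provider: str
    status: int
    content_type: str
    length: int


def find_discrepancies(profile: Profile, fetch: FetchResult) -> List[str]:
    """Describe every fetch parameter that departs from the profile."""
    problems = []
    if fetch.status != profile.expected_status:
        problems.append(f"status {fetch.status}, expected {profile.expected_status}")
    if fetch.content_type != profile.expected_content_type:
        problems.append(
            f"content type {fetch.content_type}, expected {profile.expected_content_type}"
        )
    if not profile.min_length <= fetch.length <= profile.max_length:
        problems.append(
            f"length {fetch.length} outside [{profile.min_length}, {profile.max_length}]"
        )
    return problems


def check_fetch(
    cluster_of: Dict[str, str],
    profiles: Dict[str, Profile],
    fetch: FetchResult,
) -> Optional[str]:
    """Compare a fetch with its cluster's profile; return an alert or None."""
    cluster = cluster_of[fetch.provider]  # provider -> provider cluster
    problems = find_discrepancies(profiles[cluster], fetch)
    if problems:
        return f"ALERT {fetch.provider} (cluster {cluster}): " + "; ".join(problems)
    return None
```

Under these assumptions, a fetch from a provider in a "news" cluster that returns an unusually short page (for example, a maintenance notice) would fall outside the profile's length range and produce an alert, while a typical page would return `None`.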
Published/granted documents
- US20110314122A1 DISCREPANCY DETECTION FOR WEB CRAWLING Publication/grant date: 2011-12-22