发明授权
US08639773B2 Discrepancy detection for web crawling 有权
网页爬网差异检测

Discrepancy detection for web crawling
摘要:
Search engines may utilize web crawlers to discover desirable content that may be provided to users as search results. Unfortunately, document providers, such as websites, may return junk web pages and/or maintenance web pages as document results, which may be undesirable for a search engine to provide as search results. Accordingly, document providers may be grouped into provider clusters. Profiles may be assigned to provider clusters, where a profile may comprise parameters representing “expected” parameters historically returned from normal document fetch operations to document providers within the provider cluster. Parameters of a profile for a provider cluster comprising a document provider may be compared with current document fetch parameters of a current document fetch operation. If the parameters of the profile and the current document fetch parameters do not match, then an alert may be generated.
公开/授权文献
信息查询
0/0