Invention Grant
US07379932B2 System and a method for focused re-crawling of Web sites (in force)

Abstract:
A method (100) of crawling the Web (620) is disclosed. The method (100) crawls (120) Web pages on the Web starting from a given (110) set of seed Universal Resource Locators (URLs). Crawled Web pages are partitioned (140) into sets of relevant and irrelevant pages. A set of exclusion and/or inclusion patterns are discovered (150) from the sets of relevant and irrelevant pages, and subsequent crawling of the Web is restricted through the set of exclusion and/or inclusion patterns.
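The steps named in the abstract (crawl from seeds, partition pages, discover patterns, restrict the next crawl) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the in-memory `WEB` link graph, the `is_relevant` stand-in for whatever relevance test partitions the pages, and the prefix-based heuristic in `discover_exclusion_patterns`. Only exclusion patterns are shown, and they are plain URL string prefixes.

```python
from collections import deque
from urllib.parse import urlparse


def path_prefixes(url):
    """Yield path-based prefixes of a URL, shortest first."""
    parsed = urlparse(url)
    base = f"{parsed.scheme}://{parsed.netloc}"
    segments = [s for s in parsed.path.split("/") if s]
    for i in range(1, len(segments) + 1):
        yield base + "/" + "/".join(segments[:i])


def discover_exclusion_patterns(relevant, irrelevant):
    """Hypothetical heuristic: for each irrelevant URL, keep its shortest
    prefix that is not a string prefix of any relevant URL."""
    patterns = set()
    for url in irrelevant:
        for prefix in path_prefixes(url):
            if not any(r.startswith(prefix) for r in relevant):
                patterns.add(prefix)
                break
    return patterns


def crawl(seeds, get_links, excluded=()):
    """Breadth-first crawl that never enqueues a link starting with an excluded prefix."""
    seen, queue, order = set(seeds), deque(seeds), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen and not any(link.startswith(p) for p in excluded):
                seen.add(link)
                queue.append(link)
    return order


# Toy link graph standing in for the Web (illustrative data only).
WEB = {
    "http://example.com/": ["http://example.com/docs/a", "http://example.com/ads/1"],
    "http://example.com/docs/a": ["http://example.com/docs/b", "http://example.com/ads/2"],
    "http://example.com/docs/b": [],
    "http://example.com/ads/1": ["http://example.com/ads/2"],
    "http://example.com/ads/2": [],
}


def get_links(url):
    return WEB.get(url, [])


def is_relevant(url):
    # Stand-in for a real relevance test; here anything under /ads/ is irrelevant.
    return "/ads/" not in url


seeds = ["http://example.com/"]
first = crawl(seeds, get_links)                      # initial unrestricted crawl
relevant = [u for u in first if is_relevant(u)]      # partition the crawled pages
irrelevant = [u for u in first if not is_relevant(u)]
excluded = discover_exclusion_patterns(relevant, irrelevant)
recrawl = crawl(seeds, get_links, excluded)          # restricted re-crawl
```

In this toy run the first crawl visits all five pages, while the re-crawl skips everything under the discovered `http://example.com/ads` prefix. A real system would presumably need more careful pattern matching than raw string prefixes, which can over-match sibling paths such as `/adsense`.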