METHOD AND APPARATUS FOR CRAWLING WEBPAGES
    Invention patent application
    Status: under examination, published

    Publication No.: US20120102019A1

    Publication date: 2012-04-26

    Application No.: US13116785

    Filing date: 2011-05-26

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: A method and apparatus for crawling webpages are provided. The method and apparatus involve obtaining a root Web address list; obtaining a list of Web addresses linked to the root Web address list; evaluating content of pages of the Web addresses based on the obtained list of Web addresses; adjusting a crawling depth according to the evaluation of the content of the pages of the Web addresses; and crawling webpages according to the adjusted crawling depth.
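
The five steps in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the in-memory `PAGES` table stands in for real HTTP fetching, the keyword-ratio `score` function stands in for whatever content evaluation the application actually claims, and the rule "raise the depth by one for relevant roots, lower it by one otherwise" is only one plausible way to adjust the crawling depth.

```python
from collections import deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
# A real crawler would fetch and parse pages over HTTP instead.
PAGES = {
    "a.example/": ("web crawler index search", ["a.example/1", "a.example/2"]),
    "a.example/1": ("crawler depth search", ["a.example/1/x"]),
    "a.example/2": ("search index crawler", []),
    "a.example/1/x": ("crawler notes", []),
    "b.example/": ("cooking recipes", ["b.example/1"]),
    "b.example/1": ("more recipes", ["b.example/1/x"]),
    "b.example/1/x": ("dessert", []),
}

# Assumed relevance criterion: the share of words that are topic keywords.
KEYWORDS = {"crawler", "search", "index"}


def score(text):
    """Return the fraction of words in `text` that are topic keywords."""
    words = text.split()
    return sum(w in KEYWORDS for w in words) / len(words) if words else 0.0


def adjusted_depth(root, base_depth=1, threshold=0.5):
    """Evaluate the pages linked from `root` and adjust the crawl depth."""
    links = PAGES.get(root, ("", []))[1]
    scores = [score(PAGES[u][0]) for u in links if u in PAGES]
    if not scores:
        return base_depth  # nothing to evaluate, keep the default depth
    average = sum(scores) / len(scores)
    if average >= threshold:
        return base_depth + 1          # relevant content: crawl deeper
    return max(base_depth - 1, 0)      # irrelevant content: crawl shallower


def crawl(roots, base_depth=1):
    """Breadth-first crawl of each root up to its adjusted depth."""
    visited, seen = [], set()
    for root in roots:
        limit = adjusted_depth(root, base_depth)
        queue = deque([(root, 0)])
        while queue:
            url, depth = queue.popleft()
            if url in seen or url not in PAGES:
                continue
            seen.add(url)
            visited.append(url)
            if depth < limit:
                for link in PAGES[url][1]:
                    queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    for address in crawl(["a.example/", "b.example/"]):
        print(address)
```

With this toy data, the pages linked from `a.example/` score above the threshold, so that root is crawled two levels deep, while `b.example/` links only to off-topic pages and is crawled at depth zero (the root page alone).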
