Publication No.: US20120102019A1
Publication date: 2012-04-26
Application No.: US13116785
Filing date: 2011-05-26
Applicant: Seung-hyun YOON, Seung-ryoul MAENG, Jae-hyuk HUH, Sang-won SEO, Jae-Hong KIM, Jong-se PARK
Inventor: Seung-hyun YOON, Seung-ryoul MAENG, Jae-hyuk HUH, Sang-won SEO, Jae-Hong KIM, Jong-se PARK
IPC classification: G06F17/30
CPC classification: G06F16/951
Abstract: A method and apparatus for crawling webpages are provided. The method and apparatus involve: obtaining a root Web address list; obtaining a list of Web addresses linked to the root Web address list; evaluating the content of the pages at the Web addresses in the obtained list; adjusting a crawling depth according to the evaluation of the content of those pages; and crawling webpages according to the adjusted crawling depth.
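The steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how the content is evaluated or how the depth is adjusted, so everything specific here is an assumption: the `fetch` and `evaluate` callbacks, the `threshold`, and the policy of adding `bonus` hops after a well-scored page and subtracting `penalty` hops after a poorly scored one are hypothetical, not taken from the patent.

```python
from collections import deque


def crawl(root_urls, fetch, evaluate, base_depth=1, bonus=1, penalty=1,
          threshold=0.5):
    """Breadth-first crawl whose depth budget is adjusted per page.

    fetch(url) -> (content, links) and evaluate(content) -> float are
    hypothetical, caller-supplied callbacks; the adjustment policy below
    is an assumed example, not the patented method.
    """
    seen = set(root_urls)
    # Each entry carries the number of further link hops still allowed.
    queue = deque((url, base_depth) for url in root_urls)
    crawled = []
    while queue:
        url, budget = queue.popleft()
        content, links = fetch(url)
        crawled.append(url)
        # Evaluate the page content and adjust the crawling depth.
        if evaluate(content) >= threshold:
            budget += bonus      # promising page: allow deeper crawling
        else:
            budget -= penalty    # weak page: cut this branch short
        if budget <= 0:
            continue
        for link in links:
            if link not in seen:
                seen.add(link)
                # Following a link consumes one hop of the budget.
                queue.append((link, budget - 1))
    return crawled
```

With this assumed policy, branches that keep yielding well-scored pages are followed further than `base_depth`, while a poorly scored page stops the crawl from going past it.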