Technology for Web Site Crawling

Patent Application

US20140149382A1 Technology for Web Site Crawling (in force)

Patent title: Technology for Web Site Crawling
Patent title (Chinese): 网站抓取技术
Application number: US14169115

Filing date: 2014-01-30
Publication number: US20140149382A1

Publication date: 2014-05-29
Inventors: Elizabeth A. Brodsky, Elmootazbellah N. Elnozahy, Ramakrishnan Rajamony
Applicant: International Business Machines Corporation
Applicant address: Armonk, NY, US
Patentee: International Business Machines Corporation
Current patentee: International Business Machines Corporation
Current patentee address: Armonk, NY, US
Main classification: G06F17/30
IPC classification: G06F17/30

Abstract:

A web site page has a reference for providing an address for a next page. The web site is crawled by a crawler program, which parses the reference from one of the web pages and sends the reference to an applet running in a browser. The address for the next page is determined by the browser responsive to the reference and is sent to the crawler. The crawler selects non-hypertext-link parameters from the web page of the web site server by performing a programmed action sequence, including selecting items from lists of the web page in a particular sequence. The crawler sends the applet running in the browser, for the query to the web server for the next page referenced by the one web page, the selected parameters and a context arising from the particular sequence.
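
The flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the page markup, the `goNext` script call, the `BrowserApplet` class, and all function names are hypothetical stand-ins, and the "browser" is simulated rather than real. The sketch only shows the division of labor, where the crawler parses a non-hyperlink reference, performs a programmed selection sequence, and lets a browser-side component turn the reference, the selected parameters, and the selection context into the next page's address.

```python
import re
from dataclasses import dataclass, field

# Hypothetical page: the "next page" link is a script reference, not a plain URL.
PAGE = '''
<select name="region"><option>east</option><option>west</option></select>
<select name="product"><option>laptop</option><option>phone</option></select>
<a href="javascript:goNext('results')">Next</a>
'''

@dataclass
class BrowserApplet:
    """Stand-in for the applet running in a browser (illustrative assumption)."""
    base_url: str
    context: list = field(default_factory=list)  # ordered record of selections

    def select(self, list_name, item):
        # Selection order is preserved because it forms the query context.
        self.context.append((list_name, item))

    def resolve(self, reference):
        # A real browser would execute the script; here we only mimic the outcome.
        match = re.fullmatch(r"javascript:goNext\('(\w+)'\)", reference)
        if match is None:
            raise ValueError(f"unsupported reference: {reference}")
        query = "&".join(f"{name}={item}" for name, item in self.context)
        return f"{self.base_url}/{match.group(1)}?{query}"

def parse_reference(html):
    """Crawler side: extract the script reference for the next page, if any."""
    match = re.search(r'href="(javascript:[^"]+)"', html)
    return match.group(1) if match else None

def crawl_step(html, applet, action_sequence):
    """Run the programmed action sequence, then ask the applet for the next address."""
    for list_name, item in action_sequence:
        applet.select(list_name, item)
    return applet.resolve(parse_reference(html))

applet = BrowserApplet(base_url="http://example.com")
next_url = crawl_step(PAGE, applet, [("region", "west"), ("product", "phone")])
print(next_url)  # http://example.com/results?region=west&product=phone
```

In this sketch the crawler never interprets the script itself; it hands the raw reference to the browser-side component and receives an address back, which mirrors the separation the abstract describes.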

Publication/Grant Documents

US09165077B2 Technology for web site crawling, publication/grant date: 2015-10-20
