Enabling a web-crawling robot to collect information from web sites that tailor information content to the capabilities of accessing devices

Granted invention patent

US07536445B2 Enabling a web-crawling robot to collect information from web sites that tailor information content to the capabilities of accessing devices (status: lapsed)

Patent title: Enabling a web-crawling robot to collect information from web sites that tailor information content to the capabilities of accessing devices
Application number: US10751767

Filing date: 2004-01-05
Publication (announcement) number: US07536445B2

Publication (announcement) date: 2009-05-19
Inventor: Takafumi Kinoshita
Applicant: Takafumi Kinoshita
Applicant address: US NY Armonk
Patentee: International Business Machines Corporation
Current patentee: International Business Machines Corporation
Current patentee address: US NY Armonk
Agency: Hoffman Warnick LLC
Agent: Robert Straight
Priority: JP2003-047983 (2003-02-25)
Main classification: G06F15/16
IPC classification: G06F15/16

Abstract:

A web-crawling robot retrieves information from a web server that tailors information content to the capability of an accessing device. A link deriving unit in a proxy server for relaying data exchanged between the robot and the site analyzes a response from the site to the robot, and acquires information on a user agent corresponding to a particular kind of content of a link destination. On the basis of the information, a user agent information editing unit in the proxy server adds user agent information to the content retrieval request from the web-crawling robot to the site so as to disguise it as a content retrieval request issued from a given user agent, thereby acquiring a response corresponding to capabilities of the user agent.
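The two proxy-side units described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that a page signals the intended device through the advisory `type` attribute on a link, and the names `USER_AGENT_BY_TYPE`, `derive_links`, and `edit_request_headers`, as well as the User-Agent strings, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: illustrative mapping from a link's advisory MIME type to a
# User-Agent string. The User-Agent values are made-up placeholders.
USER_AGENT_BY_TYPE = {
    "text/vnd.wap.wml": "ExampleWapBrowser/1.0",
    "application/xhtml+xml": "ExampleMobileBrowser/1.0",
}


class LinkDeriver(HTMLParser):
    """Stand-in for the link deriving unit: scans a response for links
    and records which user agent each link destination appears to target."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.agent_for_url = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href, mime = attr.get("href"), attr.get("type")
        if href and mime in USER_AGENT_BY_TYPE:
            absolute = urljoin(self.base_url, href)
            self.agent_for_url[absolute] = USER_AGENT_BY_TYPE[mime]


def derive_links(base_url, html):
    """Return {absolute link URL: User-Agent} for links with a known type."""
    parser = LinkDeriver(base_url)
    parser.feed(html)
    parser.close()
    return parser.agent_for_url


def edit_request_headers(url, headers, agent_for_url):
    """Stand-in for the user agent information editing unit: return a copy
    of the robot's request headers, with User-Agent replaced when the
    requested URL was previously associated with a specific user agent."""
    edited = dict(headers)
    agent = agent_for_url.get(url)
    if agent is not None:
        edited["User-Agent"] = agent
    return edited
```

In this sketch the proxy would call `derive_links` on each response it relays to the robot, keep the resulting table, and pass every later request from the robot through `edit_request_headers` before forwarding it to the site. Requests for URLs that are not in the table are forwarded with the robot's own headers unchanged.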

Publication/grant documents

US20040205114A1 Enabling a web-crawling robot to collect information from web sites that tailor information content to the capabilities of accessing devices Publication/grant date: 2004-10-14

IPC classification:

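G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F15/00	Digital computers in general (details: see groups G06F1/00 to G06F13/00); data processing equipment in general
G06F15/16	. Combinations of two or more digital computers, each having at least an arithmetic unit, a program unit and a register, e.g. for simultaneous processing of several programs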