Patent application
- Patent title: INFORMATION COLLECTION APPARATUS, SEARCH ENGINE, INFORMATION COLLECTION METHOD, AND PROGRAM
- Patent title (Chinese): 信息收集设备,搜索引擎,信息收集方法和程序
- Application number: US13003875
- Filing date: 2009-08-14
- Publication number: US20110119263A1
- Publication date: 2011-05-19
- Inventors: Seiji Hamada, Makoto Yamamoto
- Applicants: Seiji Hamada, Makoto Yamamoto
- Applicant address: US NY Armonk
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current assignee address: US NY Armonk
- Priority: JP2008261848 2008-10-08
- International application: PCT/JP2009/064362 WO 2009-08-14
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
The present invention provides an information collection apparatus, an information collection method, and a program capable of effectively collecting information from information resources on a network, as well as a search engine that searches the collected information resources. The information collection apparatus includes: an extraction unit that acquires data from an information resource via the network and extracts the link-destination addresses included in the data; a calculation unit that compares each link-destination address with a collection rule describing a set of addresses qualified as collection targets and calculates, for each link-destination address, a score reflecting the distance from that set to the link-destination information resource indicated by the address; and a judgment unit that judges, in accordance with the calculated score, whether or not the link-destination information resource is to be included in the collection target.
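
The three units named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the distance metric, the score formula, or the form of the collection rule, so everything concrete here is an assumption made for illustration: the rule is modelled as a list of URL prefixes, the distance is the number of leading rule components (host, then path segments) that the link fails to match, the score is 1 / (1 + distance), and the function names and the 0.5 threshold are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Extraction unit (sketch): gathers <a href> targets as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page address.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def _components(url):
    # Host followed by the non-empty path segments.
    parts = urlsplit(url)
    return [parts.netloc.lower()] + [p for p in parts.path.split("/") if p]


def distance_to_rule(url, rule_prefixes):
    """Assumed metric: unmatched components of the nearest rule prefix.

    0 means the URL lies inside the rule set; an empty rule list
    yields infinity.
    """
    best = float("inf")
    url_parts = _components(url)
    for prefix in rule_prefixes:
        rule_parts = _components(prefix)
        common = 0
        for a, b in zip(url_parts, rule_parts):
            if a != b:
                break
            common += 1
        best = min(best, len(rule_parts) - common)
    return best


def score(url, rule_prefixes):
    """Calculation unit (sketch): higher score means closer to the rule set."""
    return 1.0 / (1.0 + distance_to_rule(url, rule_prefixes))


def should_collect(url, rule_prefixes, threshold=0.5):
    """Judgment unit (sketch): collect when the score reaches the threshold."""
    return score(url, rule_prefixes) >= threshold
```

With a rule of `["http://example.com/docs/api"]`, a link under that prefix has distance 0 and is collected, while a link on an unrelated host matches none of the three rule components and is rejected under the assumed threshold. A threshold below 1.0 lets the collector also follow links that sit just outside the rule set, which is one plausible reading of why the score is graded rather than a plain match/no-match test.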