Granted invention patent
US08037403B2 Apparatus, method, and computer program product for extracting structured document
Status: in force
- Patent title: Apparatus, method, and computer program product for extracting structured document
- Patent title (Chinese): 用于提取结构化文档的装置,方法和计算机程序产品
- Application number: US11622216
- Filing date: 2007-01-11
- Publication number: US08037403B2
- Publication date: 2011-10-11
- Inventors: Takahiro Kawamura, Masumi Inaba, Shinichi Nagano, Tetsuo Hasegawa
- Applicants: Takahiro Kawamura, Masumi Inaba, Shinichi Nagano, Tetsuo Hasegawa
- Applicant address: Tokyo, JP
- Assignee: Kabushiki Kaisha Toshiba
- Current assignee: Kabushiki Kaisha Toshiba
- Current assignee address: Tokyo, JP
- Agent: Turocy & Watson, LLP
- Priority: JP2006-006443, 2006-01-13
- Main classification: G06F17/30
- IPC classification: G06F17/30
Abstract:
An apparatus for retrieving a structured document includes a first specifying unit that specifies a plurality of object documents from a plurality of structured documents accessible via a network, each object document being a structured document that satisfies a retrieval condition; a first extracting unit that extracts text included in the object document; a second extracting unit that extracts metadata appended to the object document, the metadata being first data concerning the text of the object document and second data indicating a link relation between the object document and the structured documents; and a first calculating unit that calculates the importance of each of the object documents, based on the text and the metadata of each of the object documents.
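The units named in the abstract can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: the abstract does not disclose how importance is computed, so the class `StructuredDocument`, the function names, the keyword-style metadata, and the weights 1, 2, and 3 are all hypothetical choices made for illustration, not the patented method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StructuredDocument:
    """Hypothetical stand-in for a structured document reachable over a network."""
    url: str
    text: str
    # "First data": metadata concerning the text (assumed here to be keyword tags).
    keywords: List[str] = field(default_factory=list)
    # "Second data": link relations to other structured documents (assumed to be URLs).
    links: List[str] = field(default_factory=list)


def specify_object_documents(documents: List[StructuredDocument],
                             query: str) -> List[StructuredDocument]:
    """Specifying step: keep the documents that satisfy the retrieval condition."""
    q = query.lower()
    return [d for d in documents
            if q in d.text.lower() or q in (k.lower() for k in d.keywords)]


def importance(doc: StructuredDocument,
               objects: List[StructuredDocument],
               query: str) -> int:
    """Calculating step: combine text and metadata (weights are illustrative only)."""
    q = query.lower()
    text_score = doc.text.lower().count(q)                       # from extracted text
    keyword_score = sum(1 for k in doc.keywords if k.lower() == q)  # from text metadata
    # Link metadata: count links pointing at this document from other object documents.
    inbound = sum(1 for other in objects
                  if other is not doc and doc.url in other.links)
    return text_score + 2 * keyword_score + 3 * inbound


def rank(documents: List[StructuredDocument], query: str) -> List[Tuple[int, str]]:
    """Return (importance, url) pairs, most important first, ties broken by URL."""
    objects = specify_object_documents(documents, query)
    scored = [(importance(d, objects, query), d.url) for d in objects]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

In this sketch only links between object documents contribute to the score, which is one possible reading of "link relation"; links from documents outside the retrieved set could equally be counted.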