Granted invention patent
US07013309B2 Method and apparatus for extracting anchorable information units from complex PDF documents
Expired
- Patent title: Method and apparatus for extracting anchorable information units from complex PDF documents
- Patent title (Chinese): 从复杂PDF文档中提取可锚定信息单元的方法和装置
- Application number: US09996271
- Filing date: 2001-11-28
- Publication (announcement) number: US07013309B2
- Publication (announcement) date: 2006-03-14
- Inventors: Amit Chakraborty, Liang H. Hsu
- Applicants: Amit Chakraborty, Liang H. Hsu
- Applicant address: US NJ Princeton
- Patentee: Siemens Corporate Research
- Current patentee: Siemens Corporate Research
- Current patentee address: US NJ Princeton
- Main classification: G06F16/30
- IPC classification: G06F16/30; G06F17/21
Abstract:
A method for extracting Anchorable Information Units (AIUs) from a Portable Document Format (PDF) file, which may be created either with an editor or by scanning in documents. The method includes parsing the PDF document into textual portions and non-text portions, and extracting structure from the textual portions and the non-text portions. The method further includes determining text within the textual portions and text within the non-text portions, and hyperlinking a plurality of keywords within the textual portions and non-text portions to a related document.
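The keyword-hyperlinking step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the PDF has already been parsed into portions, and that any text inside non-text portions (for example a scanned figure) has already been recovered by some other means. The names `Portion`, `Aiu`, `extract_aius`, and the sample file name `turbine_manual.pdf` are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Portion:
    """One parsed region of a page (hypothetical structure)."""
    page: int
    kind: str  # "text" or "non-text"
    text: str  # text found in the region; assumed already recovered for non-text regions


@dataclass
class Aiu:
    """A keyword occurrence that can anchor a hyperlink (hypothetical structure)."""
    page: int
    kind: str
    keyword: str
    start: int  # character offset of the match within Portion.text
    end: int
    target: str  # related document the keyword links to


def extract_aius(portions, links):
    """Return one Aiu per whole-word, case-insensitive keyword match."""
    aius = []
    for portion in portions:
        for keyword, target in links.items():
            pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(portion.text):
                aius.append(
                    Aiu(portion.page, portion.kind, keyword,
                        match.start(), match.end(), target)
                )
    return aius


# Illustrative input only: one textual portion and one non-text portion.
portions = [
    Portion(1, "text", "Check the turbine blade before start-up."),
    Portion(2, "non-text", "Fig. 3: Turbine assembly"),
]
links = {"turbine": "turbine_manual.pdf"}

aius = extract_aius(portions, links)
for aiu in aius:
    print(aiu)
```

In this sketch the same keyword table is applied to both kinds of portion, so a keyword in a figure label is linked in the same way as one in running text. Structure extraction and text recognition, which the abstract also mentions, are outside its scope.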