Granted invention patent
- Patent title: Document processing apparatus, document processing method, document processing program and recording medium
- Patent title (Chinese): 文件处理装置,文件处理方法,文件处理程序和记录介质
- Application number: US12005924
- Filing date: 2007-12-28
- Publication (announcement) number: US07984076B2
- Publication (announcement) date: 2011-07-19
- Inventors: Kenichiro Kobayashi, Makoto Akabane, Tomoaki Nitta, Nobuhide Yamazaki, Erika Kobayashi
- Applicants: Kenichiro Kobayashi, Makoto Akabane, Tomoaki Nitta, Nobuhide Yamazaki, Erika Kobayashi
- Applicant address: Tokyo, JP
- Patentee: Sony Corporation
- Current patentee: Sony Corporation
- Current patentee address: Tokyo, JP
- Agency: Wolf, Greenfield & Sacks, P.C.
- Priority: JP2001-140778, 2001-05-10
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
The text format of input data is checked, and is converted into a system-manipulated format. It is further determined if the input data is in an HTML or e-mail format using tags, heading information, and the like. The converted data is divided into blocks in a simple manner such that elements in the blocks can be checked based on repetition of predetermined character patterns. Each block section is tagged with a tag indicating a block. The data divided into blocks is parsed based on tags, character patterns, etc., and is structured. A table in text is also parsed, and is segmented into cells. Finally, tree-structured data having a hierarchical structure is generated based on the sentence-structured data. A sentence-extraction template paired with the tree-structured data is used to extract sentences.
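The block-division, table-segmentation, and tree-building steps in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented method: the separator pattern (one punctuation character repeated three or more times), the cell delimiters (`|`, tabs, or runs of spaces), the node layout, and the names `split_into_blocks`, `split_cells`, and `build_tree` are all hypothetical choices made for this example. Format detection (HTML or e-mail) and the sentence-extraction template are not covered.

```python
import re

# Assumption: a "predetermined character pattern" is a line consisting of one
# punctuation character repeated three or more times, e.g. "-----" or "=====".
SEPARATOR = re.compile(r"^\s*([-=*_#~])\1{2,}\s*$")


def split_into_blocks(text):
    """Divide text into blocks at separator lines and blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line) or not line.strip():
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def split_cells(line):
    """Segment one line into cells on '|', tabs, or runs of 2+ whitespace."""
    parts = re.split(r"\s*\|\s*|\t+|\s{2,}", line.strip())
    return [p.strip() for p in parts if p.strip()]


def build_tree(text):
    """Tag each block and collect the blocks under a single document root."""
    tree = {"tag": "document", "children": []}
    for lines in split_into_blocks(text):
        rows = [split_cells(line) for line in lines]
        # Assumption: a block is a table when it has several rows that all
        # split into the same number (more than one) of cells.
        is_table = (
            len(rows) > 1
            and len(rows[0]) > 1
            and all(len(row) == len(rows[0]) for row in rows)
        )
        if is_table:
            node = {
                "tag": "table",
                "children": [{"tag": "row", "cells": row} for row in rows],
            }
        else:
            node = {"tag": "block", "text": " ".join(l.strip() for l in lines)}
        tree["children"].append(node)
    return tree


sample = """Subject: Weekly report
-----
Name | Qty
Pen | 3

Thanks for reading."""

tree = build_tree(sample)
for child in tree["children"]:
    print(child["tag"])
```

For the sample input, the separator line and the blank line each close a block, so the root node ends up with three children: a text block, a two-row table, and a closing text block. A fuller version would nest blocks under headings to form a deeper hierarchy.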