- Patent title: Apparatus, method, and computer program for analyzing document layout
- Application number: US11175127
- Application date: 2005-07-05
- Publication number: US20060204096A1
- Publication date: 2006-09-14
- Inventors: Hiroaki Takebe, Katsuhito Fujimoto, Satoshi Naoi
- Applicants: Hiroaki Takebe, Katsuhito Fujimoto, Satoshi Naoi
- Assignee: FUJITSU LIMITED
- Current assignee: FUJITSU LIMITED
- Priority: JP2005-061529, 2005-03-04
- Main classification: G06K9/34
- IPC classification: G06K9/34
Abstract:
A document layout analysis program capable of extracting an appropriate set of text blocks from a given document image even when the document layout is so complicated that conventional extraction methods using a single extraction condition would not work well. A plurality of different extraction conditions are stored in an extraction condition memory for use in extracting text blocks from a given document image. In accordance with those extraction conditions, a text block extractor extracts a plurality of sets of text blocks from the document image. A text block consolidator produces a consolidated set of text blocks by performing character recognition on each extracted text block, evaluating the validity of each text block based on the result of the character recognition, and selecting the most valid text blocks from among the plurality of sets of text blocks.
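The consolidation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how validity is computed or how conflicts between overlapping blocks from different extraction conditions are resolved. Here `TextBlock`, `validity`, and `consolidate` are hypothetical names; validity is assumed to be the mean per-character recognition confidence; overlapping candidates are assumed to be resolved greedily in favor of the more valid block; and the recognizer is stubbed with a lookup table instead of a real OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class TextBlock:
    # Hypothetical representation: an axis-aligned bounding box in image coordinates.
    x0: int
    y0: int
    x1: int
    y1: int

    def overlaps(self, other: "TextBlock") -> bool:
        # True when the two boxes share any interior area.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


# Hypothetical recognizer interface: per-character confidence scores in [0, 1].
Recognizer = Callable[[TextBlock], Sequence[float]]


def validity(block: TextBlock, recognize: Recognizer) -> float:
    # Assumed validity measure: mean character-recognition confidence.
    scores = recognize(block)
    return sum(scores) / len(scores) if scores else 0.0


def consolidate(block_sets: List[List[TextBlock]],
                recognize: Recognizer) -> List[TextBlock]:
    # Pool the candidates produced under every extraction condition.
    candidates = {b for blocks in block_sets for b in blocks}
    # Rank candidates from most to least valid.
    ranked = sorted(candidates, key=lambda b: validity(b, recognize), reverse=True)
    chosen: List[TextBlock] = []
    for block in ranked:
        # Assumed greedy rule: keep a block only if it does not overlap
        # a more valid block that has already been kept.
        if all(not block.overlaps(c) for c in chosen):
            chosen.append(block)
    return chosen


# Usage: a coarse condition merges two columns into one block (poor OCR),
# while a fine condition splits them into two blocks (good OCR).
coarse = [TextBlock(0, 0, 200, 100)]
fine = [TextBlock(0, 0, 95, 100), TextBlock(105, 0, 200, 100)]
conf_table = {coarse[0]: [0.4, 0.5], fine[0]: [0.9, 0.95], fine[1]: [0.85, 0.9]}
result = consolidate([coarse, fine], lambda b: conf_table[b])
print(result)  # keeps the two column blocks; the coarse block is discarded
```

The greedy overlap rule is only one plausible way to "select the most valid text blocks from among the plurality of sets"; a real system might instead compare competing regions pairwise or optimize a global score.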