Granted invention patent
- Patent title: Index extraction from documents
- Patent title (Chinese): 从文件索引提取
- Application number: US10916877
- Filing date: 2004-08-12
- Publication number: US08805803B2
- Publication date: 2014-08-12
- Inventors: Steven J. Simske, David W. Wright
- Applicants: Steven J. Simske, David W. Wright
- Applicant address: Houston, TX, US
- Patent holder: Hewlett-Packard Development Company, L.P.
- Current patent holder: Hewlett-Packard Development Company, L.P.
- Current patent holder address: Houston, TX, US
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
Systems, methods, and programs embodied in a computer readable medium are provided for index extraction. Stored in a database are ground truth documents that are organized according to a plurality of classifications, each classification having a group of predefined indices. A document to be indexed is classified by drawing an association between the document and one of the classifications. An attempt is made to extract from the document at least a subset of the group of predefined indices associated with the one of the classifications. Upon a failure to extract the subset of the group of predefined indices, attempts are made to find and correct at least one text recognition error in the document based upon a salient dictionary associated with the one of the classifications.
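The flow described in the abstract (classify, attempt extraction, and on failure correct recognition errors against a class-specific dictionary before retrying) can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Classification` structure, the keyword-overlap classifier, the regular-expression index patterns, the fuzzy matching via `difflib.get_close_matches`, and the 0.75 cutoff are all assumptions chosen for illustration. In particular, the keyword sets stand in for whatever is derived from the ground truth documents stored in the database.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class Classification:
    """Hypothetical stand-in for one classification of ground truth documents."""
    name: str
    keywords: set             # words assumed typical of this class
    index_patterns: dict      # index name -> regex with one capture group
    salient_dictionary: list  # class-specific terms used to repair OCR errors


def classify(text, classifications):
    """Associate the document with the class sharing the most keywords (assumed rule)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return max(classifications, key=lambda c: len(words & c.keywords))


def extract_indices(text, classification):
    """Try to pull each predefined index of the class out of the text."""
    found = {}
    for name, pattern in classification.index_patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1).strip()
    return found


def correct_ocr_errors(text, classification, cutoff=0.75):
    """Replace tokens that closely resemble a salient-dictionary term with that term."""
    def fix(match):
        word = match.group(0)
        close = difflib.get_close_matches(
            word.lower(), classification.salient_dictionary, n=1, cutoff=cutoff
        )
        return close[0] if close else word

    return re.sub(r"[A-Za-z0-9]+", fix, text)


def index_document(text, classifications, required):
    """Classify, extract, and retry once after OCR correction if extraction fails."""
    cls = classify(text, classifications)
    indices = extract_indices(text, cls)
    if not required.issubset(indices):
        text = correct_ocr_errors(text, cls)
        indices = extract_indices(text, cls)
    return cls.name, indices


# Illustrative data only; class names, patterns, and the sample text are invented.
receipt = Classification("receipt", {"receipt", "cashier"}, {}, ["receipt", "cashier"])
invoice = Classification(
    "invoice",
    {"invoice", "number", "total"},
    {
        "invoice_number": r"invoice\s+number[:\s]+(\w+)",
        "total": r"total[:\s]+\$?([\d.]+)",
    },
    ["invoice", "number", "total"],
)

doc = "Invoice numbcr: A123\nTota1: 42.50"  # simulated OCR errors
class_name, indices = index_document(doc, [receipt, invoice], {"invoice_number", "total"})
print(class_name, indices)
```

In this sketch the first extraction pass finds nothing, because both field labels contain simulated recognition errors. The correction step only rewrites tokens that are close to a term in the salient dictionary of the chosen class, which keeps values such as `A123` untouched.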
Published/granted documents
- US20060036614A1 Index extraction from documents (publication/grant date: 2006-02-16)