Invention Application
- Patent Title: Index extraction from documents
- Application No.: US10916878
- Application Date: 2004-08-12
- Publication No.: US20060036649A1
- Publication Date: 2006-02-16
- Inventor: Steven Simske, David Wright
- Applicant: Steven Simske, David Wright
- Main IPC: G06F17/30
- IPC: G06F17/30

Abstract:
Systems, methods, and programs embodied in a computer readable medium are provided for index extraction. A plurality of ground truth documents are stored in a database, the ground truth documents being organized in a plurality of classifications. Attempts are made to automatically extract indices from a document based upon a classification associated with the document. The document is reclassified from a first one of the classifications to a second one of the classifications during the course of the automated extraction of the indices by drawing an association between the document and at least one of the ground truth documents. The indices are manually extracted from the document upon a failure to automatically extract the indices. The document is stored in the database as one of the ground truth documents if the indices are manually extracted.
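
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `GroundTruthStore`, `closest_class`, `extract_indices`, the per-classification extractor functions, and the word-overlap similarity measure are all assumptions introduced here for illustration. The abstract does not say how the association with a ground truth document is drawn.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Indices = dict[str, str]
# Hypothetical extractor type: returns the indices, or None on failure.
Extractor = Callable[[str], Optional[Indices]]


@dataclass
class GroundTruthStore:
    """In-memory stand-in for the database of ground truth documents."""
    by_class: dict[str, list[tuple[str, Indices]]] = field(default_factory=dict)

    def add(self, classification: str, text: str, indices: Indices) -> None:
        self.by_class.setdefault(classification, []).append((text, indices))

    def closest_class(self, text: str, exclude: str) -> Optional[str]:
        """Pick the classification of the ground truth document sharing the
        most words with `text` (an assumed, deliberately naive similarity)."""
        words = set(text.lower().split())
        best, best_score = None, 0
        for cls, docs in self.by_class.items():
            if cls == exclude:
                continue
            for gt_text, _ in docs:
                score = len(words & set(gt_text.lower().split()))
                if score > best_score:
                    best, best_score = cls, score
        return best


def extract_indices(
    text: str,
    classification: str,
    extractors: dict[str, Extractor],
    store: GroundTruthStore,
    manual: Callable[[str], Indices],
) -> tuple[str, Indices]:
    """Return the final classification and the extracted indices."""

    def attempt(cls: str) -> Optional[Indices]:
        extractor = extractors.get(cls)
        return extractor(text) if extractor else None

    # 1. Automated extraction based on the document's current classification.
    indices = attempt(classification)

    # 2. On failure, reclassify by association with a ground truth document
    #    and retry under the new classification.
    if indices is None:
        new_class = store.closest_class(text, exclude=classification)
        if new_class is not None:
            classification = new_class
            indices = attempt(classification)

    # 3. If automation still fails, fall back to manual extraction and keep
    #    the document as a new ground truth document.
    if indices is None:
        indices = manual(text)
        store.add(classification, text, indices)

    return classification, indices
```

In this sketch only manually indexed documents are added to the store, mirroring the abstract's last sentence; whether automatically indexed documents are also retained is not stated there.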