Invention Grant
- Patent Title: Text correction for PDF converters
- Patent Title (Chinese): Text correction for PDF converters (PDF转换器的文本校正)
- Application No.: US11219496
- Application Date: 2005-09-02
- Publication No.: US07827484B2
- Publication Date: 2010-11-02
- Inventor: Hervé Déjean, André Kempe
- Applicant: Hervé Déjean, André Kempe
- Applicant Address: US CT Norwalk
- Assignee: Xerox Corporation
- Current Assignee: Xerox Corporation
- Current Assignee Address: US CT Norwalk
- Agency: Fay Sharpe LLP
- Main IPC: G06F17/00
- IPC: G06F17/00

Abstract:
To correct at least one extraneous or missing space in a document, weights are assigned to tokens contained in a dictionary. Each token is defined by an ordered sequence of non-space symbols. The weights are assigned based on at least one of a token length and frequency of occurrence of the token in the document. Corrected text is generated from text of the document by applying an ordered sequence of symbol-level transformations selected from a group of symbol-level transformations including at least (i) deleting a space, (ii) inserting a space, and (iii) copying a symbol. The ordered sequence of symbol-level transformations is optimized respective to an objective function dependent upon the weights of tokens of the corrected text.
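The approach in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the weight formula (squared token length times one plus the token's count in the document), the penalty for unknown single symbols, and the function names `token_weights` and `correct_spaces` are all assumptions made for illustration. The sketch treats every gap between non-space symbols as a choice between having a space (copied or inserted) and having none (deleted or never present), and uses dynamic programming to pick the segmentation that maximizes the summed token weights.

```python
from collections import Counter


def token_weights(dictionary, text):
    """Assign a weight to each dictionary token.

    Assumed formula (not from the patent): squared length times
    (1 + number of times the token appears space-delimited in the text).
    """
    counts = Counter(text.split())
    return {tok: len(tok) ** 2 * (1 + counts[tok]) for tok in dictionary}


def correct_spaces(text, weights, unknown_penalty=-1.0):
    """Re-space `text` so the summed weight of its tokens is maximal.

    Dropping all spaces and re-segmenting is equivalent to choosing, per gap,
    between copying/inserting a space and deleting/omitting one.
    """
    symbols = text.replace(" ", "")
    n = len(symbols)
    max_len = max((len(t) for t in weights), default=1)

    best = [float("-inf")] * (n + 1)  # best[i]: top score for symbols[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last token
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = symbols[start:end]
            if piece in weights:
                score = weights[piece]
            elif end - start == 1:
                score = unknown_penalty  # assumed fallback for unknown symbols
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Walk the back-pointers to recover the chosen tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(symbols[start:end])
        end = start
    return " ".join(reversed(tokens))


dictionary = {"text", "correction", "for", "pdf", "converters"}
damaged = "te xt correctionfor pdf conver ters"
weights = token_weights(dictionary, damaged)
print(correct_spaces(damaged, weights))
```

Squaring the length is one simple way to make a long dictionary token outscore the shorter tokens it could be split into; other weightings based on token length and frequency would serve the same purpose.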
Public/Granted literature
- US20070055933A1 Text correction for PDF converters (Public/Granted day: 2007-03-08)