AUTOMATIC GENERATION OF TRAINING DATA FOR HAND-PRINTED TEXT RECOGNITION

    Publication No.: US20230206674A1

    Publication date: 2023-06-29

    Application No.: US17562344

    Filing date: 2021-12-27

    Inventor: Jason James Grams

    Abstract: A method for generating training data for hand-printed text recognition includes obtaining a structured document, obtaining a set of hand-printed character images and database metadata from a database, generating a modified document page image, and outputting a training file. The structured document includes a document page image that includes text characters and document metadata that associates each of the text characters to a document character label. The database metadata associates each of the set of hand-printed character images to a database character label. The modified document page image is generated by iteratively processing each of the text characters. The iterative processing includes determining whether an individual text character should be replaced, selecting a replacement hand-printed character image from the set of hand-printed character images, scaling the replacement hand-printed character image, and inserting the replacement hand-printed character image into the modified document page image.
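
The iterative processing described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: images are plain nested lists of grayscale values, and the names `CharBox`, `scale_nearest`, `generate_training_page`, `glyph_db`, and `replace_prob` are all hypothetical. The abstract does not say how the replacement decision is made or how scaling is done, so a random draw and nearest-neighbour resampling are assumed here.

```python
import random
from dataclasses import dataclass


@dataclass
class CharBox:
    """Document metadata for one text character (hypothetical layout)."""
    label: str  # document character label, e.g. "A"
    x: int      # left edge of the character box in the page image
    y: int      # top edge of the character box
    w: int      # box width in pixels
    h: int      # box height in pixels


def scale_nearest(glyph, new_w, new_h):
    """Resize a glyph (list of pixel rows) with nearest-neighbour sampling."""
    old_h, old_w = len(glyph), len(glyph[0])
    return [
        [glyph[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def generate_training_page(page, boxes, glyph_db, replace_prob=1.0, rng=None):
    """Return (modified page image, per-character labels).

    glyph_db maps a database character label to a list of hand-printed
    glyph images; replace_prob is an assumed replacement criterion.
    """
    rng = rng or random.Random()
    modified = [row[:] for row in page]  # copy so the source page is untouched
    labels = []
    for box in boxes:
        # Match the document character label against the database labels.
        candidates = glyph_db.get(box.label, [])
        # Decide whether this character should be replaced.
        replace = bool(candidates) and rng.random() < replace_prob
        if replace:
            # Select a replacement, scale it to the box, and insert it.
            glyph = scale_nearest(rng.choice(candidates), box.w, box.h)
            for r in range(box.h):
                modified[box.y + r][box.x:box.x + box.w] = glyph[r]
        labels.append({
            "label": box.label,
            "box": (box.x, box.y, box.w, box.h),
            "hand_printed": replace,
        })
    return modified, labels
```

The returned label list stands in for the training file mentioned in the abstract; serializing it next to the modified page image (for example as JSON) would be one possible output format, which the abstract leaves open.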

    Automatic generation of training data for hand-printed text recognition

    Publication No.: US11715317B1

    Publication date: 2023-08-01

    Application No.: US17562344

    Filing date: 2021-12-27

    Inventor: Jason James Grams

    Abstract: Identical to the abstract of publication US20230206674A1 above.