Ground truth generation from scanned documents
Abstract:
A plurality of electronic documents, each comprising one or more document pages, is received. First position markers, second position markers, and page identifiers are inserted into the pages. The plurality of electronic documents is printed, thereby generating a printed corpus comprising a plurality of printed documents. The plurality of printed documents is scanned, thereby generating a scanned corpus comprising a plurality of scanned images. Scanning frame positions of the first and second position markers are detected, and the detected scanning frame positions, together with the corresponding page positions of the markers, are used to define affine transformations between the plurality of scanned images and the corresponding document pages. The affine transformations are applied to the plurality of scanned images to align them with the corresponding document pages of the plurality of electronic documents.
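The alignment step can be illustrated with a minimal sketch; it is not the claimed implementation. It assumes that marker detection has already produced at least three non-collinear reference points per page (for example, corners of the first and second position markers), since a general 2D affine transformation has six degrees of freedom. The function names `estimate_affine` and `apply_affine`, the use of NumPy least squares, and all coordinate values are assumptions made for illustration only.

```python
import numpy as np

def estimate_affine(scan_pts, page_pts):
    """Least-squares 2D affine map taking scan-frame points to page points.

    Returns a 2x3 matrix [A | t] such that page ~= A @ scan + t.
    """
    scan = np.asarray(scan_pts, dtype=float)
    page = np.asarray(page_pts, dtype=float)
    if scan.ndim != 2 or scan.shape[1] != 2 or scan.shape != page.shape:
        raise ValueError("expected two matching (N, 2) point arrays")
    if scan.shape[0] < 3:
        raise ValueError("at least three point pairs are needed")
    # Homogeneous scan coordinates: each row is [x, y, 1].
    design = np.hstack([scan, np.ones((scan.shape[0], 1))])
    # Solve design @ M ~= page for the 3x2 parameter matrix M.
    params, *_ = np.linalg.lstsq(design, page, rcond=None)
    return params.T

def apply_affine(matrix, pts):
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ matrix[:, :2].T + matrix[:, 2]

# Hypothetical marker reference points in page coordinates.
page_markers = np.array([[50.0, 50.0], [550.0, 50.0], [50.0, 750.0]])

# Simulated scan distortion (slight rotation, scale, and shift), used
# here only to produce example scan-frame positions for the markers.
page_to_scan = np.array([[0.98, -0.05, 12.0],
                         [0.05, 0.98, -7.0]])
scan_markers = apply_affine(page_to_scan, page_markers)

# Estimate the scan-to-page transformation and map the detected
# marker positions back onto the page.
scan_to_page = estimate_affine(scan_markers, page_markers)
aligned = apply_affine(scan_to_page, scan_markers)
```

In this sketch the transformation is applied to point coordinates. Aligning a full scanned image would additionally require resampling its pixels with the same matrix, which is outside the scope of the example.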