METHODS AND APPARATUS TO EXTRACT TEXT FROM IMAGED DOCUMENTS
Abstract:
Methods and apparatus to extract text from imaged documents are disclosed. Example methods include segmenting an image of a document into localized sub-images corresponding to individual characters in the document. The example methods further include grouping respective ones of the sub-images into a cluster based on a visual correlation of the respective ones of the sub-images to a reference sub-image, where the visual correlation between the reference sub-image and each of the sub-images grouped into the cluster exceeds a correlation threshold. The example methods also include identifying a designated character for the cluster based on the sub-images grouped into the cluster. The example methods further include associating the designated character with the locations in the image of the document that correspond to the respective ones of the sub-images grouped into the cluster.
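The grouping and labeling steps described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. It assumes that segmentation has already produced equally sized, flattened grayscale sub-images paired with their locations, that the visual correlation is a Pearson correlation coefficient, and that clusters are formed greedily with the first unmatched sub-image serving as the reference. The names `correlation`, `cluster_sub_images`, `label_locations`, and the `identify` callback are hypothetical and chosen only for illustration.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equally sized, flattened sub-images."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a uniform sub-image carries no shape information
    return sum(x * y for x, y in zip(da, db)) / denom


def cluster_sub_images(sub_images, threshold=0.9):
    """Greedy grouping (an assumption): each cluster is seeded by a reference.

    sub_images is a list of (location, pixels) pairs. Returns a list of
    (reference_pixels, member_indices) pairs.
    """
    clusters = []
    for idx, (_, pixels) in enumerate(sub_images):
        for ref, members in clusters:
            # Join the first cluster whose reference correlates strongly enough.
            if correlation(ref, pixels) > threshold:
                members.append(idx)
                break
        else:
            # No reference matched, so this sub-image seeds a new cluster.
            clusters.append((pixels, [idx]))
    return clusters


def label_locations(sub_images, clusters, identify):
    """Associate each cluster's designated character with member locations.

    identify is a hypothetical callback that receives the pixel data of all
    sub-images in one cluster and returns a single character.
    """
    labels = {}
    for _, members in clusters:
        char = identify([sub_images[i][1] for i in members])
        for i in members:
            labels[sub_images[i][0]] = char
    return labels


# Toy 3x3 glyphs, flattened row by row (illustrative data only).
GLYPH_I = (0, 9, 0, 0, 9, 0, 0, 9, 0)
GLYPH_I_NOISY = (0, 9, 0, 0, 8, 0, 1, 9, 0)
GLYPH_L = (9, 0, 0, 9, 0, 0, 9, 9, 9)

sub_images = [((0, 0), GLYPH_I), ((0, 1), GLYPH_L), ((0, 2), GLYPH_I_NOISY)]
clusters = cluster_sub_images(sub_images, threshold=0.9)

# Stand-in recognizer: a real system might run OCR or ask a human reviewer.
labels = label_locations(
    sub_images, clusters, lambda group: "I" if group[0][1] > group[0][0] else "L"
)
```

Under these assumptions the character is identified once per cluster rather than once per sub-image, so the cost of recognition scales with the number of distinct glyph shapes instead of the number of characters on the page.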