- Patent title: METHODS AND APPARATUS TO EXTRACT TEXT FROM IMAGED DOCUMENTS
- Application number: US14927014
- Filing date: 2015-10-29
- Publication number: US20170124413A1
- Publication date: 2017-05-04
- Inventor: Kevin Keqiang Deng
- Applicant: The Nielsen Company (US), LLC
- Main classification: G06K9/34
- IPC classifications: G06K9/34 ; G06K9/64 ; G06K9/03 ; G06K9/62
Abstract:
Methods and apparatus to extract text from imaged documents are disclosed. Example methods include segmenting an image of a document into localized sub-images corresponding to individual characters in the document. The example methods further include grouping respective ones of the sub-images into a cluster based on a visual correlation of the respective ones of the sub-images to a reference sub-image. The visual correlation between the reference sub-image and the respective ones of the sub-images grouped into the cluster exceeds a correlation threshold. The example methods also include identifying a designated character for the cluster based on the sub-images grouped into the cluster. The example methods further include associating the designated character with locations in the image of the document associated with the respective ones of the sub-images grouped into the cluster.
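
The flow described in the abstract (cluster sub-images by correlation to a reference, identify one character per cluster, then map that character back to every member location) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the patent: sub-images are equal-sized and already segmented, the visual correlation is a Pearson correlation over flattened pixels, the first unmatched sub-image becomes a cluster's reference, the threshold is 0.9, and `recognize` is a hypothetical stand-in for whatever identifies the designated character.

```python
from math import sqrt

def correlation(a, b):
    """Pearson correlation of two equal-length flattened sub-images (assumed measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom

def cluster_sub_images(sub_images, threshold=0.9):
    """Greedily group (location, pixels) pairs around reference sub-images."""
    clusters = []
    for location, pixels in sub_images:
        for cluster in clusters:
            # Join the first cluster whose reference correlates above the threshold.
            if correlation(cluster["reference"], pixels) > threshold:
                cluster["members"].append((location, pixels))
                break
        else:
            # No match: this sub-image becomes the reference of a new cluster.
            clusters.append({"reference": pixels, "members": [(location, pixels)]})
    return clusters

def extract_text(sub_images, recognize, threshold=0.9):
    """Identify one character per cluster and assign it to every member location."""
    text_by_location = {}
    for cluster in cluster_sub_images(sub_images, threshold):
        member_pixels = [pixels for _, pixels in cluster["members"]]
        character = recognize(member_pixels)  # hypothetical recognizer callback
        for location, _ in cluster["members"]:
            text_by_location[location] = character
    return text_by_location

# Toy 3x3 glyphs, flattened row by row (illustrative data only).
GLYPH_I = [0, 1, 0, 0, 1, 0, 0, 1, 0]
GLYPH_L = [1, 0, 0, 1, 0, 0, 1, 1, 1]
NOISY_I = [0, 0.9, 0, 0.1, 1, 0, 0, 1, 0]

# Hypothetical recognizer: looks up the first member of the cluster in a small table.
KNOWN = {tuple(GLYPH_I): "I", tuple(GLYPH_L): "L"}

def toy_recognize(member_pixels):
    return KNOWN.get(tuple(member_pixels[0]), "?")

sample = [((0, 0), GLYPH_I), ((0, 1), GLYPH_L), ((0, 2), NOISY_I)]
result = extract_text(sample, toy_recognize)
```

In this sketch the recognizer runs once per cluster rather than once per sub-image, so the noisy `I` inherits its character from the cluster it joined instead of being recognized on its own.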
Publication/grant documents:
- US09684842B2 Methods and apparatus to extract text from imaged documents (publication/grant date: 2017-06-20)