METHODS AND APPARATUS TO EXTRACT TEXT FROM IMAGED DOCUMENTS

Patent application

US20170124413A1 METHODS AND APPARATUS TO EXTRACT TEXT FROM IMAGED DOCUMENTS (in force)


Patent title: METHODS AND APPARATUS TO EXTRACT TEXT FROM IMAGED DOCUMENTS
Application number: US14927014

Filing date: 2015-10-29
Publication number: US20170124413A1

Publication date: 2017-05-04
Inventor: Kevin Keqiang Deng
Applicant: The Nielsen Company (US), LLC
Main classification: G06K9/34
IPC classification: G06K9/34 ; G06K9/64 ; G06K9/03 ; G06K9/62

Abstract:

Methods and apparatus to extract text from imaged documents are disclosed. Example methods include segmenting an image of a document into localized sub-images corresponding to individual characters in the document. The example methods further include grouping respective ones of the sub-images into a cluster based on a visual correlation of the respective ones of the sub-images to a reference sub-image, where the visual correlation between the reference sub-image and each of the sub-images grouped into the cluster exceeds a correlation threshold. The example methods also include identifying a designated character for the cluster based on the sub-images grouped into the cluster. The example methods further include associating the designated character with locations in the image of the document associated with the respective ones of the sub-images grouped into the cluster.
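
The grouping and labeling steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the page has already been segmented into equally sized, flattened sub-images, uses Pearson correlation as a stand-in for the "visual correlation", and groups greedily against the first member of each cluster. The names `correlation`, `cluster_sub_images`, `label_locations`, the `recognize` callback, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple

Pixels = Sequence[float]            # flattened sub-image; equal lengths are assumed
Location = Tuple[int, int]          # (x, y) of the sub-image in the page image
SubImage = Tuple[Location, Pixels]


def correlation(a: Pixels, b: Pixels) -> float:
    """Pearson correlation between two equally sized flattened sub-images."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a uniform sub-image has no defined correlation; treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def cluster_sub_images(sub_images: List[SubImage],
                       threshold: float = 0.9) -> List[List[int]]:
    """Greedily group sub-image indices; a cluster's first member is its reference."""
    clusters: List[List[int]] = []
    for idx, (_, pixels) in enumerate(sub_images):
        for members in clusters:
            reference = sub_images[members[0]][1]
            if correlation(reference, pixels) > threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])  # nothing matched: start a new cluster
    return clusters


def label_locations(sub_images: List[SubImage],
                    clusters: List[List[int]],
                    recognize: Callable[[List[Pixels]], str]) -> Dict[Location, str]:
    """Identify one character per cluster and assign it to every member location."""
    labels: Dict[Location, str] = {}
    for members in clusters:
        char = recognize([sub_images[i][1] for i in members])  # hypothetical recognizer
        for i in members:
            labels[sub_images[i][0]] = char
    return labels
```

In this sketch the recognizer runs once per cluster rather than once per character, so a page with many repeated glyphs needs far fewer recognition calls than sub-images. The `recognize` callback could wrap an OCR engine or a human reviewer; that choice is left open here.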

Published/granted documents

US09684842B2 Methods and apparatus to extract text from imaged documents. Publication/grant date: 2017-06-20
