Granted invention patent
- Title: Text representation method and apparatus
- Title (Chinese): 文本表示方法和装置
- Application number: US11950883
- Filing date: 2007-12-05
- Publication number: US08086040B2
- Publication date: 2011-12-27
- Inventors: Zhigang Fan, Francis K. Tse
- Applicants: Zhigang Fan, Francis K. Tse
- Applicant address: US CT Norwalk
- Assignee: Xerox Corporation
- Current assignee: Xerox Corporation
- Current assignee address: US CT Norwalk
- Agent: Oliff & Berridge, PLC
- Main classification: G06K9/00
- IPC classification: G06K9/00
Abstract:
A text-like data representation technique and a text-like data representation apparatus are disclosed that may: acquire image data from a scanned image; segment text regions from the image data; further extract each connected component in the text regions; form clusters based on the connected components; group each connected component in the text regions into one of the clusters with similar or identical characters; generate a high-resolution representative for each cluster; generate a vector representation for each high-resolution representative; and code the text as text data by associating each connected component with its vectorized high-resolution representative, and location in the document.
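
The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It covers only connected-component extraction, clustering, and coding each component as a cluster ID plus location. Text-region segmentation, high-resolution representative generation, and vectorization are omitted. All function names (`connected_components`, `normalize`, `encode`, `decode`) and the `max_diff` pixel-difference threshold are hypothetical, and the simple pixel-set comparison is an assumed stand-in for whatever similarity criterion the patent actually uses.

```python
from collections import deque


def connected_components(img):
    """Return 4-connected foreground components as lists of (row, col)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps


def normalize(pixels):
    """Return (top-left location, shape relative to the bounding box)."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    shape = frozenset((y - top, x - left) for y, x in pixels)
    return (top, left), shape


def encode(img, max_diff=0):
    """Cluster components by shape; return (representatives, placements)."""
    reps = []        # one representative shape per cluster
    placements = []  # (cluster id, (top, left)) for every component
    for pixels in connected_components(img):
        loc, shape = normalize(pixels)
        for cid, rep in enumerate(reps):
            # Assumed similarity test: number of differing pixels.
            if len(shape ^ rep) <= max_diff:
                break
        else:
            cid = len(reps)
            reps.append(shape)
        placements.append((cid, loc))
    return reps, placements


def decode(reps, placements, rows, cols):
    """Rebuild a binary image from representatives and placements."""
    img = [[0] * cols for _ in range(rows)]
    for cid, (top, left) in placements:
        for dy, dx in reps[cid]:
            img[top + dy][left + dx] = 1
    return img


page = [
    [1, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
]
reps, placements = encode(page)
```

In this toy page the two identical corner-shaped components share one representative, so the page is stored as two shapes plus three placements. With `max_diff=0` the decoding is exact; a larger threshold would merge near-identical glyphs at the cost of lossy reconstruction.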
Publication/grant documents
- US20090148042A1 TEXT REPRESENTATION METHOD AND APPARATUS, publication/grant date: 2009-06-11