Granted invention patent
- Patent title: Method and system for clustering identified forms
- Patent title (Chinese): 聚类识别形式的方法和系统
- Application number: US12032331
- Filing date: 2008-02-15
- Publication (announcement) number: US07996390B2
- Publication (announcement) date: 2011-08-09
- Inventors: Juliana Freire, Luciano Barbosa
- Applicants: Juliana Freire, Luciano Barbosa
- Applicant address: Salt Lake City, UT, US
- Patentee: The University of Utah Research Foundation
- Current patentee: The University of Utah Research Foundation
- Current patentee address: Salt Lake City, UT, US
- Agency: Bell & Manning, LLC
- Main classification: G06F7/00
- IPC classification: G06F7/00; G06F17/30
Abstract:
A method is provided for organizing a plurality of documents that include forms. An initial set of clusters is defined for the plurality of documents. The initial set of clusters is reclustered based on similarity values calculated in multiple feature spaces. For example, a first feature space may be associated with a content of a document while a second feature space may be associated with a content of a form associated with the document. Each cluster has an associated centroid vector in each feature space that is used to represent the cluster. The similarity between the document and each cluster is calculated in both feature spaces. Each document is assigned to the cluster whose centroid is most similar. The cluster centroids may be recalculated and the process repeated until the cluster assignments become stable.
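The reclustering loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made for illustration only: each document is represented as a pair of numeric vectors (page content, form content), similarity is cosine similarity, the two per-space similarities are combined by a weighted sum, and all names (`recluster`, `w_page`, `w_form`, and so on) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def recluster(docs, assignments, k, w_page=0.5, w_form=0.5, max_iter=100):
    """Reassign documents to clusters until the assignments stop changing.

    docs:        list of (page_vec, form_vec) pairs, one per document
    assignments: initial cluster id (0..k-1) for each document
    w_page, w_form: assumed weights for combining the two similarities
    """
    assignments = list(assignments)
    for _ in range(max_iter):
        # One centroid per cluster in each feature space.
        # Clusters that currently have no members are skipped.
        centroids = {}
        for c in range(k):
            members = [d for d, a in zip(docs, assignments) if a == c]
            if members:
                centroids[c] = (
                    centroid([page for page, _ in members]),
                    centroid([form for _, form in members]),
                )

        # Assign each document to the cluster with the highest combined similarity.
        new_assignments = []
        for page_vec, form_vec in docs:
            best = max(
                centroids,
                key=lambda c: w_page * cosine(page_vec, centroids[c][0])
                + w_form * cosine(form_vec, centroids[c][1]),
            )
            new_assignments.append(best)

        if new_assignments == assignments:  # assignments are stable
            break
        assignments = new_assignments
    return assignments


# Toy data: documents 0 and 1 resemble each other, as do documents 2 and 3.
docs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([0.9, 0.1], [1.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0]),
    ([0.1, 0.9], [0.0, 1.0]),
]
# The initial clustering deliberately misplaces document 1.
result = recluster(docs, [0, 1, 1, 1], k=2)
print(result)  # [0, 0, 1, 1]
```

In this toy run the misplaced document moves to the cluster whose centroids it matches in both feature spaces, and the loop stops once a pass produces no changes.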
Publication/grant documents
- US20090210406A1 METHOD AND SYSTEM FOR CLUSTERING IDENTIFIED FORMS Publication/grant date: 2009-08-20