Patent Application
US20110087701A1 SYSTEM, METHOD, AND APPARATUS FOR PAIRING A SHORT DOCUMENT TO ANOTHER SHORT DOCUMENT FROM A PLURALITY OF SHORT DOCUMENTS
Status: Pending (published)
- 专利标题: SYSTEM, METHOD, AND APPARATUS FOR PAIRING A SHORT DOCUMENT TO ANOTHER SHORT DOCUMENT FROM A PLURALITY OF SHORT DOCUMENTS
- Application No.: US12576959
- Filing Date: 2009-10-09
- Publication No.: US20110087701A1
- Publication Date: 2011-04-14
- Inventors: Greg Eyres, Vahit Hakan Hacigumus, Tobin J. Lehman, H. Raymond Strong, JR.
- Applicants: Greg Eyres, Vahit Hakan Hacigumus, Tobin J. Lehman, H. Raymond Strong, JR.
- Applicant Address: Armonk, NY, US
- Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Current Assignee Address: Armonk, NY, US
- Main Classification: G06F17/30
- IPC Classification: G06F17/30
Abstract:
A computer-implemented method for pairing a new document to a document from a plurality of documents. Embodiments include generating, for the new document and for each of the plurality of documents, a vector of terms of interest that is uniquely associated with that document. For each term of interest, the corresponding element of the vector is set to zero if the term does not occur in the document and to one otherwise. The method also includes determining, for each document from the plurality of documents, the similarity between its vector and the vector for the new document. The method further includes selecting a document from the plurality of documents as related to the new document if that similarity is greater than or equal to a threshold value.
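The abstract does not name a similarity measure, nor does it say how the terms of interest are chosen or how documents are tokenized. The following is a minimal Python sketch of the binary-vector-plus-threshold scheme under explicitly assumed choices: a fixed, shared list of terms of interest, naive lowercase whitespace tokenization, and cosine similarity. None of these choices comes from the patent, and the function names `binary_vector`, `cosine_similarity`, and `pair_documents` and the sample data are hypothetical.

```python
from math import sqrt

def binary_vector(text, terms_of_interest):
    # Hypothetical helper: element i is 1 if terms_of_interest[i] occurs in
    # the text, 0 otherwise. Lowercase whitespace tokenization is an assumption.
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in terms_of_interest]

def cosine_similarity(a, b):
    # Assumed similarity measure; the abstract does not specify one.
    dot = sum(x * y for x, y in zip(a, b))
    # For 0/1 vectors the sum of squares equals the plain sum.
    norm = sqrt(sum(a)) * sqrt(sum(b))
    return dot / norm if norm else 0.0

def pair_documents(new_doc, documents, terms_of_interest, threshold):
    # Select every document whose similarity to new_doc is >= threshold.
    new_vec = binary_vector(new_doc, terms_of_interest)
    return [
        doc for doc in documents
        if cosine_similarity(new_vec, binary_vector(doc, terms_of_interest)) >= threshold
    ]

# Hypothetical example data.
terms = ["printer", "jam", "paper", "network", "password"]
docs = [
    "Printer paper jam in tray 2",
    "Cannot reset network password",
    "Paper jam again on printer",
]
print(pair_documents("printer jam", docs, terms, 0.5))
# ['Printer paper jam in tray 2', 'Paper jam again on printer']
```

Here the new document maps to `[1, 1, 0, 0, 0]`, and the first and third documents map to `[1, 1, 1, 0, 0]`, giving a similarity of 2/√6 ≈ 0.82, which clears the 0.5 threshold. The second document shares no terms and scores 0. For binary vectors, cosine similarity reduces to |A ∩ B| / √(|A|·|B|), so other set-based measures such as Jaccard could be substituted without changing the rest of the scheme.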