Patent Application
US20100306204A1 DETECTING DUPLICATE DOCUMENTS USING CLASSIFICATION (lapsed)

DETECTING DUPLICATE DOCUMENTS USING CLASSIFICATION
Abstract:
Systems, methods, and articles of manufacture are disclosed for detecting a duplicate document. A plurality of documents may be assigned to categories, each category corresponding to a collection of duplicate or near-duplicate documents. A new document may be received. The new document may be evaluated against each category to determine a similarity score between the new document and that category. The new document may be identified as a duplicate based on the similarity scores and the threshold for each category. An action may then be performed on the duplicate based on duplication rules. The thresholds and duplication rules may be customized by a user.
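
The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the similarity score is computed, so the sketch assumes Jaccard similarity over three-word shingles. The names `Category`, `shingles`, `jaccard`, and `detect_duplicate`, along with the example thresholds and actions, are hypothetical and chosen only for illustration; they are not taken from the application.

```python
from dataclasses import dataclass, field


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Category:
    """A collection of duplicate or near-duplicate documents.

    threshold and action stand in for the user-customizable
    thresholds and duplication rules (assumed representation).
    """
    name: str
    threshold: float
    action: str
    members: list = field(default_factory=list)

    def score(self, document: str) -> float:
        # Assumption: a category's score is the best match among its members.
        doc = shingles(document)
        return max((jaccard(doc, shingles(m)) for m in self.members),
                   default=0.0)


def detect_duplicate(document: str, categories: list):
    """Return (category name, score, action) for the best category whose
    threshold is met, or None if the document is not a duplicate."""
    best = None
    for category in categories:
        s = category.score(document)
        if s >= category.threshold and (best is None or s > best[1]):
            best = (category, s)
    if best is None:
        return None
    category, s = best
    return category.name, s, category.action


categories = [
    Category("invoice", 0.5, "discard",
             ["please pay the attached invoice by friday"]),
    Category("newsletter", 0.8, "flag",
             ["our weekly newsletter covers new product updates"]),
]
result = detect_duplicate("please pay the attached invoice by monday",
                          categories)
```

In this sketch each category carries its own threshold and action, so changing either for one category does not affect the others, which mirrors the per-category customization the abstract mentions.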