Cross-guided data clustering based on alignment between data domains
    Type: Granted invention patent
    Legal status: In force

    Publication No.: US08589396B2

    Publication date: 2013-11-19

    Application No.: US12652987

    Filing date: 2010-01-06

    IPC classification: G06F17/30 G06F17/27

    CPC classification: G06K9/6222 G06K9/6224

    Abstract: A system and associated method for cross-guided data clustering that aligns target clusters in a target domain with source clusters in a source domain. The cross-guided clustering process takes the target domain and the source domain as inputs. Words shared by both the target domain and the source domain form a pivot vocabulary, and all other words in either domain form a non-pivot vocabulary. The non-pivot vocabulary is projected onto the pivot vocabulary to improve the measurement of similarity between data items. Source centroids representing clusters in the source domain are created and projected onto the pivot vocabulary. Target centroids representing clusters in the target domain are initially created by a conventional clustering method and then repeatedly aligned to converge with the source centroids, using a cross-domain similarity graph that measures the similarity of each target centroid to each source centroid.
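
The steps named in the abstract can be illustrated with a minimal Python sketch. This is not the patented procedure. The abstract does not specify how non-pivot words are projected or how centroids are aligned, so the co-occurrence projection rule, the cosine similarity measure, the `alpha` blending weight, and all function names below are assumptions made only for illustration.

```python
from collections import Counter
from math import sqrt

def pivot_vocabulary(source_docs, target_docs):
    """Words that occur in both domains."""
    source_words = {w for doc in source_docs for w in doc}
    target_words = {w for doc in target_docs for w in doc}
    return source_words & target_words

def cooccurrence(docs, pivots):
    """For each non-pivot word, count the pivot words sharing a document with it."""
    table = {}
    for doc in docs:
        words = set(doc)
        for w in words - pivots:
            table.setdefault(w, Counter()).update(words & pivots)
    return table

def project(doc, table, pivots):
    """Pivot-space vector. Assumed rule: a pivot word counts as 1, and a
    non-pivot word spreads a total weight of 1 over its co-occurring pivots."""
    vec = Counter()
    for w in doc:
        if w in pivots:
            vec[w] += 1.0
        else:
            counts = table.get(w, {})
            total = sum(counts.values())
            for p, c in counts.items():
                vec[p] += c / total
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(target_centroids, source_centroids):
    """graph[i][j] = similarity of target centroid i to source centroid j."""
    return [[cosine(t, s) for s in source_centroids] for t in target_centroids]

def align_step(target_centroids, source_centroids, alpha=0.5):
    """Assumed update: blend each target centroid with its most similar
    source centroid, weighting the source centroid by alpha."""
    graph = similarity_graph(target_centroids, source_centroids)
    aligned = []
    for t, row in zip(target_centroids, graph):
        best = max(range(len(row)), key=row.__getitem__)
        s = source_centroids[best]
        aligned.append({k: (1 - alpha) * t.get(k, 0.0) + alpha * s.get(k, 0.0)
                        for k in set(t) | set(s)})
    return aligned

# Toy data (hypothetical): each document is a list of words.
source_docs = [["price", "stock", "market"], ["goal", "match", "team"]]
target_docs = [["price", "market", "shares"], ["team", "goal", "striker"]]

pivots = pivot_vocabulary(source_docs, target_docs)
source_table = cooccurrence(source_docs, pivots)
target_table = cooccurrence(target_docs, pivots)
source_vectors = [project(d, source_table, pivots) for d in source_docs]
target_vectors = [project(d, target_table, pivots) for d in target_docs]
graph = similarity_graph(target_vectors, source_vectors)
```

In this toy setup every document stands in for a one-item cluster, so its projected vector doubles as the centroid. A fuller version would first cluster each domain with a conventional method such as k-means and call `align_step` repeatedly until the centroids stop changing.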
