-
Publication No.: US08204838B2
Publication date: 2012-06-19
Application No.: US12421853
Filing date: 2009-04-10
Applicant: Anton Schwaighofer, Joaquin Quiñonero Candela, Thomas Borchert, Thore Graepel, Ralf Herbrich
Inventor: Anton Schwaighofer, Joaquin Quiñonero Candela, Thomas Borchert, Thore Graepel, Ralf Herbrich
IPC classification: G06F15/18
CPC classification: G06N99/005, G06K9/6226
Abstract: A scalable clustering system is described. In an embodiment, the clustering system is operable for extremely large-scale applications where millions of items having tens of millions of features are clustered. In an embodiment, the clustering system uses a probabilistic cluster model which models uncertainty in the data set, where the data set may be, for example, advertisements which are subscribed to keywords, text documents containing text keywords, images having associated features, or other items. In an embodiment, the clustering system is used to generate additional features for associating with a given item. For example, additional keywords are suggested which an advertiser may like to subscribe to. The additional features that are generated have associated probability values, which may be used to rank those features in some embodiments. In some examples, user feedback about the generated features is received and used to revise the feature generation process.
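Illustrative sketch (not taken from the patent text): the abstract describes a probabilistic cluster model that suggests additional features, each with a probability that can be used for ranking. One way this could look is a small Bernoulli mixture over binary keyword features. Everything named here is an assumption made for illustration: the functions `cluster_posterior` and `suggest_features`, the toy keywords, and the parameter values. The sketch also assumes that features an item does not list are treated as unobserved rather than as absent, which may differ from the model in the patent.

```python
import math

def cluster_posterior(observed, prior, feature_prob):
    """Posterior over clusters given the features an item is known to have.

    Assumption: features not in `observed` are treated as missing, so only
    the observed (present) features contribute to the likelihood.
    """
    log_w = []
    for pi_k, t_k in zip(prior, feature_prob):
        # log prior plus log-likelihood of each observed feature being "on"
        log_w.append(math.log(pi_k) + sum(math.log(t_k[f]) for f in observed))
    m = max(log_w)                      # subtract the max for numerical stability
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]

def suggest_features(observed, prior, feature_prob, top_n=3):
    """Rank features the item lacks by their predictive probability."""
    post = cluster_posterior(observed, prior, feature_prob)
    scores = {}
    for g in range(len(feature_prob[0])):
        if g in observed:
            continue
        # p(feature g | observed) = sum_k p(cluster k | observed) * p(g | cluster k)
        scores[g] = sum(r * t_k[g] for r, t_k in zip(post, feature_prob))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical toy data: two clusters over four keywords.
keywords = ["shoes", "boots", "laptop", "tablet"]
prior = [0.5, 0.5]
feature_prob = [
    [0.9, 0.8, 0.1, 0.1],   # a "footwear"-like cluster
    [0.1, 0.1, 0.9, 0.7],   # an "electronics"-like cluster
]

# An advertisement currently subscribed only to "shoes" (feature index 0).
ranked = suggest_features({0}, prior, feature_prob)
for idx, p in ranked:
    print(f"{keywords[idx]}: {p:.2f}")
```

With these toy numbers, "boots" ranks first because the single observed keyword makes the first cluster far more likely. At the scale the abstract mentions, dense Python lists like these would not be practical; the sketch only shows how probability values can drive the ranking.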
-
Publication No.: US20100262568A1
Publication date: 2010-10-14
Application No.: US12421853
Filing date: 2009-04-10
Applicant: Anton Schwaighofer, Joaquin Quinonero Candela, Thomas Borchert, Thore Graepel, Ralf Herbrich
Inventor: Anton Schwaighofer, Joaquin Quinonero Candela, Thomas Borchert, Thore Graepel, Ralf Herbrich
CPC classification: G06N99/005, G06K9/6226
Abstract: A scalable clustering system is described. In an embodiment, the clustering system is operable for extremely large-scale applications where millions of items having tens of millions of features are clustered. In an embodiment, the clustering system uses a probabilistic cluster model which models uncertainty in the data set, where the data set may be, for example, advertisements which are subscribed to keywords, text documents containing text keywords, images having associated features, or other items. In an embodiment, the clustering system is used to generate additional features for associating with a given item. For example, additional keywords are suggested which an advertiser may like to subscribe to. The additional features that are generated have associated probability values, which may be used to rank those features in some embodiments. In some examples, user feedback about the generated features is received and used to revise the feature generation process.
-