-
公开(公告)号:US20160103842A1
公开(公告)日:2016-04-14
申请号:US14512893
申请日:2014-10-13
Applicant: GOOGLE INC.
Inventor: Krzysztof Marcin Choromanski , Sanjiv Kumar
IPC: G06F17/30
CPC classification number: G06K9/6255 , G06K9/622
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for clustering data points. One of the methods includes maintaining data representing a respective ordered tuple of skeleton data points for each of a plurality of clusters. One or more intersecting clusters are determined for a new data point. An updated tuple of skeleton data points is generated for an updated cluster by selecting updated skeleton data points, including selecting the new data point or an existing jth skeleton data point of one of the one or more intersecting clusters according to which random value, of the jth random value for the new data point or the random value for the jth existing skeleton data point, is closest to a limiting value. The new data point is then assigned to the updated cluster.
Abstract translation: 方法,系统和装置,包括在计算机存储介质上编码的计算机程序,用于聚类数据点。 方法之一包括维护表示多个聚类中的每一个的骨架数据点的相应有序元组的数据。 为新的数据点确定一个或多个相交的聚类。 通过选择更新的骨架数据点,为更新的簇生成更新的骨架数据点的元组,包括根据哪个随机值选择一个或多个相交簇中的一个或多个相交簇的新数据点或现有第j个骨架数据点 新数据点的第j个随机值或第j个现有骨架数据点的随机值最接近极限值。 然后将新数据点分配给更新的集群。
-
公开(公告)号:US09805290B2
公开(公告)日:2017-10-31
申请号:US14512893
申请日:2014-10-13
Applicant: GOOGLE INC.
Inventor: Krzysztof Marcin Choromanski , Sanjiv Kumar
IPC: G06K9/62
CPC classification number: G06K9/6255 , G06K9/622
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for clustering data points. One of the methods includes maintaining data representing a respective ordered tuple of skeleton data points for each of a plurality of clusters. One or more intersecting clusters are determined for a new data point. An updated tuple of skeleton data points is generated for an updated cluster by selecting updated skeleton data points, including selecting the new data point or an existing jth skeleton data point of one of the one or more intersecting clusters according to which random value, of the jth random value for the new data point or the random value for the jth existing skeleton data point, is closest to a limiting value. The new data point is then assigned to the updated cluster.
-
公开(公告)号:US10255323B1
公开(公告)日:2019-04-09
申请号:US14878357
申请日:2015-10-08
Applicant: GOOGLE INC.
Inventor: Ruiqi Guo , Sanjiv Kumar , Krzysztof Marcin Choromanski , David Morris Simcha
IPC: G06F17/30
Abstract: Implementations provide an improved system for efficiently calculating inner products between a query item and a database of items. An example method includes generating a plurality of subspaces from search items in a database, the search items being represented as vectors of elements, a subspace being a block of elements from each search item that occur at the same vector position, generating a codebook for each subspace within soft constraints that are based on example queries, assigning each subspace of each search item an entry in the codebook for the subspace, the assignments for all subspaces of a search item representing a quantized search item, and storing the codebooks and the quantized search items. Generating a codebook for a particular subspace can include clustering the search item subspaces that correspond to the particular subspace, finding a cluster center for each cluster, and storing the cluster center as the codebook entry.
-
-