Similarity treeing for clustering data points

    Publication No.: US12072956B2

    Publication date: 2024-08-27

    Application No.: US17819455

    Filing date: 2022-08-12

    Applicant: AtomRain LLC

    Abstract: Provided in this disclosure is a method of organizing data from an unordered, unclustered dataset including data points having attributes. The data points are sorted into neighborhoods each having categories based on the attributes. Similarity clusters are created based on a user-selected similarity cluster scope. The clusters are arranged into a similarity tree organized dataset where the clusters are grouped into a hierarchy of similarity levels. New data points are inserted by determining similarity of their attributes to data points in grouped clusters at the highest levels of the hierarchy, and then comparing to data points in grouped clusters at progressively lower levels until a suitable cluster is determined, whereupon the new data point is then inserted. In this manner, only a selected portion of data points are compared, representing an improvement over previous methods that require comparisons to all the data points in a dataset.

    SIMILARITY TREEING FOR CLUSTERING DATA POINTS

    Publication No.: US20240054186A1

    Publication date: 2024-02-15

    Application No.: US17819455

    Filing date: 2022-08-12

    Applicant: AtomRain LLC

    IPC classification: G06K9/62 G06F16/28

    Abstract: Provided in this disclosure is a method of organizing data from an unordered, unclustered dataset including data points having attributes. The data points are sorted into neighborhoods each having categories based on the attributes. Similarity clusters are created based on a user-selected similarity cluster scope. The clusters are arranged into a similarity tree organized dataset where the clusters are grouped into a hierarchy of similarity levels. New data points are inserted by determining similarity of their attributes to data points in grouped clusters at the highest levels of the hierarchy, and then comparing to data points in grouped clusters at progressively lower levels until a suitable cluster is determined, whereupon the new data point is then inserted. In this manner, only a selected portion of data points are compared, representing an improvement over previous methods that require comparisons to all the data points in a dataset.
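
The top-down insertion described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract names no similarity measure, so Jaccard similarity over attribute sets is assumed here, and the `Node` class, the `insert` function, the per-group representative attributes, and the `min_similarity` threshold are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def jaccard(a: frozenset, b: frozenset) -> float:
    """Similarity of two attribute sets (assumed measure, not from the source)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Node:
    """A grouped cluster: leaves hold data points, inner nodes hold child groups."""
    attrs: frozenset                               # representative attributes (assumption)
    children: list = field(default_factory=list)   # lower-level groups
    points: list = field(default_factory=list)     # data points (leaf clusters only)


def insert(root: Node, point: frozenset, min_similarity: float = 0.25) -> tuple[Node, int]:
    """Descend from the highest level, comparing only with one level's groups at a time.

    Returns the cluster that received the point and the number of comparisons made.
    """
    node, comparisons = root, 0
    while node.children:
        scored = [(jaccard(point, child.attrs), child) for child in node.children]
        comparisons += len(scored)
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score < min_similarity:
            # No suitable group at this level: open a new cluster for the point.
            new_cluster = Node(attrs=point, points=[point])
            node.children.append(new_cluster)
            return new_cluster, comparisons
        node = best
    node.points.append(point)
    return node, comparisons


# Hypothetical two-level tree with three leaf clusters.
fruit = Node(frozenset({"food", "fruit"}), points=[frozenset({"food", "fruit", "red"})])
veg = Node(frozenset({"food", "vegetable"}), points=[frozenset({"food", "vegetable", "green"})])
tools = Node(frozenset({"tool", "metal"}), points=[frozenset({"tool", "metal", "sharp"})])
food = Node(frozenset({"food"}), children=[fruit, veg])
hardware = Node(frozenset({"tool"}), children=[tools])
root = Node(frozenset(), children=[food, hardware])

cluster, comparisons = insert(root, frozenset({"food", "fruit", "yellow"}))
```

In this sketch the new point is scored against the two top-level groups and then only against the two groups under the better match, so the branch under `hardware` is never visited. With larger trees the number of comparisons grows with the depth and fan-out of the path taken rather than with the total number of data points, which is the saving the abstract claims.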