- 专利标题: SYSTEMS AND METHODS FOR AUTOMATIC CLUSTERING AND CANONICAL DESIGNATION OF RELATED DATA IN VARIOUS DATA STRUCTURES
-
申请号: US16189040申请日: 2018-11-13
-
公开(公告)号: US20190079937A1公开(公告)日: 2019-03-14
- 发明人: Lawrence Manning , Rahul Mehta , Daniel Erenrich , Guillem Palou Visa , Roger Hu , Xavier Falco , Rowan Gilmore , Eli Bingham , Jason Prestinario , Yifei Huang , Daniel Fernandez , Jeremy Elser , Clayton Sader , Rahul Agarwal , Matthew Elkherj , Nicholas Latourette , Aleksandr Zamoshchin
- 申请人: Palantir Technologies Inc.
- 主分类号: G06F17/30
- IPC分类号: G06F17/30
摘要:
Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
信息查询