Systems and methods for automatic clustering and canonical designation of related data in various data structures

Granted invention patent

US11704325B2 Systems and methods for automatic clustering and canonical designation of related data in various data structures (in force)


Patent title: Systems and methods for automatic clustering and canonical designation of related data in various data structures
Application number: US17812984

Filing date: 2022-07-15
Publication number: US11704325B2

Publication date: 2023-07-18
Inventors: Lawrence Manning, Rahul Mehta, Daniel Erenrich, Guillem Palou Visa, Roger Hu, Xavier Falco, Rowan Gilmore, Eli Bingham, Jason Prestinario, Yifei Huang, Daniel Fernandez, Jeremy Elser, Clayton Sader, Rahul Agarwal, Matthew Elkherj, Nicholas Latourette, Aleksandr Zamoshchin
Applicant: Palantir Technologies Inc.
Applicant address: US CA Palo Alto
Patentee: Palantir Technologies Inc.
Current patentee: Palantir Technologies Inc.
Current patentee address: US CO Denver
Agency: Knobbe, Martens, Olson & Bear, LLP
Main classification: G06F16/00
IPC classification: G06F16/00; G06F16/2457; G06F16/35; G06F16/9535; G06F16/28; G06F18/23


Abstract:

Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
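The pipeline the abstract outlines (pair the records, score each pair, merge overlapping pairs into clusters, then pick a canonical name) can be illustrated with a minimal sketch. This is not the patented implementation. The string-similarity score from `difflib`, the 0.8 threshold, the union-find merge, and the "most frequent name wins" rule are all assumptions chosen for illustration, and the function names `match_probability`, `cluster_records`, and `canonical_name` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def match_probability(a: str, b: str) -> float:
    """Stand-in score in [0, 1] that two names refer to the same entity.

    Assumption: plain string similarity. A real system would more likely
    use a trained model over several record fields.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_records(names: list[str], threshold: float = 0.8) -> list[list[int]]:
    """Group record indices whose pairwise scores chain together."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Score every pair; pairs at or above the threshold are merged, so
    # pairs that share a record end up in the same cluster.
    for i, j in combinations(range(len(names)), 2):
        if match_probability(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(names)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def canonical_name(names: list[str], cluster: list[int]) -> str:
    """Assumption: the most frequent spelling in the cluster is canonical."""
    return Counter(names[i] for i in cluster).most_common(1)[0][0]


if __name__ == "__main__":
    records = ["Acme Corp", "Acme Corp", "Acme Corp.", "Globex Inc", "Globex Inc"]
    for cluster in cluster_records(records):
        print(cluster, canonical_name(records, cluster))
```

Scoring all pairs is quadratic in the number of records, so a sketch like this only suits small inputs; larger data sets would need some way to limit which pairs are compared.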

Publication/grant documents

US20220374454A1 SYSTEMS AND METHODS FOR AUTOMATIC CLUSTERING AND CANONICAL DESIGNATION OF RELATED DATA IN VARIOUS DATA STRUCTURES Publication/grant date: 2022-11-24


IPC classification:

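G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F16/00	Information retrieval; database structures; file system structures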