-
1.
公开(公告)号:US20240320227A1
公开(公告)日:2024-09-26
申请号:US18731699
申请日:2024-06-03
Applicant: Palantir Technologies Inc.
Inventor: Lawrence Manning , Rahul Mehta , Daniel Erenrich , Guillem Palou Visa , Roger Hu , Xavier Falco , Rowan Gilmore , Eli Bingham , Jason Prestinario , Yifei Huang , Daniel Fernandez , Jeremy Elser , Clayton Sader , Rahul Agarwal , Matthew Elkherj , Nicholas Latourette , Aleksandr Zamoshchin
IPC: G06F16/2457 , G06F16/28 , G06F16/35 , G06F16/9535 , G06F18/23
CPC classification number: G06F16/24578 , G06F16/285 , G06F16/35 , G06F16/9535 , G06F18/23
Abstract: Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
-
公开(公告)号:US12038933B2
公开(公告)日:2024-07-16
申请号:US18325616
申请日:2023-05-30
Applicant: Palantir Technologies Inc.
Inventor: Lawrence Manning , Rahul Mehta , Daniel Erenrich , Guillem Palou Visa , Roger Hu , Xavier Falco , Rowan Gilmore , Eli Bingham , Jason Prestinario , Yifei Huang , Daniel Fernandez , Jeremy Elser , Clayton Sader , Rahul Agarwal , Matthew Elkherj , Nicholas Latourette , Aleksandr Zamoshchin
IPC: G06F16/00 , G06F16/2457 , G06F16/28 , G06F16/35 , G06F16/9535 , G06F18/23
CPC classification number: G06F16/24578 , G06F16/285 , G06F16/35 , G06F16/9535 , G06F18/23
Abstract: Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
-
公开(公告)号:US10783148B2
公开(公告)日:2020-09-22
申请号:US16357108
申请日:2019-03-18
Applicant: Palantir Technologies Inc.
Inventor: James Shuster , Daniel Fernandez
Abstract: In an embodiment, a data processing method comprises creating and storing a plurality of analytical notebooks in digital computer storage, wherein each of the analytical notebooks comprises notebook metadata that specifies a kernel for execution, and one or more computational cells, wherein each of the cells comprises cell metadata, a source code reference and an output reference; receiving, in association with a first cell among the one or more cells, first input specifying computer program source code of a function, wherein the function defines an input dataset, a transformation, and one or more variables associated with output data; storing the first cell, excluding the output data, using a first digital data storage system and updating the source code reference to identify the first data storage system; using the kernel specified in the notebook metadata, executing an executable version of the source code to result in generating the output data; storing the output data using a second digital data storage system that is separate from the first digital data storage system and updating the output reference to identify the second data storage system.
-
公开(公告)号:US10002163B2
公开(公告)日:2018-06-19
申请号:US15673231
申请日:2017-08-09
Applicant: PALANTIR TECHNOLOGIES INC.
Inventor: James Shuster , Daniel Fernandez
CPC classification number: G06F16/24568 , G06F8/20 , G06F8/313 , G06F8/33 , G06F8/38 , G06F9/30043
Abstract: In an embodiment, a data processing method comprises creating and storing a plurality of analytical notebooks in digital computer storage, wherein each of the analytical notebooks comprises notebook metadata that specifies a kernel for execution, and one or more computational cells, wherein each of the cells comprises cell metadata, a source code reference and an output reference; receiving, in association with a first cell among the one or more cells, first input specifying computer program source code of a function, wherein the function defines an input dataset, a transformation, and one or more variables associated with output data; storing the first cell, excluding the output data, using a first digital data storage system and updating the source code reference to identify the first data storage system; using the kernel specified in the notebook metadata, executing an executable version of the source code to result in generating the output data; storing the output data using a second digital data storage system that is separate from the first digital data storage system and updating the output reference to identify the second data storage system.
-
公开(公告)号:US20220374454A1
公开(公告)日:2022-11-24
申请号:US17812984
申请日:2022-07-15
Applicant: Palantir Technologies Inc.
Inventor: Lawrence Manning , Rahul Mehta , Daniel Erenrich , Guillem Palou Visa , Roger Hu , Xavier Falco , Rowan Gilmore , Eli Bingham , Jason Prestinario , Yifei Huang , Daniel Fernandez , Jeremy Elser , Clayton Sader , Rahul Agarwal , Matthew Elkherj , Nicholas Latourette , Aleksandr Zamoshchin
Abstract: Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
-
公开(公告)号:US11226967B2
公开(公告)日:2022-01-18
申请号:US16994276
申请日:2020-08-14
Applicant: Palantir Technologies Inc.
Inventor: James Shuster , Daniel Fernandez
Abstract: In an embodiment, a data processing method comprises creating and storing a plurality of analytical notebooks in digital computer storage, wherein each of the analytical notebooks comprises notebook metadata that specifies a kernel for execution, and one or more computational cells, wherein each of the cells comprises cell metadata, a source code reference and an output reference; receiving, in association with a first cell among the one or more cells, first input specifying computer program source code of a function, wherein the function defines an input dataset, a transformation, and one or more variables associated with output data; storing the first cell, excluding the output data, using a first digital data storage system and updating the source code reference to identify the first data storage system; using the kernel specified in the notebook metadata, executing an executable version of the source code to result in generating the output data; storing the output data using a second digital data storage system that is separate from the first digital data storage system and updating the output reference to identify the second data storage system.
-
7.
公开(公告)号:US20190079937A1
公开(公告)日:2019-03-14
申请号:US16189040
申请日:2018-11-13
Applicant: Palantir Technologies Inc.
Inventor: Lawrence Manning , Rahul Mehta , Daniel Erenrich , Guillem Palou Visa , Roger Hu , Xavier Falco , Rowan Gilmore , Eli Bingham , Jason Prestinario , Yifei Huang , Daniel Fernandez , Jeremy Elser , Clayton Sader , Rahul Agarwal , Matthew Elkherj , Nicholas Latourette , Aleksandr Zamoshchin
IPC: G06F17/30
Abstract: Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
-
公开(公告)号:US20200380001A1
公开(公告)日:2020-12-03
申请号:US16994276
申请日:2020-08-14
Applicant: Palantir Technologies Inc.
Inventor: James Shuster , Daniel Fernandez
Abstract: In an embodiment, a data processing method comprises creating and storing a plurality of analytical notebooks in digital computer storage, wherein each of the analytical notebooks comprises notebook metadata that specifies a kernel for execution, and one or more computational cells, wherein each of the cells comprises cell metadata, a source code reference and an output reference; receiving, in association with a first cell among the one or more cells, first input specifying computer program source code of a function, wherein the function defines an input dataset, a transformation, and one or more variables associated with output data; storing the first cell, excluding the output data, using a first digital data storage system and updating the source code reference to identify the first data storage system; using the kernel specified in the notebook metadata, executing an executable version of the source code to result in generating the output data; storing the output data using a second digital data storage system that is separate from the first digital data storage system and updating the output reference to identify the second data storage system.
-
公开(公告)号:US10282450B1
公开(公告)日:2019-05-07
申请号:US15980647
申请日:2018-05-15
Applicant: Palantir Technologies Inc.
Inventor: James Shuster , Daniel Fernandez
Abstract: In an embodiment, a data processing method comprises creating and storing a plurality of analytical notebooks in digital computer storage, wherein each of the analytical notebooks comprises notebook metadata that specifies a kernel for execution, and one or more computational cells, wherein each of the cells comprises cell metadata, a source code reference and an output reference; receiving, in association with a first cell among the one or more cells, first input specifying computer program source code of a function, wherein the function defines an input dataset, a transformation, and one or more variables associated with output data; storing the first cell, excluding the output data, using a first digital data storage system and updating the source code reference to identify the first data storage system; using the kernel specified in the notebook metadata, executing an executable version of the source code to result in generating the output data; storing the output data using a second digital data storage system that is separate from the first digital data storage system and updating the output reference to identify the second data storage system.
-
公开(公告)号:US10127289B2
公开(公告)日:2018-11-13
申请号:US15233149
申请日:2016-08-10
Applicant: Palantir Technologies Inc.
Inventor: Lawrence Manning , Rahul Mehta , Daniel Erenrich , Guillem Palou Visa , Roger Hu , Xavier Falco , Rowan Gilmore , Eli Bingham , Jason Prestinario , Yifei Huang , Daniel Fernandez , Jeremy Elser , Clayton Sader , Rahul Agarwal , Matthew Elkherj , Nicholas Latourette , Aleksandr Zamoshchin
IPC: G06F17/30
Abstract: Computer implemented systems and methods are disclosed for automatically clustering and canonically identifying related data in various data structures. Data structures may include a plurality of records, wherein each record is associated with a respective entity. In accordance with some embodiments, the systems and methods further comprise identifying clusters of records associated with a respective entity by grouping the records into pairs, analyzing the respective pairs to determine a probability that both members of the pair relate to a common entity, and identifying a cluster of overlapping pairs to generate a collection of records relating to a common entity. Clusters may further be analyzed to determine canonical names or other properties for the respective entities by analyzing record fields and identifying similarities.
-
-
-
-
-
-
-
-
-