-
Publication No.: US12229189B2
Publication Date: 2025-02-18
Application No.: US17826099
Filing Date: 2022-05-26
Applicant: Palantir Technologies Inc.
Inventor: Daniel Deutsch, Kyle Solan, Thomas Mathew, Vasil Vasilev
IPC: G06F16/901, G06F16/23, G06F16/27
Abstract: A data processing method comprises creating and storing a dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; reading configuration data specifying one or more periods for one or more datasets in the dependency graph; detecting a first update to a first dataset; initiating a first build of a first intermediate derived dataset only when a then-current time is within a first period of the one or more periods or a previous build of the first intermediate derived dataset occurred earlier than a then-current time less a second period of the one or more periods; asynchronously detecting a second update to a second dataset; initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset.
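The gating condition in this abstract (build only when the current time falls within a first period, or when the previous build is older than the current time less a second period) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes the first period is a daily time window that does not wrap past midnight and the second period is a maximum staleness duration. The function name `should_build` and all parameter names are hypothetical, and treating a dataset that has never been built as stale is also an assumption.

```python
from datetime import datetime, time, timedelta
from typing import Optional


def should_build(
    now: datetime,
    window_start: time,
    window_end: time,
    last_build: Optional[datetime],
    max_staleness: timedelta,
) -> bool:
    """Decide whether a detected update may trigger a build right now."""
    # Condition 1: the then-current time lies within the first period.
    # Assumption: the period is a daily window with window_start < window_end.
    in_window = window_start <= now.time() < window_end

    # Condition 2: the previous build occurred earlier than
    # (then-current time - second period).
    # Assumption: a dataset with no previous build counts as stale.
    stale = last_build is None or last_build < now - max_staleness

    return in_window or stale
```

For example, with a window of 01:00 to 05:00 and a 24-hour staleness limit, an update seen at noon is deferred if the dataset was built an hour earlier, but triggers a build if the last build was two days ago.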
-
Publication No.: US11379525B1
Publication Date: 2022-07-05
Application No.: US15963038
Filing Date: 2018-04-25
Applicant: PALANTIR TECHNOLOGIES INC.
Inventor: Daniel Deutsch, Kyle Solan, Thomas Mathew, Vasil Vasilev
IPC: G06F16/901, G06F16/27, G06F16/23
Abstract: Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises obtaining a definition of at least one derived dataset of a data pipeline, and in response to the obtaining: creating and storing a dependency graph in memory, the dependency graph representing the at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; detecting a first update to a first dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the first update: based on the dependency graph, initiating a first build of a first intermediate derived dataset that depends on the first dataset; initiating a second build that uses the first intermediate derived dataset and that is next in order in the data pipeline according to the dependency graph; asynchronously detecting a second update to a second dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the second update: based on the dependency graph, initiating a third build of a second intermediate derived dataset that depends on the second dataset; wherein the method is performed using one or more processors.
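The update-driven scheduling described in this abstract can be sketched roughly as follows. This is an illustrative Python example, not the patented implementation. The dataset names, the `DEPENDS_ON` mapping, and the function `builds_for_update` are all hypothetical. The sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to order datasets so that inputs precede the datasets derived from them, then selects only the datasets downstream of the updated one.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each derived dataset maps to the datasets it reads.
DEPENDS_ON = {
    "clean_a": {"raw_a"},
    "clean_b": {"raw_b"},
    "report": {"clean_a", "clean_b"},
}


def builds_for_update(depends_on, updated):
    """Return the derived datasets downstream of `updated`, in build order."""
    # static_order() yields every node after all of its predecessors,
    # so inputs always appear before the datasets derived from them.
    order = TopologicalSorter(depends_on).static_order()
    affected = {updated}
    plan = []
    for dataset in order:
        # A dataset needs rebuilding if any of its inputs is affected.
        if affected & set(depends_on.get(dataset, ())):
            affected.add(dataset)
            plan.append(dataset)
    return plan


print(builds_for_update(DEPENDS_ON, "raw_a"))  # ['clean_a', 'report']
print(builds_for_update(DEPENDS_ON, "raw_b"))  # ['clean_b', 'report']
```

Each update is handled independently, which mirrors the asynchronous detection in the abstract: an update to `raw_b` schedules `clean_b` and then `report` without waiting for, or repeating, the builds triggered by `raw_a`.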
-