Continuous builds of derived datasets in response to other dataset updates

    Publication No.: US12229189B2

    Publication Date: 2025-02-18

    Application No.: US17826099

    Filing Date: 2022-05-26

    Abstract: A data processing method comprises creating and storing a dependency graph representing at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; reading configuration data specifying one or more periods for one or more datasets in the dependency graph; detecting a first update to a first dataset; initiating a first build of a first intermediate derived dataset only when a then-current time is within a first period of the one or more periods or a previous build of the first intermediate derived dataset occurred earlier than a then-current time less a second period of the one or more periods; asynchronously detecting a second update to a second dataset; initiating, in response to the second update, a second build of a second intermediate derived dataset that depends on the second dataset.
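    The period check in this abstract can be illustrated with a minimal Python sketch. It assumes the "first period" is a daily time-of-day build window and the "second period" is a maximum staleness interval for the last build. `BuildSchedule`, `within_window`, and `should_build` are hypothetical names, and the patent does not specify how periods are represented in the configuration data.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class BuildSchedule:
    """Hypothetical per-dataset build configuration (field names are illustrative)."""
    window_start: time        # assumed "first period": start of a daily build window
    window_end: time          # end of the daily build window
    max_staleness: timedelta  # assumed "second period": maximum age of the last build


def within_window(now: datetime, start: time, end: time) -> bool:
    """True if the time of day of `now` falls inside [start, end]."""
    t = now.time()
    if start <= end:
        return start <= t <= end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return t >= start or t <= end


def should_build(now: datetime, schedule: BuildSchedule,
                 last_build: Optional[datetime]) -> bool:
    """Decide whether a detected upstream update should trigger a build now."""
    if within_window(now, schedule.window_start, schedule.window_end):
        return True
    # Outside the window, build only if the previous build is older than
    # now minus the staleness period. A dataset that has never been built is
    # treated as stale (an assumption; the abstract does not cover this case).
    return last_build is None or last_build < now - schedule.max_staleness


schedule = BuildSchedule(time(1, 0), time(5, 0), timedelta(hours=24))
print(should_build(datetime(2022, 5, 26, 3, 0), schedule, datetime(2022, 5, 26, 2, 0)))   # True: inside window
print(should_build(datetime(2022, 5, 26, 12, 0), schedule, datetime(2022, 5, 26, 2, 0)))  # False: last build is recent
```

    With this reading, an update arriving outside the window is deferred unless the derived dataset has gone longer than the staleness period without a build.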

    Continuous builds of derived datasets in response to other dataset updates

    Publication No.: US11379525B1

    Publication Date: 2022-07-05

    Application No.: US15963038

    Filing Date: 2018-04-25

    Abstract: Techniques for automatically scheduling builds of derived datasets in a distributed database system that supports pipelined data transformations are described herein. In an embodiment, a data processing method comprises obtaining a definition of at least one derived dataset of a data pipeline, and in response to the obtaining: creating and storing a dependency graph in memory, the dependency graph representing the at least one derived dataset and one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends; detecting a first update to a first dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the first update: based on the dependency graph, initiating a first build of a first intermediate derived dataset that depends on the first dataset; initiating a second build that uses the first intermediate derived dataset and that is next in order in the data pipeline according to the dependency graph; asynchronously detecting a second update to a second dataset from among the one or more raw datasets or intermediate derived datasets on which the at least one derived dataset depends, and in response to the second update: based on the dependency graph, initiating a third build of a second intermediate derived dataset that depends on the second dataset; wherein the method is performed using one or more processors.
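    The dependency-graph-driven build cascade described in this abstract can be sketched in Python as follows. This assumes an in-memory graph keyed by dataset name and Python 3.9+ for `graphlib`. `DependencyGraph`, `build_order`, and `handle_update` are hypothetical names, and `asyncio.sleep(0)` stands in for submitting a real build job, which the patent does not describe.

```python
import asyncio
from collections import defaultdict, deque
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


class DependencyGraph:
    """Hypothetical in-memory dependency graph of raw and derived datasets."""

    def __init__(self) -> None:
        self.inputs: Dict[str, Set[str]] = defaultdict(set)      # derived -> its inputs
        self.dependents: Dict[str, Set[str]] = defaultdict(set)  # input -> datasets reading it

    def add_derived(self, derived: str, inputs: Iterable[str]) -> None:
        """Register a derived dataset and the datasets it is computed from."""
        for src in inputs:
            self.inputs[derived].add(src)
            self.dependents[src].add(derived)

    def downstream(self, dataset: str) -> Set[str]:
        """All datasets transitively derived from `dataset`, excluding itself."""
        seen: Set[str] = set()
        queue = deque(self.dependents.get(dataset, ()))
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(self.dependents.get(node, ()))
        return seen

    def build_order(self, updated: str) -> List[str]:
        """Datasets to rebuild after `updated` changes, in pipeline order."""
        affected = self.downstream(updated)
        # Keep only edges among affected datasets so unaffected inputs
        # (e.g. the other raw dataset) are not scheduled for rebuild.
        graph = {d: self.inputs[d] & affected for d in affected}
        return list(TopologicalSorter(graph).static_order())


async def handle_update(graph: DependencyGraph, dataset: str,
                        log: List[Tuple[str, str]]) -> None:
    """Run the build cascade for one update; each await lets other cascades interleave."""
    for target in graph.build_order(dataset):
        await asyncio.sleep(0)  # placeholder for submitting and awaiting a real build job
        log.append((dataset, target))


g = DependencyGraph()
g.add_derived("clean_a", ["raw_a"])
g.add_derived("clean_b", ["raw_b"])
g.add_derived("joined", ["clean_a", "clean_b"])
g.add_derived("report", ["joined"])
print(g.build_order("raw_a"))  # ['clean_a', 'joined', 'report']


async def main() -> List[Tuple[str, str]]:
    log: List[Tuple[str, str]] = []
    # Two updates detected independently; their build cascades run concurrently.
    await asyncio.gather(handle_update(g, "raw_a", log), handle_update(g, "raw_b", log))
    return log


log = asyncio.run(main())
print(log)
```

    Running `handle_update` for `raw_a` and `raw_b` under `asyncio.gather` lets the two cascades interleave, loosely mirroring the asynchronous handling of a second update. The sketch does not coalesce the duplicate builds of the shared `joined` and `report` datasets; a production scheduler would likely deduplicate them.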
