Publishing to a data warehouse
    Granted invention patent

    Publication No.: US11835994B2

    Publication date: 2023-12-05

    Application No.: US16517320

    Filing date: 2019-07-19

    Abstract: A method for generating an executable application to transform and load data into a structured dataset includes receiving a metadata file that specifies values for parameters for structuring data feeds, received from a networked data source, into a structured database. The metadata file specifies logical rules for transforming the data feeds. The values of the parameters and the logical rules for transforming the plurality of the data feeds are validated to ensure logical consistency for each data feed. Data rules are generated that specify standards for transforming each data feed in accordance with the validated values of the parameters and logical rules. The executable application is generated that is configured to receive source data comprising a data feed from one or more data sources and transform the source data into structured data that satisfies the one or more standards for the structured data record in compliance with the data rules.
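
The flow in this abstract (read metadata, validate it, then generate a transform from it) can be illustrated with a minimal sketch. The metadata layout, the `round` rule, and the names `METADATA`, `CASTS`, `validate`, and `build_transform` are assumptions made for illustration; the abstract does not define a concrete format or API.

```python
from typing import Any, Callable

# Hypothetical metadata layout; the abstract does not specify one.
METADATA = {
    "feed": "orders",
    "fields": {"order_id": "int", "amount": "float"},
    "rules": [{"field": "amount", "op": "round", "digits": 2}],
}

# Supported field types mapped to the Python callables that cast raw values.
CASTS: dict[str, Callable[[Any], Any]] = {"int": int, "float": float, "str": str}


def validate(metadata: dict) -> None:
    """Check that every type is known and every rule targets a declared field."""
    for name, type_name in metadata["fields"].items():
        if type_name not in CASTS:
            raise ValueError(f"unknown type {type_name!r} for field {name!r}")
    for rule in metadata["rules"]:
        if rule["field"] not in metadata["fields"]:
            raise ValueError(f"rule targets undeclared field {rule['field']!r}")


def build_transform(metadata: dict) -> Callable[[dict], dict]:
    """Return a function that turns one raw record into a structured record."""
    validate(metadata)

    def transform(raw: dict) -> dict:
        # Cast each declared field, then apply the transformation rules in order.
        record = {name: CASTS[t](raw[name]) for name, t in metadata["fields"].items()}
        for rule in metadata["rules"]:
            if rule["op"] == "round":
                record[rule["field"]] = round(record[rule["field"]], rule["digits"])
        return record

    return transform


transform = build_transform(METADATA)
result = transform({"order_id": "17", "amount": "12.3456"})
```

Validating before building the transform means an inconsistent metadata file fails once, up front, rather than on every record.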

    DATAFLOW GRAPH DATASETS
    Published invention application

    Publication No.: US20230359668A1

    Publication date: 2023-11-09

    Application No.: US18114212

    Filing date: 2023-02-24

    IPC classification: G06F16/901

    CPC classification: G06F16/9024

    Abstract: Described herein are techniques, performed by a data processing system, for enabling efficient development of software application programs in a dynamic environment with multiple datasets by generating entries in a dataset catalog to provide a software application program with access to output data dynamically generated by dataflow graphs, the entries associated with respective software application programs developed as dataflow graphs. The techniques include identifying a subgraph, wherein, when the subgraph is executed, the subgraph generates output data by applying one or more data processing operations to data obtained from one or more data sources; creating, in the dataset catalog, a new entry associated with the identified subgraph, the new entry associated with information indicating nodes, links, and configuration parameters of the identified subgraph; and configuring the dataset catalog to enable access to the new entry, in the dataset catalog, associated with the identified subgraph.
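
As a rough illustration of a catalog entry that records a subgraph's nodes, links, and configuration parameters, the sketch below uses a plain in-memory dictionary. `SubgraphEntry`, `DatasetCatalog`, and the example node names are hypothetical; the abstract does not describe how the catalog is stored or queried.

```python
from dataclasses import dataclass, field


@dataclass
class SubgraphEntry:
    """Catalog entry describing one subgraph (all names here are hypothetical)."""
    nodes: list[str]
    links: list[tuple[str, str]]  # (from_node, to_node)
    parameters: dict[str, str] = field(default_factory=dict)


class DatasetCatalog:
    def __init__(self) -> None:
        self._entries: dict[str, SubgraphEntry] = {}

    def register(self, name: str, entry: SubgraphEntry) -> None:
        """Add an entry after checking that every link joins two known nodes."""
        known = set(entry.nodes)
        for source, target in entry.links:
            if source not in known or target not in known:
                raise ValueError(f"link {source}->{target} uses an unknown node")
        self._entries[name] = entry

    def lookup(self, name: str) -> SubgraphEntry:
        """Give a consumer access to the entry registered under this name."""
        return self._entries[name]


catalog = DatasetCatalog()
catalog.register("daily_totals", SubgraphEntry(
    nodes=["read_orders", "sum_by_day"],
    links=[("read_orders", "sum_by_day")],
    parameters={"source_path": "orders.csv"},
))
entry = catalog.lookup("daily_totals")
```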

    Debugging an executable control flow graph that specifies control flow

    Publication No.: US11782820B2

    Publication date: 2023-10-10

    Application No.: US17029828

    Filing date: 2020-09-23

    IPC classification: G06F11/36 G06F11/32

    Abstract: A computer-implemented method for debugging an executable control flow graph that specifies control flow among a plurality of functional modules, with the control flow being represented as transitions among the plurality of functional modules, the computer-implemented method including: specifying a position in the executable control flow graph at which execution of the executable control flow graph is to be interrupted; wherein the specified position represents a transition to a given functional module, a transition to a state in which contents of the given functional module are executed or a transition from the given functional module; starting execution of the executable control flow graph in an execution environment; and at a point of execution representing the specified position, interrupting execution of the executable control flow graph; and providing data representing one or more attributes of the execution environment in which the given functional module is being executed.
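
A simplified sketch of breakpoints placed on transitions rather than on source lines follows. It models only two of the three positions the abstract names (entering and leaving a module), represents the graph as a linear chain, and "interrupts" by calling a callback with a snapshot of the environment. The names `run`, `graph`, `hits`, and the `"enter"`/`"exit"` labels are assumptions for illustration.

```python
from typing import Callable, Optional

Module = Callable[[dict], None]
# Hypothetical graph shape: module name -> (module function, next module name or None).
Graph = dict[str, tuple[Module, Optional[str]]]


def run(graph: Graph, start: str, env: dict,
        breakpoints: set[tuple[str, str]],
        on_break: Callable[[str, str, dict], None]) -> None:
    """Execute modules in order, pausing at any ("enter" | "exit", module) breakpoint."""
    current: Optional[str] = start
    while current is not None:
        if ("enter", current) in breakpoints:
            on_break("enter", current, dict(env))  # snapshot before the module runs
        func, next_module = graph[current]
        func(env)
        if ("exit", current) in breakpoints:
            on_break("exit", current, dict(env))   # snapshot after the module runs
        current = next_module


graph: Graph = {
    "load": (lambda env: env.update(rows=3), "report"),
    "report": (lambda env: env.update(done=True), None),
}
hits: list[tuple[str, str, dict]] = []
run(graph, "load", {}, {("enter", "report"), ("exit", "report")},
    lambda kind, module, snapshot: hits.append((kind, module, snapshot)))
```

Passing a copy of `env` to the callback keeps each recorded snapshot independent of later changes made by subsequent modules.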

    Workload automation and data lineage analysis

    Publication No.: US11748165B2

    Publication date: 2023-09-05

    Application No.: US16906193

    Filing date: 2020-06-19

    IPC classification: G06F9/50

    CPC classification: G06F9/5038

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for workload automation and job scheduling information. One of the methods includes obtaining job dependency information, the job dependency information specifying an order of execution of a plurality of jobs. The method also includes obtaining data lineage information that identifies dependency relationships between data stores and transformation, wherein at least one transformation accepts data from a first data store and produces data for a second data store. The method also includes creating links between the job dependency information and the data lineage information. The method also includes determining an impact of a change in a planned execution of an application of the plurality of applications based on the job dependency information, the created links, and the data lineage information.
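
One way to picture the linking step is to join a job-dependency graph to lineage records through a job-to-transformation map, then walk downstream from a changed job. The sketch below does this with small in-memory dictionaries; `JOB_SUCCESSORS`, `LINEAGE`, `JOB_TO_TRANSFORMATION`, `impacted`, and all job and store names are hypothetical, and a real scheduler would supply this information in its own format.

```python
from collections import deque

# Job dependency information: each job lists the jobs that run after it.
JOB_SUCCESSORS = {"ingest": ["clean"], "clean": ["report"], "report": []}

# Data lineage information: each transformation reads and writes data stores.
LINEAGE = {
    "t_ingest": {"reads": ["source"], "writes": ["raw"]},
    "t_clean": {"reads": ["raw"], "writes": ["clean_tbl"]},
    "t_report": {"reads": ["clean_tbl"], "writes": ["dashboard"]},
}

# Links between the two: which transformation each job executes.
JOB_TO_TRANSFORMATION = {"ingest": "t_ingest", "clean": "t_clean", "report": "t_report"}


def impacted(changed_job: str) -> tuple[set[str], set[str]]:
    """Return the jobs and data stores affected if `changed_job` is rescheduled."""
    jobs: set[str] = set()
    queue = deque([changed_job])
    while queue:  # breadth-first walk over downstream jobs
        job = queue.popleft()
        if job in jobs:
            continue
        jobs.add(job)
        queue.extend(JOB_SUCCESSORS[job])
    stores: set[str] = set()
    for job in jobs:  # follow the links into the lineage records
        stores.update(LINEAGE[JOB_TO_TRANSFORMATION[job]]["writes"])
    return jobs, stores


jobs, stores = impacted("clean")
```

Here a change to `clean` reaches `report` through the job graph and reaches `clean_tbl` and `dashboard` through the lineage links.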

    Generating, accessing, and displaying lineage metadata

    Publication No.: US11741091B2

    Publication date: 2023-08-29

    Application No.: US15829152

    Filing date: 2017-12-01

    Abstract: Among other things, we describe a method of receiving a portion of metadata from a data source, the portion of metadata describing nodes and edges; generating instances of a data structure representing the portion of metadata, at least one instance of the data structure including an identification value that identifies a corresponding node, one or more property values representing respective properties of the corresponding node, and one or more pointers to respective identification values, each pointer representing an edge associated with a node identified by the corresponding respective identification value; storing the instances of the data structure in random access memory; receiving a query that includes an identification of at least one particular element of data; and using at least one instance of the data structure to cause a display of a computer system to display a representation of lineage of the particular element of data.
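
The data structure described here (an identification value, property values, and pointers to other identification values) maps naturally onto a small in-memory record. The sketch below is one possible reading: `LineageNode`, `upstream_ids`, `lineage`, and the example nodes are illustrative assumptions, and the "pointers" are modeled simply as the IDs of upstream nodes held in a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class LineageNode:
    node_id: str                                            # identification value
    properties: dict[str, str] = field(default_factory=dict)
    upstream_ids: list[str] = field(default_factory=list)   # edges, stored as IDs


def lineage(nodes: dict[str, LineageNode], start_id: str) -> list[str]:
    """Answer a lineage query: the start node followed by everything upstream of it."""
    order: list[str] = []
    seen: set[str] = set()
    stack = [start_id]
    while stack:  # depth-first walk along the upstream pointers
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.add(node_id)
        order.append(node_id)
        stack.extend(nodes[node_id].upstream_ids)
    return order


# All instances are held in memory, keyed by identification value.
NODES = {
    "orders_raw": LineageNode("orders_raw", {"kind": "file"}),
    "orders_clean": LineageNode("orders_clean", {"kind": "table"}, ["orders_raw"]),
    "report": LineageNode("report", {"kind": "dashboard"}, ["orders_clean"]),
}
path = lineage(NODES, "report")
```

Keeping edges as IDs rather than object references keeps each instance compact and lets a query resolve a pointer with a single dictionary lookup.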

    METADATA-DRIVEN DATA INGESTION
    Invention application

    Publication No.: US20230100418A1

    Publication date: 2023-03-30

    Application No.: US17665109

    Filing date: 2022-02-04

    IPC classification: G06F3/06

    Abstract: An electronic system for increasing the speed of preparing data with a specified data quality for storage by automatically identifying for a user, with minimal user input, common contexts among (i) fields in disparate datasets, and (ii) names the user has specified as potentially describing the fields, and by using those common contexts to govern the disparate datasets prior to storage to ensure the specified data quality.