-
Publication No.: US20240070163A1
Publication date: 2024-02-29
Application No.: US18104066
Application date: 2023-01-31
Applicant: Ab Initio Technology LLC
Inventor: Robert Parks, Jonah Egenolf
IPC: G06F16/25, G06F16/26, G06F16/901
CPC classification number: G06F16/254, G06F16/26, G06F16/9024
Abstract: A method for using a metadata model to perform operations on data items, with the metadata model including parent nodes and child nodes connected by edges, with the parent nodes specifying logical metadata and the child nodes specifying physical metadata representing the data items, and with the edges specifying relationships between the nodes. The method includes: identifying a given data item and physical metadata of that given data item, accessing the metadata model, identifying, in the metadata model, a child node representing the physical metadata of the given data item, traversing one or more edges in the metadata model to identify parent nodes of the child node, determining, from logical metadata associated with the identified parent nodes, one or more operations to be performed on the given data item, applying the one or more operations to the given data item to transform the data item, and storing the transformed data item.
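The traversal described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, the `collect_operations` and `transform` helpers, and the sample column names are all hypothetical, and operations are modeled simply as Python callables attached to parent nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical metadata model."""
    name: str
    operations: list = field(default_factory=list)  # logical metadata, modeled as callables
    parents: list = field(default_factory=list)     # edges pointing to parent nodes


def collect_operations(child):
    """Walk edges upward from a child node and gather operations from its ancestors."""
    ops, seen, stack = [], set(), list(child.parents)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        ops.extend(node.operations)
        stack.extend(node.parents)
    return ops


def transform(value, child):
    """Apply every operation found on the ancestors of `child` to `value`."""
    for op in collect_operations(child):
        value = op(value)
    return value


# Assumed example: physical column "cust_nm" has the logical parent "Customer Name".
customer_name = Node("Customer Name", operations=[str.strip, str.title])
cust_nm = Node("cust_nm", parents=[customer_name])

store = {}  # stand-in for wherever the transformed item is stored
store["cust_nm"] = transform("  ada lovelace ", cust_nm)
```

Here the physical node carries no operations of its own; everything applied to the data item comes from logical metadata reached by following edges to parent nodes.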
-
Publication No.: US11886399B2
Publication date: 2024-01-30
Application No.: US17006504
Application date: 2020-08-28
Applicant: Ab Initio Technology LLC
Inventor: John Joyce, Marshall A. Isman, Sandrick Melbouci
CPC classification number: G06F16/215, G06F16/2228, G06F16/285, G06N5/04, G06N20/00
Abstract: Methods and systems are configured to determine a semantic meaning for data and generate data processing rules based on the semantic meaning of the data. The semantic meaning includes syntactical or contextual meaning for the data that is determined, for example, by profiling, by the data processing system, values stored in a field included in data records of one or more datasets; applying, by the data processing system, one or more classifiers to the profiled values; identifying, based on applying the one or more classifiers, one or more attributes indicative of a logical or syntactical characteristic for the values of the field, with each of the one or more attributes having a respective confidence level that is based on an output of each of the one or more classifiers. The attributes are associated with the fields and are used for generating data processing rules and processing the data.
-
Publication No.: US20230409835A1
Publication date: 2023-12-21
Application No.: US18201545
Application date: 2023-05-24
Applicant: Ab Initio Technology LLC
Inventor: Christopher Thurston Butler, Timothy Spencer Bush
IPC: G06F40/30, G06F16/93, G06N20/00, G06F16/908
CPC classification number: G06F40/30, G06F16/908, G06N20/00, G06F16/93
Abstract: A data processing system for discovering a semantic meaning of a field included in one or more data sets is configured to identify a field included in one or more data sets, with the field having an identifier. For that field, the system profiles data values of the field to generate a data profile, accesses a plurality of label proposal tests, and generates a set of label proposals by applying the plurality of label proposal tests to the data profile. The system determines a similarity among the label proposals and selects a classification. The system identifies one of the label proposals as identifying the semantic meaning. The system stores the identifier of the field with the identified one of the label proposals that identifies the semantic meaning.
-
Publication No.: US11835994B2
Publication date: 2023-12-05
Application No.: US16517320
Application date: 2019-07-19
Applicant: Ab Initio Technology LLC
Inventor: Andrew Blom, Darren Miller, Marshall A. Isman
IPC: G06F7/00, G06F17/00, G06F16/25, G06F16/901, G06F8/34, H04L67/565
CPC classification number: G06F16/254, G06F8/34, G06F16/258, G06F16/9024, H04L67/565
Abstract: A method for generating an executable application to transform and load data into a structured dataset includes receiving a metadata file that specifies values for parameters for structuring data feeds, received from a networked data source, into a structured database. The metadata file specifies logical rules for transforming the data feeds. The values of the parameters and the logical rules for transforming the plurality of the data feeds are validated to ensure logical consistency for each data feed. Data rules are generated that specify standards for transforming each data feed in accordance with the validated values of the parameters and logical rules. The executable application is generated that is configured to receive source data comprising a data feed from one or more data sources and transform the source data into structured data that satisfies the one or more standards for the structured data record in compliance with the data rules.
-
Publication No.: US20230359668A1
Publication date: 2023-11-09
Application No.: US18114212
Application date: 2023-02-24
Applicant: Ab Initio Technology LLC
Inventor: Ian Robert Schechter, Garth Allen Dickie, Jonah Egenolf, Marshall Isman
IPC: G06F16/901
CPC classification number: G06F16/9024
Abstract: Described herein are techniques, performed by a data processing system, for enabling efficient development of software application programs in a dynamic environment with multiple datasets by generating entries in a dataset catalog to provide a software application program with access to output data dynamically generated by dataflow graphs, the entries associated with respective software application programs developed as dataflow graphs. The techniques include identifying a subgraph, wherein, when the subgraph is executed, the subgraph generates output data by applying one or more data processing operations to data obtained from one or more data sources; creating, in the dataset catalog, a new entry associated with the identified subgraph, the new entry associated with information indicating nodes, links, and configuration parameters of the identified subgraph; and configuring the dataset catalog to enable access to the new entry, in the dataset catalog, associated with the identified subgraph.
-
Publication No.: US11782820B2
Publication date: 2023-10-10
Application No.: US17029828
Application date: 2020-09-23
Applicant: Ab Initio Technology LLC
Inventor: Joyce L. Vigneau, Mark Staknis, Xin Li
CPC classification number: G06F11/3664, G06F11/323, G06F11/362, G06F11/3636, G06F11/3656, G06F11/3696
Abstract: A computer-implemented method for debugging an executable control flow graph that specifies control flow among a plurality of functional modules, with the control flow being represented as transitions among the plurality of functional modules, the computer-implemented method including: specifying a position in the executable control flow graph at which execution of the executable control flow graph is to be interrupted; wherein the specified position represents a transition to a given functional module, a transition to a state in which contents of the given functional module are executed or a transition from the given functional module; starting execution of the executable control flow graph in an execution environment; and at a point of execution representing the specified position, interrupting execution of the executable control flow graph; and providing data representing one or more attributes of the execution environment in which the given functional module is being executed.
-
Publication No.: US11748165B2
Publication date: 2023-09-05
Application No.: US16906193
Application date: 2020-06-19
Applicant: Ab Initio Technology LLC
Inventor: Harry Michael Wolfson, Joel Gould, Anthony Yeracaris, Tim Wakeling
IPC: G06F9/50
CPC classification number: G06F9/5038
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for workload automation and job scheduling information. One of the methods includes obtaining job dependency information, the job dependency information specifying an order of execution of a plurality of jobs. The method also includes obtaining data lineage information that identifies dependency relationships between data stores and transformation, wherein at least one transformation accepts data from a first data store and produces data for a second data store. The method also includes creating links between the job dependency information and the data lineage information. The method also includes determining an impact of a change in a planned execution of an application of the plurality of applications based on the job dependency information, the created links, and the data lineage information.
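As a rough illustration of linking job dependencies to data lineage, the sketch below assumes three hypothetical in-memory structures: `job_order` (which jobs follow which), `lineage` (source store, transformation, target store triples), and `job_runs` (the links from jobs to the transformations they execute). The names and sample data are invented for the example, and it only reports stores written directly by impacted jobs rather than following lineage transitively.

```python
from collections import deque

# Hypothetical job dependency information: job -> jobs that run after it.
job_order = {"load_orders": ["build_report"], "build_report": [], "load_customers": []}

# Hypothetical data lineage: (source data store, transformation, target data store).
lineage = [
    ("orders_raw", "clean_orders", "orders_clean"),
    ("orders_clean", "summarize", "orders_summary"),
    ("customers_raw", "clean_customers", "customers_clean"),
]

# Hypothetical links between the two: job -> transformations it executes.
job_runs = {
    "load_orders": ["clean_orders"],
    "build_report": ["summarize"],
    "load_customers": ["clean_customers"],
}


def impacted(job):
    """Return the jobs and data stores affected if `job` is changed or delayed."""
    jobs, queue = set(), deque([job])
    while queue:  # breadth-first walk over the job dependency information
        current = queue.popleft()
        if current in jobs:
            continue
        jobs.add(current)
        queue.extend(job_order.get(current, []))
    transforms = {t for j in jobs for t in job_runs.get(j, [])}
    stores = {target for _, t, target in lineage if t in transforms}
    return jobs, stores


jobs, stores = impacted("load_orders")
```

With the sample data, changing `load_orders` affects `build_report` as well, and therefore both `orders_clean` and `orders_summary`, while the customer pipeline is untouched.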
-
Publication No.: US11741091B2
Publication date: 2023-08-29
Application No.: US15829152
Application date: 2017-12-01
Applicant: Ab Initio Technology LLC
Inventor: David Clemens, Dusan Radivojevic, Neil Galarneau
IPC: G06F16/245, G06F16/22, G06F16/248, G06F16/83, G06F40/117
CPC classification number: G06F16/245, G06F16/22, G06F16/248, G06F16/83, G06F40/117
Abstract: Among other things, we describe a method of receiving a portion of metadata from a data source, the portion of metadata describing nodes and edges; generating instances of a data structure representing the portion of metadata, at least one instance of the data structure including an identification value that identifies a corresponding node, one or more property values representing respective properties of the corresponding node, and one or more pointers to respective identification values, each pointer representing an edge associated with a node identified by the corresponding respective identification value; storing the instances of the data structure in random access memory; receiving a query that includes an identification of at least one particular element of data; and using at least one instance of the data structure to cause a display of a computer system to display a representation of lineage of the particular element of data.
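The data structure in this abstract (an identification value, property values, and pointers to other identification values) might look roughly like the sketch below. The `Instance` class, the `build_index` and `lineage` helpers, and the field names are assumptions made for illustration; a plain dictionary stands in for the in-memory store, and the display step is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One in-memory record for a metadata node (hypothetical layout)."""
    node_id: str
    properties: dict = field(default_factory=dict)
    upstream: list = field(default_factory=list)  # pointers: ids this node derives from


def build_index(nodes, edges):
    """Turn (id, properties) pairs and (source, target) edges into an id -> Instance map."""
    index = {node_id: Instance(node_id, dict(props)) for node_id, props in nodes}
    for source, target in edges:
        index[target].upstream.append(source)
    return index


def lineage(index, node_id):
    """Return the ids of all nodes upstream of `node_id`, following pointers depth-first."""
    result, seen, stack = [], set(), list(index[node_id].upstream)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        stack.extend(index[current].upstream)
    return result


# Assumed sample metadata: a report field derived from a cleaned field, derived from a raw one.
nodes = [
    ("report.total", {"type": "field"}),
    ("orders.amount", {"type": "field"}),
    ("raw.amount", {"type": "field"}),
]
edges = [("raw.amount", "orders.amount"), ("orders.amount", "report.total")]

index = build_index(nodes, edges)
upstream_of_total = lineage(index, "report.total")
```

Storing pointers as identification values rather than object references keeps each instance small and lets a lineage query resolve edges with dictionary lookups.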
-
Publication No.: US11720583B2
Publication date: 2023-08-08
Application No.: US17878106
Application date: 2022-08-01
Applicant: Ab Initio Technology LLC
Inventor: Ian Schechter, Tim Wakeling, Ann M. Wollrath
IPC: G06F16/24, G06F16/2458, G06F16/13, G06F16/25, G06F16/28, G06F16/17, G06F16/901, G06F9/50
CPC classification number: G06F16/2471, G06F9/5066, G06F16/13, G06F16/1734, G06F16/254, G06F16/285, G06F16/9024, G06F16/284
Abstract: In a first aspect, a method includes, at a node of a Hadoop cluster, the node storing a first portion of data in HDFS data storage, executing a first instance of a data processing engine capable of receiving data from a data source external to the Hadoop cluster, receiving a computer-executable program by the data processing engine, executing at least part of the program by the first instance of the data processing engine, receiving, by the data processing engine, a second portion of data from the external data source, storing the second portion of data other than in HDFS storage, and performing, by the data processing engine, a data processing operation identified by the program using at least the first portion of data and the second portion of data.
-
Publication No.: US11669343B2
Publication date: 2023-06-06
Application No.: US17477922
Application date: 2021-09-17
Applicant: Ab Initio Technology LLC
Inventor: Oded Ravid, Trevor Murphy
IPC: G06F7/00, G06F9/448, G06F16/901, G06F16/2455, G06F16/178, G06F9/445, G06F8/41
CPC classification number: G06F9/4494, G06F9/44505, G06F16/1794, G06F16/24568, G06F16/9024, G06F8/433
Abstract: A method is described for processing keyed data items that are each associated with a value of a key, the keyed data items being from a plurality of distinct data streams, the processing including collecting the keyed data items, determining, based on contents of at least one of the keyed data items, satisfaction of one or more specified conditions for execution of one or more actions and causing execution of at least one of the one or more actions responsive to the determining.
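A minimal sketch of the collect, test, and act loop described here follows. The stream names, the `condition` rule (an item for the key has arrived from both streams), and the `action` that records the key are hypothetical choices for the example, not details from the patent.

```python
from collections import defaultdict

collected = defaultdict(list)  # key value -> keyed data items seen so far
fired = []                     # keys for which the action has been executed


def condition(items):
    """Assumed condition: items for this key have arrived from both streams."""
    return {item["stream"] for item in items} >= {"orders", "payments"}


def action(key, items):
    """Assumed action: record that the key was matched across streams."""
    fired.append(key)


def on_item(item):
    """Collect one keyed item, then run the action if the condition is now satisfied."""
    items = collected[item["key"]]
    items.append(item)
    if condition(items):
        action(item["key"], items)
        del collected[item["key"]]  # the key is resolved, so stop collecting for it


# Items interleaved from two distinct streams, each carrying a key value.
for item in [
    {"stream": "orders", "key": "A1", "amount": 30},
    {"stream": "orders", "key": "B2", "amount": 12},
    {"stream": "payments", "key": "A1", "amount": 30},
]:
    on_item(item)
```

After the loop, key `A1` has triggered the action because both streams supplied an item for it, while `B2` is still waiting in `collected`.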