-
Publication No.: US12242443B2
Publication date: 2025-03-04
Application No.: US18399545
Filing date: 2023-12-28
Applicant: Ab Initio Technology LLC
Inventor: John Joyce , Marshall A. Isman , Sandrick Melbouci
Abstract: Methods and systems are configured to determine a semantic meaning for data and generate data processing rules based on the semantic meaning of the data. The semantic meaning includes syntactical or contextual meaning for the data that is determined, for example, by profiling, by the data processing system, values stored in a field included in data records of one or more datasets; applying, by the data processing system, one or more classifiers to the profiled values; identifying, based on applying the one or more classifiers, one or more attributes indicative of a logical or syntactical characteristic for the values of the field, with each of the one or more attributes having a respective confidence level that is based on an output of each of the one or more classifiers. The attributes are associated with the fields and are used for generating data processing rules and processing the data.
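The profiling and classification steps in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the regex classifiers, the `profile_field` and `classify_field` helpers, and the 0.8 threshold are all assumptions made for the example. Confidence is taken here to be the share of profiled values a classifier accepts.

```python
import re
from collections import Counter

# Hypothetical classifiers: each returns True when a value looks like the attribute.
CLASSIFIERS = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "us_zip": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
    "iso_date": lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}


def profile_field(records, field):
    """Profile one field: count each non-empty value stored in it."""
    return Counter(
        str(r[field]) for r in records if r.get(field) not in (None, "")
    )


def classify_field(profile, threshold=0.8):
    """Apply every classifier to the profile and return {attribute: confidence}.

    Confidence is the fraction of profiled values the classifier accepts
    (an assumption for this sketch); attributes below the threshold are dropped.
    """
    total = sum(profile.values())
    if total == 0:
        return {}
    attributes = {}
    for name, matches in CLASSIFIERS.items():
        hits = sum(count for value, count in profile.items() if matches(value))
        confidence = hits / total
        if confidence >= threshold:
            attributes[name] = confidence
    return attributes


records = [
    {"contact": "a@example.com"},
    {"contact": "b@example.org"},
    {"contact": "c@example.net"},
    {"contact": "d@example.com"},
    {"contact": "not-an-email"},
]
attributes = classify_field(profile_field(records, "contact"))
```

The resulting attribute map could then be attached to the field and used to choose data processing rules, for instance a validation rule for fields labeled as email addresses.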
-
Publication No.: US20240126748A1
Publication date: 2024-04-18
Application No.: US18345852
Filing date: 2023-06-30
Applicant: Ab Initio Technology LLC
Inventor: Jonah Egenolf , Marshall A. Isman , Ian Schechter
IPC: G06F16/242 , G06F8/34 , G06F8/36 , G06F8/38 , G06F16/21 , G06F16/23 , G06F16/2452 , G06F16/2453 , G06F16/28
CPC classification number: G06F16/2423 , G06F8/34 , G06F8/36 , G06F8/38 , G06F16/211 , G06F16/2365 , G06F16/24524 , G06F16/24526 , G06F16/2453 , G06F16/24544 , G06F16/24545 , G06F16/288 , G06Q10/10
Abstract: A method includes accessing a schema that specifies relationships among datasets, computations on the datasets, or transformations of the datasets, selecting a dataset from among the datasets, and identifying, from the schema, other datasets that are related to the selected dataset. Attributes of the datasets are identified, and logical data representing the identified attributes and relationships among the attributes is generated. The logical data is provided to a development environment, which provides access to portions of the logical data representing the identified attributes. A specification that specifies at least one of the identified attributes in performing an operation is received from the development environment. Based on the specification and the relationships among the identified attributes represented by the logical data, a computer program is generated to perform the operation by accessing, from storage, at least one dataset having the at least one of the attributes specified in the specification.
-
Publication No.: US20240104113A1
Publication date: 2024-03-28
Application No.: US18492425
Filing date: 2023-10-23
Applicant: Ab Initio Technology LLC
Inventor: Andrew Blom , Darren Miller , Marshall A. Isman
IPC: G06F16/25 , G06F8/34 , G06F16/901 , H04L67/565
CPC classification number: G06F16/254 , G06F8/34 , G06F16/258 , G06F16/9024 , H04L67/565
Abstract: A method for generating an executable application to transform and load data into a structured dataset includes receiving a metadata file that specifies values for parameters for structuring data feeds, received from a networked data source, into a structured database. The metadata file specifies logical rules for transforming the data feeds. The values of the parameters and the logical rules for transforming the plurality of the data feeds are validated to ensure logical consistency for each data feed. Data rules are generated that specify standards for transforming each data feed in accordance with the validated values of the parameters and logical rules. The executable application is generated that is configured to receive source data comprising a data feed from one or more data sources and transform the source data into structured data that satisfies the one or more standards for the structured data record in compliance with the data rules.
-
Publication No.: US11886399B2
Publication date: 2024-01-30
Application No.: US17006504
Filing date: 2020-08-28
Applicant: Ab Initio Technology LLC
Inventor: John Joyce , Marshall A. Isman , Sandrick Melbouci
CPC classification number: G06F16/215 , G06F16/2228 , G06F16/285 , G06N5/04 , G06N20/00
Abstract: Methods and systems are configured to determine a semantic meaning for data and generate data processing rules based on the semantic meaning of the data. The semantic meaning includes syntactical or contextual meaning for the data that is determined, for example, by profiling, by the data processing system, values stored in a field included in data records of one or more datasets; applying, by the data processing system, one or more classifiers to the profiled values; identifying, based on applying the one or more classifiers, one or more attributes indicative of a logical or syntactical characteristic for the values of the field, with each of the one or more attributes having a respective confidence level that is based on an output of each of the one or more classifiers. The attributes are associated with the fields and are used for generating data processing rules and processing the data.
-
Publication No.: US11835994B2
Publication date: 2023-12-05
Application No.: US16517320
Filing date: 2019-07-19
Applicant: Ab Initio Technology LLC
Inventor: Andrew Blom , Darren Miller , Marshall A. Isman
IPC: G06F7/00 , G06F17/00 , G06F16/25 , G06F16/901 , G06F8/34 , H04L67/565
CPC classification number: G06F16/254 , G06F8/34 , G06F16/258 , G06F16/9024 , H04L67/565
Abstract: A method for generating an executable application to transform and load data into a structured dataset includes receiving a metadata file that specifies values for parameters for structuring data feeds, received from a networked data source, into a structured database. The metadata file specifies logical rules for transforming the data feeds. The values of the parameters and the logical rules for transforming the plurality of the data feeds are validated to ensure logical consistency for each data feed. Data rules are generated that specify standards for transforming each data feed in accordance with the validated values of the parameters and logical rules. The executable application is generated that is configured to receive source data comprising a data feed from one or more data sources and transform the source data into structured data that satisfies the one or more standards for the structured data record in compliance with the data rules.
-
Publication No.: US11487534B2
Publication date: 2022-11-01
Application No.: US17306075
Filing date: 2021-05-03
Applicant: Ab Initio Technology LLC
Inventor: John Joyce , Marshall A. Isman , Sam Kendall
Abstract: A method for analyzing a computer program ecosystem includes performing a static analysis, including identifying static dependencies among elements of the ecosystem based on values of parameters in one or more parameter sets associated with the ecosystem, the elements of the ecosystem including the computer programs of the ecosystem and data resources associated with the computer programs. The method includes performing a runtime analysis, including identifying elements of the ecosystem that were utilized during execution of the ecosystem to process data records. The method includes performing a schedule analysis, including identifying a computer program of the ecosystem that has a schedule dependency from another computer program of the ecosystem. The method includes identifying a subset of the elements of the ecosystem as an ecosystem unit based on the results of the static, runtime, and schedule analyses. The method includes migrating the ecosystem unit, testing the ecosystem unit, or both.
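One way to picture how the three analyses might combine into an ecosystem unit is sketched below. The abstract does not say how the results are merged, so the rule used here is an assumption: follow static and schedule dependencies outward from a seed element, keeping only elements that the runtime analysis saw in use. The `ecosystem_unit` function and the element names are hypothetical.

```python
from collections import defaultdict, deque


def ecosystem_unit(seed, static_deps, runtime_used, schedule_deps):
    """Collect the elements reachable from `seed` that were used at runtime.

    static_deps and schedule_deps are iterables of (element, element) pairs;
    runtime_used is the set of elements observed during execution.
    """
    # Treat static and schedule dependencies as undirected edges.
    graph = defaultdict(set)
    for a, b in list(static_deps) + list(schedule_deps):
        graph[a].add(b)
        graph[b].add(a)

    # Breadth-first search, pruned by the runtime analysis.
    unit = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in unit and neighbor in runtime_used:
                unit.add(neighbor)
                queue.append(neighbor)
    return unit


static_deps = [
    ("load.job", "customers.dat"),
    ("load.job", "legacy.dat"),
    ("report.job", "customers.dat"),
]
schedule_deps = [("report.job", "load.job")]
runtime_used = {"load.job", "customers.dat", "report.job"}

unit = ecosystem_unit("load.job", static_deps, runtime_used, schedule_deps)
```

In this example `legacy.dat` is statically referenced but never used at runtime, so it is left out of the unit that would be migrated or tested.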
-
Publication No.: US11423083B2
Publication date: 2022-08-23
Application No.: US15795917
Filing date: 2017-10-27
Applicant: Ab Initio Technology LLC
Inventor: Jonah Egenolf , Marshall A. Isman , Frederic Wild
IPC: G06F16/901 , G06F16/26 , G06F8/34 , G06F8/10 , G06F16/25
Abstract: A method performed by a computer system including: accessing a specification that specifies a plurality of modules to be implemented by the computer program for processing the one or more values of the one or more fields in the structured data item; transforming the specification into the computer program that implements the plurality of modules, wherein the transforming includes: for each of one or more first modules of the plurality of modules: identifying one or more second modules of the plurality of modules that each receive input that is at least partly based on an output of the first module; and formatting an output data format of the first module such that the first module outputs only one or more values of one or more fields of the structured data item.
-
Publication No.: US20210263734A1
Publication date: 2021-08-26
Application No.: US17306075
Filing date: 2021-05-03
Applicant: Ab Initio Technology LLC
Inventor: John Joyce , Marshall A. Isman , Sam Kendall
Abstract: A method for analyzing a computer program ecosystem including multiple computer programs includes performing a static analysis of the ecosystem, including identifying static dependencies among elements of the ecosystem based on values of parameters in one or more parameter sets associated with the ecosystem, the elements of the ecosystem including the computer programs of the ecosystem and data resources associated with the computer programs. The method includes performing a runtime analysis of the ecosystem, including identifying elements of the ecosystem that were utilized during execution of the ecosystem to process data records. The method includes performing a schedule analysis of the ecosystem, including identifying a computer program of the ecosystem that has a schedule dependency from another computer program of the ecosystem. The method includes identifying a subset of the elements of the ecosystem as an ecosystem unit based on the results of the static, runtime, and schedule analyses. The method includes migrating the ecosystem unit from a first computer system to a second computer system, testing the ecosystem unit, or both.
-
Publication No.: US20150169428A1
Publication date: 2015-06-18
Application No.: US14573038
Filing date: 2014-12-17
Applicant: Ab Initio Technology LLC
Inventor: Marshall A. Isman , Richard Alan Epstein
IPC: G06F11/36
CPC classification number: G06F11/36 , G06F11/3688
Abstract: A method includes receiving data indicative of a number of times each of one or more rules was executed by a data processing application during processing of one or more records; based on the number of times each of the rules was executed by the data processing application, determining a content criterion for each of one or more particular fields; generating content for each of the particular fields based on the content criterion; and populating each of the particular fields with the generated content.
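The flow in this abstract (rule execution counts, then a content criterion per field, then generated content) can be illustrated with a small sketch. Everything named here is hypothetical: the `RULES` table, the half-open numeric ranges, and the choice to target rules that ran fewer than `min_count` times are assumptions made for the example, not details from the publication.

```python
import random

# Hypothetical rules: each tests one field against a half-open range [low, high).
RULES = {
    "small_order": {"field": "amount", "low": 0, "high": 100},
    "large_order": {"field": "amount", "low": 100, "high": 10_000},
    "minor": {"field": "age", "low": 0, "high": 18},
}


def content_criteria(execution_counts, min_count=1):
    """Derive a value range per field from rules that ran fewer than min_count times.

    If several under-executed rules share a field, the first one wins.
    """
    criteria = {}
    for name, rule in RULES.items():
        if execution_counts.get(name, 0) < min_count:
            criteria.setdefault(rule["field"], (rule["low"], rule["high"]))
    return criteria


def populate(record, criteria, rng):
    """Return a copy of the record with each targeted field filled from its range."""
    filled = dict(record)
    for field, (low, high) in criteria.items():
        filled[field] = rng.randrange(low, high)
    return filled


# Counts as they might be reported after a run of the data processing application.
counts = {"small_order": 42, "large_order": 0, "minor": 0}
criteria = content_criteria(counts)
record = populate({"amount": 5, "age": 30, "name": "x"}, criteria, random.Random(0))
```

Feeding the populated record back through the application should exercise the `large_order` and `minor` rules, which the original run never triggered.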
-
Publication No.: US20140222752A1
Publication date: 2014-08-07
Application No.: US13827558
Filing date: 2013-03-14
Applicant: AB INITIO TECHNOLOGY LLC
Inventor: Marshall A. Isman , Richard A. Epstein , Ralf Haug , Andrew F. Roberts , John Ralston , John L. Richardson , Justin Pniower
IPC: G06F17/30
CPC classification number: G06F11/3684 , G06F17/30306 , G06F17/30867
Abstract: A computer-implemented method includes accessing a plurality of data records, each data record having a plurality of data fields. The method further includes analyzing values for one or more of the data fields for at least some of the plurality of data records and generating a profile of the plurality of data records based on the analyzing. The method further includes formulating at least one subsetting rule based on the profile; and selecting a subset of data records from the plurality of data records based on the at least one subsetting rule.
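A rough sketch of the profile, rule, and select sequence described in this abstract follows. The specific subsetting rule, keeping up to `per_value` records for each distinct value of one field, is an assumption chosen for illustration, as are the function names.

```python
from collections import Counter


def profile(records, field):
    """Profile the records: count how often each value of `field` occurs."""
    return Counter(r[field] for r in records)


def make_subsetting_rule(field_profile, field, per_value=1):
    """Formulate a rule that keeps up to per_value records per distinct value."""
    remaining = {
        value: min(per_value, count) for value, count in field_profile.items()
    }

    def rule(record):
        value = record[field]
        if remaining.get(value, 0) > 0:
            remaining[value] -= 1
            return True
        return False

    return rule


records = [
    {"id": 1, "state": "MA"},
    {"id": 2, "state": "MA"},
    {"id": 3, "state": "NY"},
    {"id": 4, "state": "MA"},
    {"id": 5, "state": "CA"},
]
rule = make_subsetting_rule(profile(records, "state"), "state")
subset = [r for r in records if rule(r)]
```

Here the subset keeps one record per state, so it is smaller than the input while still covering every value that the profile found.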