-
Publication No.: US20160125057A1
Publication date: 2016-05-05
Application No.: US14738232
Filing date: 2015-06-12
Applicant: Ab Initio Technology LLC
Inventor: Joel Gould, Scott Studer
IPC: G06F17/30
CPC classification number: G06F17/30539, G06F17/30427
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for impact analysis. One of the methods includes receiving information about at least two logical datasets, the information identifying, for each logical dataset, a field in that logical dataset and format information about that field. The method includes receiving information about a transformation identifying a first logical dataset from which the transformation is to receive data and a second logical dataset to which the transformed data is provided. The method includes receiving one or more proposed changes to at least one of the fields. The method includes analyzing the proposed changes based on information about the transformation and information about the first logical dataset and the second logical dataset. The method includes calculating metrics of the proposed change based on the analysis. The method also includes storing information about the metrics.
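The impact analysis described in this abstract can be pictured as a walk over a dataset-to-dataset graph. The following is a minimal sketch, not the patented method: it assumes transformations are given as hypothetical `(name, source, target)` tuples, it works at dataset level rather than field level, and the two counts it returns are illustrative stand-ins for the "metrics" the abstract mentions.

```python
from collections import deque

def impact_of_change(transformations, changed_dataset):
    """Count transformations and datasets downstream of a changed dataset.

    transformations: list of (name, source_dataset, target_dataset) tuples
    (a hypothetical representation chosen for this sketch).
    """
    affected_datasets = set()
    affected_transforms = set()
    queue = deque([changed_dataset])
    while queue:
        dataset = queue.popleft()
        for name, source, target in transformations:
            # A transformation is affected if it reads from an affected dataset.
            if source == dataset and name not in affected_transforms:
                affected_transforms.add(name)
                if target not in affected_datasets:
                    affected_datasets.add(target)
                    queue.append(target)
    return {
        "transformations": len(affected_transforms),
        "datasets": len(affected_datasets),
    }

transforms = [
    ("t1", "raw", "clean"),
    ("t2", "clean", "report"),
    ("t3", "other", "report"),
]
metrics = impact_of_change(transforms, "raw")
```

Here a change to `raw` reaches `t1` and `t2` but not `t3`, which reads from an unrelated dataset.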
-
Publication No.: US20160062799A1
Publication date: 2016-03-03
Application No.: US14843001
Filing date: 2015-09-02
Applicant: Ab Initio Technology LLC
Inventor: Craig W. Stanfill, Richard Shapiro, Stephen A. Kukolich, Joseph Skeffington Wholey, III
CPC classification number: G06F9/44, G06F8/34, G06F9/448, G06F9/4482, G06F9/4494, G06F9/46, G06F9/465, G06F9/466, G06F9/4843, G06F9/4881
Abstract: A graph-based program specification includes components, at least one having at least one input port for receiving a collection of data elements, or at least one collection type output port for providing a collection of data elements. Executing a program specified by the graph-based program specification at a computing node, includes: receiving data elements of a first collection into a first storage in a first order via a link connected to a collection type output port of a first component and an input port of a second component, and invoking a plurality of instances of a task corresponding to the second component to process data elements of the first collection, including retrieving the data elements from the first storage in a second order, without blocking invocation of any of the instances until after any particular instance completes processing one or more data elements.
-
Publication No.: US20150347193A1
Publication date: 2015-12-03
Application No.: US14470501
Filing date: 2014-08-27
Applicant: Ab Initio Technology LLC
Inventor: Harry Michael Wolfson, Joel Gould, Anthony Yeracaris, Tim Wakeling
IPC: G06F9/50
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for workload automation and job scheduling information. One of the methods includes obtaining job dependency information, the job dependency information specifying an order of execution of a plurality of jobs. The method also includes obtaining data lineage information that identifies dependency relationships between data stores and transformation, wherein at least one transformation accepts data from a first data store and produces data for a second data store. The method also includes creating links between the job dependency information and the data lineage information. The method also includes determining an impact of a change in a planned execution of an application of the plurality of applications based on the job dependency information, the created links, and the data lineage information.
-
Publication No.: US09143624B2
Publication date: 2015-09-22
Application No.: US13837860
Filing date: 2013-03-15
Applicant: Ab Initio Technology LLC
Inventor: Larry Paul Rossi
CPC classification number: H04M15/41, G06F17/30289, H04M15/43, H04M15/725, H04M15/73
Abstract: A method includes determining a first quantity of data records of a group of data records from a stream of data records received by an application having a plurality of modules. The method includes, for one or more of the modules of the application, determining a respective second quantity of data records output by the module during processing of the group of data records. The method includes determining whether the first and second quantities of data records satisfy a rule. The rule is indicative of a target relationship among a quantity of data records received by the application and a quantity of data records output by one or more modules of the application.
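The rule check in this abstract compares the number of records an application received with the numbers its modules output. As a minimal sketch under stated assumptions: the function names, the module names, and the particular "conservation" rule below are all hypothetical, chosen only to illustrate one possible target relationship.

```python
from typing import Callable, Dict

Rule = Callable[[int, Dict[str, int]], bool]

def check_record_counts(received: int, module_outputs: Dict[str, int], rule: Rule) -> bool:
    """Return True if the received and per-module output counts satisfy the rule."""
    return rule(received, module_outputs)

def conservation_rule(received: int, outputs: Dict[str, int]) -> bool:
    # Illustrative rule: every received record is output by exactly one module.
    return received == sum(outputs.values())

ok = check_record_counts(100, {"accepted": 97, "rejected": 3}, conservation_rule)
bad = check_record_counts(100, {"accepted": 97, "rejected": 2}, conservation_rule)
```

In the second call one record is unaccounted for, so the rule is not satisfied.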
-
Publication No.: US20150261694A1
Publication date: 2015-09-17
Application No.: US14658440
Filing date: 2015-03-16
Applicant: Ab Initio Technology LLC
Inventor: Jed Roberts, Craig W. Stanfill, Scott Studer
CPC classification number: G06F13/10, G06F17/30477, G06F17/30569, G06F17/30589, G06F17/30604, G06F17/30917, G06Q10/067
Abstract: One or more mappings each define a correspondence between input attributes of an input entity and output attributes of an output entity, where the input and output entities each include one or more key attributes identified as part of a unique key. Computing result information, displayed in a user interface, includes: processing instances of a first input entity to generate instances of a first output entity; determining one or more mapped input attributes of the first input entity that correspond to each of the key attributes of the first output entity; generating the instances of the first output entity based on the determined one or more mapped input attributes; computing a total number of instances of the first input entity that were processed; and computing a total number of instances of the first output entity that were generated.
-
Publication No.: US20150169428A1
Publication date: 2015-06-18
Application No.: US14573038
Filing date: 2014-12-17
Applicant: Ab Initio Technology LLC
Inventor: Marshall A. Isman, Richard Alan Epstein
IPC: G06F11/36
CPC classification number: G06F11/36, G06F11/3688
Abstract: A method includes receiving data indicative of a number of times each of one or more rules was executed by a data processing application during processing of one or more records; based on the number of times each of the rules was executed by the data processing application, determining a content criterion for each of one or more particular fields; generating content for each of the particular fields based on the content criterion; and populating each of the particular fields with the generated content.
-
Publication No.: US20150149503A1
Publication date: 2015-05-28
Application No.: US14090434
Filing date: 2013-11-26
Applicant: Ab Initio Technology LLC
Inventor: Ann M. Wollrath, Bryan Phil Douros, Marshall Alan Isman, Timothy Wakeling
IPC: G06F17/30
Abstract: An approach to parallel access of data from a distributed filesystem provides parallel access to one or more named units (e.g., files) in the filesystem by creating multiple parallel data streams such that all the data of the desired units is partitioned over the multiple streams. In some examples, the multiple streams form multiple inputs to a parallel implementation of a computation system, such as a graph-based computation system, dataflow-based system, and/or a (e.g., relational) database system.
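One simple way to partition a named unit over multiple streams is to assign each stream a contiguous byte range. The sketch below assumes that strategy and a hypothetical helper name; the abstract does not say how the data is split, so this is only one possible reading.

```python
def partition_ranges(size, n_streams):
    """Split `size` bytes into `n_streams` contiguous half-open (start, end) ranges."""
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    base, extra = divmod(size, n_streams)
    ranges = []
    start = 0
    for i in range(n_streams):
        # The first `extra` streams take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges

ranges = partition_ranges(10, 3)
```

Each stream would then read only its own range, and together the ranges cover every byte exactly once.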
-
Publication No.: US20150106341A1
Publication date: 2015-04-16
Application No.: US14519030
Filing date: 2014-10-20
Applicant: Ab Initio Technology LLC
Inventor: Joel Gould, Carl Richard Feynman, Paul Bay
IPC: G06F17/30
CPC classification number: G06F17/30371, G06F17/30466, G06F17/30486, G06F17/30489, G06F17/30539, G06F17/3056, G06F17/30569, G06F17/30598
Abstract: Processing data includes profiling data from a data source, including reading the data from the data source, computing summary data characterizing the data while reading the data, and storing profile information that is based on the summary data. The data is then processed from the data source. This processing includes accessing the stored profile information and processing the data according to the accessed profile information.
-
Publication No.: US20140344508A1
Publication date: 2014-11-20
Application No.: US14279615
Filing date: 2014-05-16
Applicant: Ab Initio Technology LLC
Inventor: Muhammad Arshad Khan, Stephen G. Rybicki, Joel Gould
IPC: G06F12/02
CPC classification number: G06F3/0644, G06F3/0604, G06F3/0679, G06F9/5016, G06F9/5022, G06F12/0246, G06F2212/7204
Abstract: Processing a plurality of data units to generate result information, includes: performing a data operation for each data unit of a first subset of data units from the plurality of data units, and storing information associated with a result of the data operation in a first set of one or more data structures stored in working memory space of a memory device; after an overflow condition on the working memory space is satisfied, storing information in overflow storage space of a storage device; and repeating an overflow processing procedure multiple times during the processing of the plurality of data units, the overflow processing procedure including: updating a new set of one or more data structures stored in the working memory space using at least some information stored in the overflow storage space.
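The overflow procedure in this abstract resembles a multi-pass aggregation with bounded working memory. The sketch below is an illustration under assumptions, not the claimed method: the data operation is taken to be counting occurrences per key, `max_keys` is a hypothetical stand-in for the overflow condition, and a Python list stands in for the overflow storage device.

```python
def aggregate_with_overflow(keys, max_keys):
    """Count occurrences per key while holding at most `max_keys` keys in memory."""
    if max_keys < 1:
        raise ValueError("max_keys must be at least 1")
    results = {}
    pending = list(keys)
    while pending:
        working = {}   # data structures in working memory for this pass
        overflow = []  # stands in for overflow storage on a storage device
        for key in pending:
            if key in working:
                working[key] += 1
            elif len(working) < max_keys:
                working[key] = 1
            else:
                # Working memory is full: defer this record to overflow.
                overflow.append(key)
        results.update(working)
        # Next pass builds a new set of structures from the overflow data.
        pending = overflow
    return results

counts = aggregate_with_overflow(["a", "b", "a", "c", "b", "a"], max_keys=2)
```

Because the working table only grows within a pass, a key is either counted completely in that pass or deferred completely, so no key is split across passes.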
-
Publication No.: US20140317632A1
Publication date: 2014-10-23
Application No.: US14259479
Filing date: 2014-04-23
Applicant: Ab Initio Technology LLC
Inventor: Craig W. Stanfill
IPC: G06F9/48
Abstract: A graph-based program specification specifies at least a partial ordering among a plurality of tasks represented by its nodes. Executing a specified program includes: executing a first subroutine corresponding to a first task, including a first task section for performing the first task; storing state information indicating a state of the first task selected from a set of possible states that includes: a pending state in which the first task section is waiting to perform the first task, and a suppressed state in which the first task section has been prevented from performing the first task; and executing a second subroutine corresponding to a second task, including a second task section for performing the second task, and a control section that controls execution of the second task section based at least in part on the state of the first task indicated by the stored state information.