Evaluating dataflow graph characteristics

    Publication number: US09727438B2

    Publication date: 2017-08-08

    Application number: US13217778

    Filing date: 2011-08-25

    IPC classification: G06F11/34 G06F9/50

    CPC classification: G06F11/3476 G06F9/50

    Abstract: One or more expressions are evaluated that represent one or more characteristics of a dataflow graph that includes vertices representing data processing components connected by links representing flows of work elements between the components. A request is received by a computing system to evaluate the one or more expressions that include one or more operations on one or more variables; and the one or more expressions are evaluated by the computing system. The evaluating includes: defining a data structure that includes one or more fields, collecting, during execution of the dataflow graph, tracking information associated with one or more components of the dataflow graph, storing values associated with the tracking information in the one or more fields, and replacing one or more variables of the one or more expressions with the values stored in the one or more fields to compute a result of evaluating the one or more expressions.
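
    The evaluation steps in this abstract (define fields, collect tracking information while the graph runs, store the values, then substitute them for the expression's variables) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented system: the `TrackingFields` record, the toy one-component `run_graph`, the variable names `filter_in`/`filter_out`, and the arithmetic-only expression syntax are all hypothetical.

```python
import ast
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Union

Number = Union[int, float]


@dataclass
class TrackingFields:
    """Hypothetical data structure: one named field per tracked characteristic."""
    values: Dict[str, Number] = field(default_factory=dict)

    def store(self, name: str, value: Number) -> None:
        self.values[name] = value


def run_graph(records: List[int], fields: TrackingFields) -> List[int]:
    """Toy stand-in for graph execution: a single filter component whose
    record counts are collected as tracking information while it runs."""
    kept = [r for r in records if r >= 0]  # hypothetical component: drop negatives
    fields.store("filter_in", len(records))
    fields.store("filter_out", len(kept))
    return kept


# Only the four basic arithmetic operators are supported in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(expression: str, fields: TrackingFields) -> Number:
    """Evaluate an arithmetic expression, replacing each variable with the
    value stored in the field of the same name."""
    def walk(node: ast.AST) -> Number:
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return fields.values[node.id]  # variable -> collected value
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval").body)


fields = TrackingFields()
run_graph([3, -1, 4, -1, 5], fields)
print(evaluate("filter_out / filter_in", fields))  # prints 0.6
```

    Parsing with `ast` rather than calling `eval` keeps the sketch from executing arbitrary code; anything beyond names, numbers, and `+ - * /` is rejected.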

3. MANAGING TASK EXECUTION
    Patent application; status: in force

    Publication number: US20100211953A1

    Publication date: 2010-08-19

    Application number: US12704998

    Filing date: 2010-02-12

    IPC classification: G06F9/46 G06F9/54 G06F11/00

    CPC classification: G06F9/5038 G06F2209/506

    Abstract: Managing task execution includes: receiving a specification of a plurality of tasks to be performed by respective functional modules; processing a flow of input data using a dataflow graph that includes nodes representing data processing components connected by links representing flows of data between data processing components; in response to at least one flow of data provided by at least one data processing component, generating a flow of messages; and in response to each of the messages in the flow of messages, performing an iteration of a set of one or more tasks using one or more corresponding functional modules.
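
    The message-driven pattern described here (a component's data flow produces a flow of messages, and each message triggers one iteration of a task set) might look roughly like the Python sketch below. It is an assumption-laden illustration: `pricing_component`, `to_messages`, `run_tasks`, the message shape, and the `validate`/`notify` tasks are hypothetical, and plain functions stand in for functional modules.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Task = Tuple[str, Callable[[dict], object]]


def pricing_component(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical data processing component: emits one enriched record per input."""
    for r in records:
        yield {**r, "total": r["qty"] * r["price"]}


def to_messages(flow: Iterable[dict]) -> Iterator[dict]:
    """Generate a flow of messages in response to the component's data flow."""
    for seq, record in enumerate(flow):
        yield {"seq": seq, "payload": record}


def run_tasks(messages: Iterable[dict], tasks: List[Task]) -> List[tuple]:
    """For each message, perform one iteration of the task set; each task is
    carried out by its (hypothetical) functional module, here a plain function."""
    log = []
    for msg in messages:
        for name, module in tasks:
            log.append((msg["seq"], name, module(msg["payload"])))
    return log


tasks: List[Task] = [
    ("validate", lambda p: p["total"] >= 0),
    ("notify", lambda p: f"order {p['id']} totals {p['total']:.2f}"),
]
orders = [{"id": 1, "qty": 2, "price": 5.0}, {"id": 2, "qty": 1, "price": 3.0}]
for entry in run_tasks(to_messages(pricing_component(orders)), tasks):
    print(entry)
```

    Because every stage is a generator, messages are produced and handled one at a time as records leave the component, rather than after the whole flow has been collected.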

4. EVALUATING DATAFLOW GRAPH CHARACTERISTICS
    Patent application; status: in force

    Publication number: US20120054255A1

    Publication date: 2012-03-01

    Application number: US13217778

    Filing date: 2011-08-25

    IPC classification: G06F17/11

    CPC classification: G06F11/3476 G06F9/50

    Abstract: One or more expressions are evaluated that represent one or more characteristics of a dataflow graph that includes vertices representing data processing components connected by links representing flows of work elements between the components. A request is received by a computing system to evaluate the one or more expressions that include one or more operations on one or more variables; and the one or more expressions are evaluated by the computing system. The evaluating includes: defining a data structure that includes one or more fields, collecting, during execution of the dataflow graph, tracking information associated with one or more components of the dataflow graph, storing values associated with the tracking information in the one or more fields, and replacing one or more variables of the one or more expressions with the values stored in the one or more fields to compute a result of evaluating the one or more expressions.

5. Fault tolerant batch processing
    Granted patent; status: in force

    Publication number: US08205113B2

    Publication date: 2012-06-19

    Application number: US12502851

    Filing date: 2009-07-14

    IPC classification: G06F11/00

    Abstract: Among other aspects disclosed are a method and system for processing a batch of input data in a fault tolerant manner. The method includes reading a batch of input data including a plurality of records from one or more data sources and passing the batch through a dataflow graph. The dataflow graph includes two or more nodes representing components connected by links representing flows of data between the components. At least one but fewer than all of the components includes a checkpoint process for an action performed for each of multiple units of work associated with one or more of the records. The checkpoint process includes opening a checkpoint buffer stored in non-volatile memory at the start of processing for the batch. For each unit of work from the batch, if a result from performing the action for the unit of work was previously saved in the checkpoint buffer, the saved result is used to complete processing of the unit of work without performing the action again. If a result from performing the action for the unit of work is not saved in the checkpoint buffer, the action is performed to complete processing of the unit of work and the result from performing the action is saved in the checkpoint buffer.
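
    The per-unit checkpoint logic above (reuse a saved result if one exists, otherwise perform the action and save its result) can be sketched in Python as follows. Assumptions, not details from the patent: an append-only JSON-lines file with `os.fsync` stands in for non-volatile memory, each unit of work is keyed by `str(unit)`, and the `fail_at` parameter exists only to simulate a crash.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Optional


class CheckpointBuffer:
    """Minimal checkpoint buffer: a JSON-lines file stands in for non-volatile memory."""

    def __init__(self, path: str) -> None:
        self.saved: Dict[str, object] = {}
        if os.path.exists(path):  # reopened after a failure: reload earlier results
            with open(path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.saved[entry["key"]] = entry["result"]
        self._file = open(path, "a")

    def save(self, key: str, result: object) -> None:
        self.saved[key] = result
        self._file.write(json.dumps({"key": key, "result": result}) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # make the result durable before continuing

    def close(self) -> None:
        self._file.close()


def process_batch(units: List[int], action: Callable[[int], object], path: str,
                  fail_at: Optional[int] = None) -> List[object]:
    buf = CheckpointBuffer(path)  # opened at the start of processing for the batch
    results = []
    try:
        for i, unit in enumerate(units):
            key = str(unit)
            if key in buf.saved:  # result already saved: reuse it, skip the action
                results.append(buf.saved[key])
                continue
            if i == fail_at:
                raise RuntimeError("simulated failure")  # demonstration only
            result = action(unit)
            buf.save(key, result)
            results.append(result)
    finally:
        buf.close()
    return results


calls = []


def square(x: int) -> int:
    calls.append(x)  # record every time the (expensive) action actually runs
    return x * x


path = os.path.join(tempfile.mkdtemp(), "batch.ckpt")
try:
    process_batch([1, 2, 3, 4], square, path, fail_at=2)
except RuntimeError:
    pass
print(process_batch([1, 2, 3, 4], square, path))  # [1, 4, 9, 16]
print(calls)  # [1, 2, 3, 4]: units 1 and 2 were not recomputed on the rerun
```

    The first run saves results for units 1 and 2 before the simulated failure; the rerun reloads them from the buffer and performs the action only for units 3 and 4.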

6. FAULT TOLERANT BATCH PROCESSING
    Patent application; status: in force

    Publication number: US20110016354A1

    Publication date: 2011-01-20

    Application number: US12502851

    Filing date: 2009-07-14

    IPC classification: G06F11/00 G06F9/46

    Abstract: Among other aspects disclosed are a method and system for processing a batch of input data in a fault tolerant manner. The method includes reading a batch of input data including a plurality of records from one or more data sources and passing the batch through a dataflow graph. The dataflow graph includes two or more nodes representing components connected by links representing flows of data between the components. At least one but fewer than all of the components includes a checkpoint process for an action performed for each of multiple units of work associated with one or more of the records. The checkpoint process includes opening a checkpoint buffer stored in non-volatile memory at the start of processing for the batch. For each unit of work from the batch, if a result from performing the action for the unit of work was previously saved in the checkpoint buffer, the saved result is used to complete processing of the unit of work without performing the action again. If a result from performing the action for the unit of work is not saved in the checkpoint buffer, the action is performed to complete processing of the unit of work and the result from performing the action is saved in the checkpoint buffer.

7. MAPPING INSTANCES OF A DATASET WITHIN A DATA MANAGEMENT SYSTEM
    Patent application; status: under examination (published)

    Publication number: US20100138388A1

    Publication date: 2010-06-03

    Application number: US12628521

    Filing date: 2009-12-01

    IPC classification: G06F17/00 G06F3/048

    Abstract: Mapping data stored in a data storage system for use by a computer system includes processing specifications of dataflow graphs that include nodes representing computations interconnected by links representing flows of data. At least one of the dataflow graphs receives a flow of data from at least one input dataset and at least one of the dataflow graphs provides a flow of data to at least one output dataset. A mapper identifies one or more sets of datasets. Each dataset in a given set matches one or more criteria for identifying different versions of a single dataset. A user interface is provided to receive a mapping between at least two datasets in a given set. The mapping received over the user interface is stored in association with a dataflow graph that provides data to or receives data from the datasets of the mapping.
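
    One way to picture the mapper is the Python sketch below, which groups the datasets referenced by graph specifications into sets of versions and stores a user-supplied mapping against a graph. Everything concrete here is an assumption for illustration: the version criterion (a trailing `_YYYYMMDD` or `_vN` tag), the graph and dataset names, and the `MappingStore.record` call standing in for the user interface.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical criterion: names differing only in a trailing "_YYYYMMDD" or "_vN"
# tag before the extension are treated as versions of a single dataset.
_VERSION_TAG = re.compile(r"_(?:\d{8}|v\d+)(?=\.[^.]+$|$)")


def base_name(path: str) -> str:
    return _VERSION_TAG.sub("", path)


def identify_dataset_sets(graphs: Dict[str, dict]) -> Dict[str, List[str]]:
    """Group the datasets read or written by the graphs into sets of versions."""
    groups = defaultdict(set)
    for spec in graphs.values():
        for ds in spec["inputs"] + spec["outputs"]:
            groups[base_name(ds)].add(ds)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


class MappingStore:
    """Stores mappings keyed by the graph that reads or writes the mapped datasets."""

    def __init__(self) -> None:
        self.by_graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, graph: str, source: str, target: str,
               dataset_sets: Dict[str, List[str]]) -> None:
        # Accept only mappings between members of the same identified set.
        if not any(source in s and target in s for s in dataset_sets.values()):
            raise ValueError(f"{source} and {target} are not versions of one dataset")
        self.by_graph[graph].append((source, target))


graphs = {
    "load": {"inputs": ["customers_20091201.dat"], "outputs": ["warehouse.dat"]},
    "report": {"inputs": ["customers_v2.dat", "orders.dat"], "outputs": []},
}
sets = identify_dataset_sets(graphs)
print(sets)  # {'customers.dat': ['customers_20091201.dat', 'customers_v2.dat']}
store = MappingStore()
store.record("report", "customers_v2.dat", "customers_20091201.dat", sets)
```

    A real matcher would likely combine several criteria (schema, location, naming); a single regular expression is used here only to keep the grouping step visible.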

8. Data Quality Tracking
    Patent application; status: in force

    Publication number: US20090319566A1

    Publication date: 2009-12-24

    Application number: US12143362

    Filing date: 2008-06-20

    IPC classification: G06F17/30

    CPC classification: G06F17/30958 G06F17/30303

    Abstract: In general, a method includes determining metric values associated with data quality for one or more child nodes. Metric values are determined for a parent node based on the metric values of at least some of the child nodes, and relationships between one or more parent nodes and one or more child nodes define a hierarchy. The determination of the metric value for the parent node is repeated for multiple instances.
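
    The hierarchical roll-up described here can be sketched in Python as a recursive function over a parent-to-children map, repeated once per instance. The aggregation rule (a record-weighted mean of the children's metrics), the node names, and the per-run metric values are all hypothetical; the abstract does not say how child metrics are combined.

```python
from typing import Dict, List, Tuple

Metric = Tuple[float, int]  # (fraction of valid values, number of records)

# Hypothetical hierarchy: parent -> children; leaves carry measured metrics.
HIERARCHY: Dict[str, List[str]] = {
    "warehouse": ["customers", "orders"],
    "orders": ["orders_eu", "orders_us"],
}


def rollup(node: str, leaves: Dict[str, Metric],
           hierarchy: Dict[str, List[str]]) -> Metric:
    """A parent's metric is the record-weighted mean of its children's metrics."""
    children = hierarchy.get(node)
    if not children:
        return leaves[node]
    pairs = [rollup(child, leaves, hierarchy) for child in children]
    total = sum(weight for _, weight in pairs)
    return sum(value * weight for value, weight in pairs) / total, total


# The same roll-up is repeated for each instance (e.g. each run of the checks).
instances = {
    "run_1": {"customers": (0.90, 100), "orders_eu": (1.00, 50), "orders_us": (0.80, 50)},
    "run_2": {"customers": (0.95, 100), "orders_eu": (0.90, 50), "orders_us": (0.90, 150)},
}
for name, leaves in instances.items():
    value, records = rollup("warehouse", leaves, HIERARCHY)
    print(f"{name}: warehouse quality {value:.3f} over {records} records")
```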

9. FAULT TOLERANT BATCH PROCESSING
    Patent application; status: in force

    Publication number: US20120311588A1

    Publication date: 2012-12-06

    Application number: US13523422

    Filing date: 2012-06-14

    IPC classification: G06F9/46

    Abstract: Among other aspects disclosed are a method and system for processing a batch of input data in a fault tolerant manner. The method includes reading a batch of input data including a plurality of records from one or more data sources and passing the batch through a dataflow graph. The dataflow graph includes two or more nodes representing components connected by links representing flows of data between the components. At least one but fewer than all of the components includes a checkpoint process for an action performed for each of multiple units of work associated with one or more of the records. The checkpoint process includes opening a checkpoint buffer stored in non-volatile memory at the start of processing for the batch.

10. SPECIFYING USER INTERFACE ELEMENTS
    Patent application; status: under examination (published)

    Publication number: US20110145748A1

    Publication date: 2011-06-16

    Application number: US12959985

    Filing date: 2010-12-03

    IPC classification: G06F3/048

    Abstract: Providing a user interface for configuring a computer-executable application includes receiving a specification defining: relationships among user interface elements, the relationships based on dependencies between components of a dataflow graph that includes multiple nodes representing components of the dataflow graph and links between the nodes representing flows of data between the components, parameters defining respective characteristics of the components of the dataflow graph, and variables defining respective characteristics of the user interface elements. During operation of a user interface, user interface elements are displayed based on the relationships defined in the specification.
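
    A specification of this kind might be modeled in Python as below: each user interface element binds to a graph parameter, lists the elements it depends on, and carries an optional display condition evaluated against the current variables. The `SPEC` layout, element and parameter names, and the `show_if` convention are hypothetical; `graphlib.TopologicalSorter` (Python 3.9+) only orders the elements so that prerequisites are decided first.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical specification: element -> bound graph parameter, the elements it
# depends on (mirroring component dependencies), and an optional display condition.
SPEC: Dict[str, dict] = {
    "source_file": {"parameter": "INPUT_PATH", "depends_on": [], "show_if": None},
    "delimiter": {"parameter": "DELIM", "depends_on": ["source_file"],
                  "show_if": lambda v: v.get("source_file", "").endswith(".csv")},
    "header_row": {"parameter": "HAS_HEADER", "depends_on": ["delimiter"], "show_if": None},
    "output_table": {"parameter": "OUT_TABLE", "depends_on": ["source_file"], "show_if": None},
}


def visible_elements(spec: Dict[str, dict], variables: Dict[str, str]) -> List[str]:
    """Return the elements to display, in dependency order."""
    order = TopologicalSorter({k: v["depends_on"] for k, v in spec.items()}).static_order()
    shown: List[str] = []
    for name in order:
        entry = spec[name]
        if any(dep not in shown for dep in entry["depends_on"]):
            continue  # a prerequisite is hidden, so this element is hidden too
        condition = entry["show_if"]
        if condition is None or condition(variables):
            shown.append(name)
    return shown


def parameter_values(spec: Dict[str, dict], variables: Dict[str, str]) -> Dict[str, str]:
    """Copy each visible element's value onto the graph parameter it configures."""
    return {spec[n]["parameter"]: variables[n]
            for n in visible_elements(spec, variables) if n in variables}


print(visible_elements(SPEC, {"source_file": "in.dat"}))
print(parameter_values(SPEC, {"source_file": "in.csv", "delimiter": ","}))
```

    Hiding an element also hides everything that depends on it, which is why `header_row` disappears whenever `delimiter` is not shown.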