COST-AWARE REPLICATION OF INTERMEDIATE DATA IN DATAFLOWS
    1.
    Type: patent application
    COST-AWARE REPLICATION OF INTERMEDIATE DATA IN DATAFLOWS (status: in force)

    Publication No.: US20120278578A1

    Publication date: 2012-11-01

    Application No.: US13097200

    Filing date: 2011-04-29

    IPC classification: G06F12/02

    Abstract: Described herein are methods, systems, apparatuses and products for cost-aware replication of intermediate data in dataflows. An aspect provides receiving at least one measurement indicative of a reliability cost associated with executing a dataflow; computing a degree of replication of at least one intermediate data set in the dataflow based on the reliability cost; and communicating at least one replication factor to at least one component of a system responsible for replication of the at least one intermediate data set in the dataflow; wherein the at least one intermediate data set is replicated according to the replication factor. Other embodiments are disclosed.
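The abstract names three steps (receive a reliability-cost measurement, compute a degree of replication, communicate the replication factor) but does not give a cost model. The sketch below is only an illustration under assumed inputs: the function names, the per-replica storage cost, the recomputation cost, and the independent per-copy loss probability are all hypothetical and are not taken from the patent.

```python
def expected_cost(replicas, storage_cost, recompute_cost, loss_probability):
    """Assumed cost model: pay for every copy stored, plus the cost of
    regenerating the data weighted by the chance that all copies are lost."""
    return replicas * storage_cost + (loss_probability ** replicas) * recompute_cost


def choose_replication_factor(storage_cost, recompute_cost, loss_probability,
                              max_replicas=5):
    """Return the replica count (1..max_replicas) with the lowest expected cost."""
    candidates = range(1, max_replicas + 1)
    return min(
        candidates,
        key=lambda r: expected_cost(r, storage_cost, recompute_cost, loss_probability),
    )


if __name__ == "__main__":
    # Expensive-to-recompute data set on moderately unreliable nodes.
    factor = choose_replication_factor(storage_cost=1.0, recompute_cost=100.0,
                                       loss_probability=0.1)
    # In a real system the factor would be handed to the storage layer here.
    print("replication factor:", factor)
```

Under this assumed model, cheap-to-regenerate data ends up with a single copy, while data that is expensive to regenerate earns additional replicas until the storage cost outweighs the reduced risk.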


    Cost-aware replication of intermediate data in dataflows
    2.
    Type: granted patent
    Cost-aware replication of intermediate data in dataflows (status: in force)

    Publication No.: US08949558B2

    Publication date: 2015-02-03

    Application No.: US13097200

    Filing date: 2011-04-29

    Abstract: Described herein are methods, systems, apparatuses and products for cost-aware replication of intermediate data in dataflows. An aspect provides receiving at least one measurement indicative of a reliability cost associated with executing a dataflow; computing a degree of replication of at least one intermediate data set in the dataflow based on the reliability cost; and communicating at least one replication factor to at least one component of a system responsible for replication of the at least one intermediate data set in the dataflow; wherein the at least one intermediate data set is replicated according to the replication factor. Other embodiments are disclosed.


    Distribution of intermediate data in a multistage computer application
    5.
    Type: granted patent
    Distribution of intermediate data in a multistage computer application (status: lapsed)

    Publication No.: US07970884B1

    Publication date: 2011-06-28

    Application No.: US12684273

    Filing date: 2010-01-08

    IPC classification: G06F15/173

    CPC classification: G06F9/50

    Abstract: A method, system and computer program product for distributing intermediate data of a multistage computer application to a plurality of computers. In one embodiment, a data manager calculates a data usage demand of generated intermediate data. A computer manager calculates a computer usage, which is the sum of the data usage demands of all intermediate data stored at the computer. A scheduler selects a target computer from the plurality of computers for storage of the generated intermediate data such that a variance of the computer usage demand across the plurality of computers is minimized.
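The placement rule in this abstract (pick the computer that minimizes the variance of usage across all computers) can be sketched as follows. This is a minimal illustration, not the patented scheduler: the function name `select_target`, the dictionary representation of per-computer usage, and the use of population variance are assumptions made for the example.

```python
from statistics import pvariance


def select_target(computer_usage, new_demand):
    """Pick the computer on which storing the new intermediate data
    yields the lowest variance of usage across all computers.

    computer_usage: dict mapping computer name -> summed data usage demand
    new_demand:     data usage demand of the newly generated intermediate data
    """
    def variance_if_placed(name):
        trial = dict(computer_usage)   # copy so the input is not modified
        trial[name] += new_demand
        return pvariance(list(trial.values()))

    return min(computer_usage, key=variance_if_placed)


if __name__ == "__main__":
    usage = {"node-a": 5.0, "node-b": 1.0, "node-c": 3.0}
    target = select_target(usage, new_demand=2.0)
    usage[target] += 2.0               # record the placement
    print(target, usage)
```

For a single placement this behaves like sending the data to the least-loaded computer; the variance formulation makes the balancing objective explicit.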


    I/O performance of data analytic workloads
    6.
    Type: granted patent
    I/O performance of data analytic workloads (status: in force)

    Publication No.: US08560779B2

    Publication date: 2013-10-15

    Application No.: US13112864

    Filing date: 2011-05-20

    IPC classification: G06F12/00

    Abstract: A method and structure for processing an application program on a computer. In a memory of the computer executing the application, an in-memory cache structure is provided for normally temporarily storing data produced in the processing. An in-memory storage outside the in-memory cache structure is provided in the memory, for bypassing the in-memory cache structure and temporarily storing data under a predetermined condition. A sensor detects an amount of usage of the in-memory cache structure used to store data during the processing. When it is detected that the amount of usage exceeds a predetermined threshold, the processing is controlled so that the data produced in the processing is stored in the in-memory storage rather than in the in-memory cache structure.
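The threshold-based bypass described in this abstract can be illustrated with a small sketch. Everything here is hypothetical: the class name `BypassingStore`, the use of plain dictionaries for the cache and the bypass storage, and measuring usage in bytes are assumptions chosen to keep the example self-contained.

```python
class BypassingStore:
    """Stores data in a cache until cache usage exceeds a threshold,
    then routes new data to a separate in-memory storage area."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.cache = {}            # stands in for the in-memory cache structure
        self.bypass_storage = {}   # stands in for in-memory storage outside the cache

    def cache_usage(self):
        """The 'sensor': bytes currently held in the cache."""
        return sum(len(value) for value in self.cache.values())

    def put(self, key, value):
        """Store bytes; return which area received them."""
        if self.cache_usage() > self.threshold_bytes:
            self.bypass_storage[key] = value
            return "bypass"
        self.cache[key] = value
        return "cache"


if __name__ == "__main__":
    store = BypassingStore(threshold_bytes=4)
    print(store.put("a", b"12345"))  # cache is empty, so this goes to the cache
    print(store.put("b", b"x"))      # usage is now above the threshold
```

The check happens before each write, so the cache can overshoot the threshold by one item; a stricter variant would compare the usage after adding the new value.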


    Execution of dataflow jobs
    7.
    Type: granted patent
    Execution of dataflow jobs (status: in force)

    Publication No.: US08539192B2

    Publication date: 2013-09-17

    Application No.: US12684343

    Filing date: 2010-01-08

    IPC classification: G06F12/00

    Abstract: A method, system and computer program product for storing data in memory. An example system includes at least one multistage application configured to generate intermediate data in a generating stage of the application and consume the intermediate data in a subsequent consuming stage of the application. A runtime profiler is configured to monitor the application's execution and dynamically allocate memory to the application from an in-memory data grid.
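The abstract does not say how the runtime profiler decides on an allocation. As a rough sketch only, the example below assumes the profiler observes the size of the intermediate data an application currently holds and resizes that application's share of a fixed-capacity grid accordingly. The classes `DataGrid` and `RuntimeProfiler`, their methods, and the megabyte units are hypothetical.

```python
class DataGrid:
    """A fixed-capacity pool of memory shared by several applications."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.allocations = {}  # application name -> MB currently granted

    def free_mb(self):
        return self.capacity_mb - sum(self.allocations.values())

    def resize(self, app, requested_mb):
        """Grow or shrink an application's allocation, capped by what is free."""
        available = self.free_mb() + self.allocations.get(app, 0)
        granted = min(requested_mb, available)
        self.allocations[app] = granted
        return granted


class RuntimeProfiler:
    """Watches observed intermediate-data sizes and adjusts grid allocations."""

    def __init__(self, grid):
        self.grid = grid

    def observe(self, app, intermediate_mb):
        # Assumed policy: request exactly as much memory as the observed data.
        return self.grid.resize(app, intermediate_mb)


if __name__ == "__main__":
    profiler = RuntimeProfiler(DataGrid(capacity_mb=100))
    print(profiler.observe("job1", 60))  # generating stage produces 60 MB
    print(profiler.observe("job2", 70))  # only the remaining memory can be granted
    print(profiler.observe("job1", 20))  # consuming stage has drained most data
```

Shrinking an allocation when the consuming stage finishes returns memory to the grid, which is what makes the allocation dynamic rather than fixed at job start.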


    SYSTEM AND METHOD TO IMPROVE I/O PERFORMANCE OF DATA ANALYTIC WORKLOADS
    8.
    Type: patent application
    SYSTEM AND METHOD TO IMPROVE I/O PERFORMANCE OF DATA ANALYTIC WORKLOADS (status: in force)

    Publication No.: US20120297145A1

    Publication date: 2012-11-22

    Application No.: US13112864

    Filing date: 2011-05-20

    IPC classification: G06F12/08

    Abstract: A method and structure for processing an application program on a computer. In a memory of the computer executing the application, an in-memory cache structure is provided for normally temporarily storing data produced in the processing. An in-memory storage outside the in-memory cache structure is provided in the memory, for bypassing the in-memory cache structure and temporarily storing data under a predetermined condition. A sensor detects an amount of usage of the in-memory cache structure used to store data during the processing. When it is detected that the amount of usage exceeds a predetermined threshold, the processing is controlled so that the data produced in the processing is stored in the in-memory storage rather than in the in-memory cache structure.


    EXECUTION OF DATAFLOW JOBS
    9.
    Type: patent application
    EXECUTION OF DATAFLOW JOBS (status: in force)

    Publication No.: US20110173410A1

    Publication date: 2011-07-14

    Application No.: US12684343

    Filing date: 2010-01-08

    IPC classification: G06F12/02

    Abstract: A method, system and computer program product for storing data in memory. An example system includes at least one multistage application configured to generate intermediate data in a generating stage of the application and consume the intermediate data in a subsequent consuming stage of the application. A runtime profiler is configured to monitor the application's execution and dynamically allocate memory to the application from an in-memory data grid.


    DISTRIBUTION OF INTERMEDIATE DATA IN A MULTISTAGE COMPUTER APPLICATION
    10.
    Type: patent application
    DISTRIBUTION OF INTERMEDIATE DATA IN A MULTISTAGE COMPUTER APPLICATION (status: lapsed)

    Publication No.: US20110173245A1

    Publication date: 2011-07-14

    Application No.: US12684273

    Filing date: 2010-01-08

    IPC classification: G06F15/16

    CPC classification: G06F9/50

    Abstract: A method, system and computer program product for distributing intermediate data of a multistage computer application to a plurality of computers. In one embodiment, a data manager calculates a data usage demand of generated intermediate data. A computer manager calculates a computer usage, which is the sum of the data usage demands of all intermediate data stored at the computer. A scheduler selects a target computer from the plurality of computers for storage of the generated intermediate data such that a variance of the computer usage demand across the plurality of computers is minimized.
