Management of data and computation in data centers
    Type: Granted patent
    Status: In force

    Publication number: US08392403B2

    Publication date: 2013-03-05

    Application number: US12562156

    Filing date: 2009-09-18

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30545 G06F17/30451

    Abstract: Data and computation may be unified (i.e., integrated) in a data center using a single query interface. Users may interact with the data center via a query interface to provide a query (i.e., a computation) to the data center. The results of the query may be referred to as derived datasets and may be managed by a cache server. In an implementation, a derived dataset is uniquely referenced by the query that computes it. Shared common computations are computed only once and may be reused by other computations. The result of a query may be computed (if not previously cached) and returned to the user. Infrequently used derived datasets may be garbage collected (e.g., deleted or otherwise removed from storage) by a garbage collector. This integration of data and computation provides efficient resource management for the data center.

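    The abstract combines three ideas: a derived dataset is keyed by the query that computes it, shared subcomputations are evaluated once and reused, and rarely used results are garbage collected. The Python sketch below illustrates those ideas under stated assumptions; it is not the patented implementation. The `Query` and `CacheServer` names, the toy operators (`scan`, `filter_gt`, `double`, `sum`), and the idle-time eviction policy driven by a logical clock are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Query:
    """Immutable query node (hypothetical). Structurally equal queries compare
    and hash equal, so the query itself can key the dataset it derives."""
    op: str                              # hypothetical ops: "scan", "filter_gt", "double", "sum"
    arg: object = None                   # dataset name or numeric parameter
    inputs: Tuple["Query", ...] = ()


class CacheServer:
    """Toy cache of derived datasets, keyed by the query that computes them."""

    def __init__(self, base_data: Dict[str, tuple]):
        self.base_data = base_data            # stored (non-derived) datasets
        self.cache: Dict[Query, object] = {}  # query -> derived dataset
        self.last_used: Dict[Query, int] = {}
        self.clock = 0                        # logical clock: one tick per user query
        self.computations = 0                 # operator evaluations actually performed

    def _eval(self, q: Query):
        if q.op == "scan":                    # base data is read, never cached
            return self.base_data[q.arg]
        if q in self.cache:                   # shared subcomputation: reuse it
            self.last_used[q] = self.clock
            return self.cache[q]
        ins = [self._eval(i) for i in q.inputs]
        self.computations += 1
        if q.op == "filter_gt":
            result = tuple(x for x in ins[0] if x > q.arg)
        elif q.op == "double":
            result = tuple(2 * x for x in ins[0])
        elif q.op == "sum":
            result = sum(ins[0])
        else:
            raise ValueError(f"unknown op {q.op!r}")
        self.cache[q] = result
        self.last_used[q] = self.clock
        return result

    def query(self, q: Query):
        """Single entry point: compute the result unless it is already cached."""
        self.clock += 1
        return self._eval(q)

    def collect_garbage(self, max_idle: int) -> int:
        """Delete derived datasets not used during the last max_idle queries."""
        stale = [q for q, t in self.last_used.items() if self.clock - t > max_idle]
        for q in stale:
            del self.cache[q]
            del self.last_used[q]
        return len(stale)


server = CacheServer({"logs": (1, 5, 8, 12)})
big = Query("filter_gt", 4, (Query("scan", "logs"),))
print(server.query(Query("sum", None, (Query("double", None, (big,)),))))  # 50
print(server.query(Query("sum", None, (big,))))  # 25, reuses the cached filter
print(server.computations)                       # 4 rather than 5
print(server.collect_garbage(max_idle=0))        # 2 (the "double" and the first "sum")
```

    Because `Query` is a frozen dataclass, a structurally identical subquery hashes to the same key, so the second query reuses the cached `filter_gt` result instead of recomputing it. A real system may also need to canonicalize queries that are equivalent but written differently, which this sketch does not attempt.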

    General distributed reduction for data parallel computing
    Type: Granted patent
    Status: Lapsed

    Publication number: US08239847B2

    Publication date: 2012-08-07

    Application number: US12406842

    Filing date: 2009-03-18

    IPC classification: G06F9/45

    Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data-parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage, and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.

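    The idea of pushing part of a reduction into the map stage can be sketched as follows, assuming the reduction decomposes into a per-partition partial step and a merge step. The `Decomposable` spec and the `decomposable` decorator are hypothetical stand-ins for the annotations the abstract mentions, not the syntax of any real language extension; `group_reduce`, `average`, and `plain_average` are likewise illustrative names, and dynamic aggregation is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Decomposable:
    """Hypothetical annotation payload describing how to split a reduction."""
    seed: Callable[[Any], Any]             # first value -> partial state
    accumulate: Callable[[Any, Any], Any]  # (state, value) -> state, run map-side
    merge: Callable[[Any, Any], Any]       # (state, state) -> state, run reduce-side
    finalize: Callable[[Any], Any]         # state -> final result


def decomposable(spec: Decomposable):
    """Hypothetical decorator marking a reducer as safe for partial reduction."""
    def mark(fn):
        fn.decomposition = spec
        return fn
    return mark


@decomposable(Decomposable(
    seed=lambda v: (v, 1),
    accumulate=lambda s, v: (s[0] + v, s[1] + 1),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    finalize=lambda s: s[0] / s[1],
))
def average(values: Iterable[Any]) -> float:
    vals = list(values)
    return sum(vals) / len(vals)


def plain_average(values: Iterable[Any]) -> float:
    """Same result as average, but carries no annotation."""
    vals = list(values)
    return sum(vals) / len(vals)


Records = List[Tuple[Any, Any]]


def group_reduce(partitions: List[Records], reducer) -> Tuple[Dict[Any, Any], int]:
    """Group records by key across partitions and reduce each group.
    Returns the results and the number of items sent through the shuffle."""
    spec = getattr(reducer, "decomposition", None)
    shuffled: Dict[Any, list] = defaultdict(list)
    sent = 0
    for part in partitions:
        if spec is None:
            items = part                       # ship every raw record
        else:
            local: Dict[Any, Any] = {}         # map-side partial reduction
            for k, v in part:
                local[k] = spec.accumulate(local[k], v) if k in local else spec.seed(v)
            items = list(local.items())        # ship one partial state per key
        for k, x in items:
            shuffled[k].append(x)
            sent += 1
    if spec is None:
        return {k: reducer(xs) for k, xs in shuffled.items()}, sent
    return {k: spec.finalize(reduce(spec.merge, xs)) for k, xs in shuffled.items()}, sent


parts = [[("a", 1), ("a", 3), ("b", 10)], [("a", 5), ("b", 20), ("b", 30)]]
print(group_reduce(parts, average))        # ({'a': 3.0, 'b': 20.0}, 4)
print(group_reduce(parts, plain_average))  # ({'a': 3.0, 'b': 20.0}, 6)
```

    With the annotation, each partition ships one partial `(sum, count)` state per key instead of every raw record, which is where the reduction in shuffled data comes from; the unannotated `plain_average` falls back to shipping all records. Average is a useful example because an average of partial averages is wrong in general, yet the function becomes mergeable once expressed through a `(sum, count)` state.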

    MANAGEMENT OF DATA AND COMPUTATION IN DATA CENTERS
    Type: Patent application
    Status: In force

    Publication number: US20110072006A1

    Publication date: 2011-03-24

    Application number: US12562156

    Filing date: 2009-09-18

    IPC classification: G06F17/30

    CPC classification: G06F17/30545 G06F17/30451

    Abstract: Data and computation may be unified (i.e., integrated) in a data center using a single query interface. Users may interact with the data center via a query interface to provide a query (i.e., a computation) to the data center. The results of the query may be referred to as derived datasets and may be managed by a cache server. In an implementation, a derived dataset is uniquely referenced by the query that computes it. Shared common computations are computed only once and may be reused by other computations. The result of a query may be computed (if not previously cached) and returned to the user. Infrequently used derived datasets may be garbage collected (e.g., deleted or otherwise removed from storage) by a garbage collector. This integration of data and computation provides efficient resource management for the data center.
