Parallel processing of data
    1.
    Granted patent
    Status: In force

    Publication No.: US09536014B1

    Publication date: 2017-01-03

    Application No.: US14922552

    Filing date: 2015-10-26

    Applicant: Google Inc.

    Abstract: Parallel processing of data may include a set of map processes and a set of reduce processes. Each map process may include at least one map thread. Map threads may access distinct input data blocks assigned to the map process, and may apply an application specific map operation to the input data blocks to produce key-value pairs. Each map process may include a multiblock combiner configured to apply a combining operation to values associated with common keys in the key-value pairs to produce combined values, and to output intermediate data including pairs of keys and combined values. Each reduce process may be configured to access the intermediate data output by the multiblock combiners. For each key, an application specific reduce operation may be applied to the combined values associated with the key to produce output data.
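
    As an illustration of the abstract above, the following minimal Python sketch runs map threads over distinct blocks inside one map process, merges values for common keys across all of that process's blocks with a single multiblock combiner, and then reduces the intermediate data. The names `map_process`, `reduce_process`, and `word_map`, the word-count map operation, and the use of a thread pool are assumptions for illustration; partitioning keys among several reduce processes is omitted.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]


def map_process(blocks: List[str],
                map_op: Callable[[str], Iterable[KV]],
                combine_op: Callable[[int, int], int],
                num_threads: int = 2) -> Dict[str, int]:
    """One map process: map threads handle distinct blocks, then a single
    multiblock combiner merges values for common keys across all blocks."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each map thread applies the map operation to one input block.
        per_block = list(pool.map(lambda block: list(map_op(block)), blocks))
    combined: Dict[str, int] = {}
    for pairs in per_block:
        for key, value in pairs:
            # Combine values that share a key, regardless of source block.
            combined[key] = combine_op(combined[key], value) if key in combined else value
    return combined  # intermediate data: key -> combined value


def reduce_process(intermediate: List[Dict[str, int]],
                   reduce_op: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Gather combined values per key from every map process, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for data in intermediate:
        for key, value in data.items():
            grouped[key].append(value)
    return {key: reduce_op(key, values) for key, values in grouped.items()}


def word_map(block: str) -> Iterable[KV]:
    # Hypothetical application-specific map operation: word count.
    for word in block.split():
        yield word, 1


process_a = map_process(["a b a", "b c"], word_map, lambda x, y: x + y)
process_b = map_process(["a c c"], word_map, lambda x, y: x + y)
counts = reduce_process([process_a, process_b], lambda key, values: sum(values))
print(counts)  # {'a': 3, 'b': 2, 'c': 3}
```

    In this reading, the combiner runs once per map process over the pairs from all of its blocks, so each map process emits at most one combined value per key.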

    Cataloging data sets for reuse in pipeline applications
    2.
    Granted patent
    Status: In force

    Publication No.: US09495207B1

    Publication date: 2016-11-15

    Application No.: US14326953

    Filing date: 2014-07-09

    Applicant: Google Inc.

    CPC classification number: G06F9/546 G06F9/544

    Abstract: The present disclosure relates to cataloging data sets for reuse in pipeline applications. One example method includes identifying a data set produced by a particular pipeline object included in a first pipeline instance, the first pipeline instance including a plurality of pipeline objects, each pipeline object configured to perform a computation, and the particular pipeline object configured to perform a particular computation; determining a set of metadata for the data set, the set of metadata including identifying information for the data set to identify the data set to pipeline instances separate from the first pipeline instance; and allowing pipeline instances separate from the first pipeline instance to retrieve the data set based at least in part on the set of metadata, wherein the pipeline instances avoid performing the particular computation by using the retrieved data set.
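
    A minimal Python sketch of the reuse idea, assuming that the identifying metadata is a SHA-256 fingerprint of the computation name, its parameters, and its input identifiers, and that the catalog is an in-memory dictionary rather than persistent storage. `fingerprint`, `DataSetCatalog`, and `run_step` are hypothetical names; the abstract does not specify how the metadata is encoded.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


def fingerprint(computation: str, params: Dict[str, Any], input_ids: List[str]) -> str:
    """Identifying metadata: a stable hash of the computation that produced a data set."""
    payload = json.dumps({"computation": computation, "params": params,
                          "inputs": sorted(input_ids)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DataSetCatalog:
    """Shared catalog that pipeline instances consult before computing."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Any]] = {}

    def register(self, key: str, data: Any, producer: str) -> None:
        self._entries[key] = {"data": data, "producer": producer}

    def lookup(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        return None if entry is None else entry["data"]


def run_step(catalog: DataSetCatalog, instance: str, computation: str,
             params: Dict[str, Any], input_ids: List[str],
             compute: Callable[[], Any]) -> Any:
    key = fingerprint(computation, params, input_ids)
    cached = catalog.lookup(key)
    if cached is not None:
        return cached  # reuse the cataloged data set; skip the computation
    result = compute()
    catalog.register(key, result, producer=instance)
    return result


calls: List[int] = []


def expensive_square() -> List[int]:
    calls.append(1)  # count how often the computation really runs
    return [x * x for x in range(5)]


catalog = DataSetCatalog()
first = run_step(catalog, "pipeline-1", "square", {"n": 5}, ["src"], expensive_square)
second = run_step(catalog, "pipeline-2", "square", {"n": 5}, ["src"], expensive_square)
print(len(calls), first == second)  # 1 True
```

    The second pipeline instance finds the data set produced by the first one and never runs the computation itself.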

    Managing metadata for a distributed processing system with manager agents and worker agents
    3.
    Granted patent
    Status: In force

    Publication No.: US09424083B2

    Publication date: 2016-08-23

    Application No.: US14211660

    Filing date: 2014-03-14

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus are described for managing metadata for a distributed processing system. In one aspect, a method includes receiving a computation specification that includes a set of grains that specifies an initial state for a computation that is to be performed by a distributed processing system, wherein each grain comprises metadata that specifies a portion of the initial state for the computation; storing a grain hierarchy that represents a state for the computation based on a grain type associated with each grain, the grain hierarchy comprising subscription grains for subscriptions for the grain hierarchy, each subscription corresponding to one or more grains included in the grain hierarchy, and each subscription specifying one or more actions to be performed by the hub device; and performing, during performance of the computation, at least one of the actions specified by at least one of the subscriptions.
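
    One way to picture the grain hierarchy is as a path-keyed tree of metadata records in which subscriptions are stored as grains of their own type and fire actions when grains in their subtree change. The sketch below is a hypothetical Python model: `Grain`, `Subscription`, `GrainHierarchy`, path-prefix matching, and in-process callbacks are assumptions, not details from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Grain:
    path: str        # position in the hierarchy, e.g. "/job/task-1"
    grain_type: str  # e.g. "job", "task", "subscription"
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class Subscription:
    prefix: str                      # subtree of grains the subscription covers
    action: Callable[[Grain], None]  # action to run when a covered grain changes


def covers(prefix: str, path: str) -> bool:
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


class GrainHierarchy:
    """Stands in for the state held by the hub device."""

    def __init__(self, spec: List[Grain]) -> None:
        # Initial state comes from the grains in the computation specification.
        self.grains: Dict[str, Grain] = {g.path: g for g in spec}
        self.subscriptions: List[Subscription] = []

    def of_type(self, grain_type: str) -> List[Grain]:
        return [g for g in self.grains.values() if g.grain_type == grain_type]

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)
        path = f"/subscriptions/{len(self.subscriptions)}"
        # Subscriptions are themselves recorded as grains in the hierarchy.
        self.grains[path] = Grain(path, "subscription", {"prefix": sub.prefix})

    def update(self, path: str, **changes: object) -> None:
        grain = self.grains[path]
        grain.metadata.update(changes)
        for sub in self.subscriptions:
            if covers(sub.prefix, path):
                sub.action(grain)  # perform the subscribed action


events: List[tuple] = []
hierarchy = GrainHierarchy([Grain("/job", "job", {"state": "running"}),
                            Grain("/job/task-1", "task", {"state": "pending"})])
hierarchy.subscribe(Subscription("/job", lambda g: events.append((g.path, g.metadata["state"]))))
hierarchy.update("/job/task-1", state="done")
print(events)  # [('/job/task-1', 'done')]
```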

    Dynamic Shard Allocation Adjustment
    5.
    Patent application
    Status: In force

    Publication No.: US20160011901A1

    Publication date: 2016-01-14

    Application No.: US14327338

    Filing date: 2014-07-09

    Applicant: Google Inc.

    CPC classification number: G06F9/46 G06F9/4843

    Abstract: The present disclosure relates to dynamically adjusting shard allocation during parallel processing operations. One example method includes determining a target completion time for a batch data processing job of an input data set performed by a plurality of tasks, each of the plurality of tasks processing a different input shard including a different portion of the input data set; identifying a first task having an estimated completion time greater than the target completion time of the batch data processing job; and splitting the first input shard into a first split input shard and a second split input shard different from the first split input shard, the first split input shard including a first portion of the first input shard, and the second split input shard including a second portion of the first input shard different from the first portion.
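
    The splitting step can be sketched in Python as follows, assuming each shard is a contiguous `[start, end)` range of records, that a task's completion time is extrapolated from its processing rate so far, and that the unprocessed remainder of a slow shard is split at its midpoint. `Task` and `split_slow_shards` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Task:
    name: str
    shard: Tuple[int, int]  # [start, end) record range of the task's input shard
    processed: int          # records processed so far
    elapsed: float          # seconds spent so far

    def estimated_completion(self) -> float:
        # Assumption: the rate observed so far stays constant.
        if self.processed == 0 or self.elapsed == 0:
            return float("inf")
        rate = self.processed / self.elapsed
        remaining = (self.shard[1] - self.shard[0]) - self.processed
        return self.elapsed + remaining / rate


def split_slow_shards(tasks: List[Task], target: float) -> List[Tuple[int, int]]:
    """Split the unprocessed part of each slow task's shard in two."""
    new_shards: List[Tuple[int, int]] = []
    for task in tasks:
        if task.estimated_completion() <= target:
            continue
        start, end = task.shard
        resume = start + task.processed  # first record not yet processed
        if end - resume < 2:
            continue  # too little work left to split
        mid = resume + (end - resume) // 2
        task.shard = (start, mid)        # first split shard stays with the task
        new_shards.append((mid, end))    # second split shard for a new task
    return new_shards


tasks = [Task("t0", (0, 100), processed=50, elapsed=10.0),    # estimated 20 s
         Task("t1", (100, 400), processed=30, elapsed=10.0)]  # estimated 100 s
print(split_slow_shards(tasks, target=30.0), tasks[1].shard)  # [(265, 400)] (100, 265)
```

    A single two-way split may not bring a task under the target (here `t1` would still need about 55 s), so a scheduler built this way would likely re-run the check periodically.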

    DYNAMIC INTENT REGISTRY
    6.
    Patent application

    Publication No.: US20170177738A1

    Publication date: 2017-06-22

    Application No.: US14976994

    Filing date: 2015-12-21

    Applicant: Google Inc.

    CPC classification number: G06F16/9024 G06F16/35

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for relating the operation of applications on a user device are described including accessing, for a first application, intent data describing intent groups for the first application, each intent group including one or more intents that belong to the intent group, determining enabling connectors for the intent groups, each enabling connector specifying an enabling intent that causes a corresponding intent group to become active in the first application, generating intent group association data that associates intent groups to other intent groups by enabling connectors, wherein an enabling intent associates a first intent group to a second intent group, and the second intent group becomes active in response to an execution of the enabling connector, and providing the intent group association data to a user device that has the first application installed.
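
    The intent group association data can be modeled as a small directed graph. The hypothetical Python sketch below assumes intent groups are sets of intent names, that an enabling connector pairs one enabling intent with a target group, and that a group is associated with every group its intents can enable; `EnablingConnector`, `build_association`, and `execute_intent` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class EnablingConnector:
    enabling_intent: str  # intent whose execution activates the target group
    target_group: str     # intent group that becomes active


def build_association(intent_groups: Dict[str, Set[str]],
                      connectors: List[EnablingConnector]) -> Dict[str, Set[str]]:
    """Intent group association data: group -> groups its intents can enable."""
    assoc: Dict[str, Set[str]] = {}
    for conn in connectors:
        for group, intents in intent_groups.items():
            if conn.enabling_intent in intents and group != conn.target_group:
                assoc.setdefault(group, set()).add(conn.target_group)
    return assoc


def execute_intent(intent: str, active: Set[str],
                   intent_groups: Dict[str, Set[str]],
                   connectors: List[EnablingConnector]) -> Set[str]:
    """On the device: executing an available intent may activate more groups."""
    if not any(intent in intent_groups[g] for g in active):
        return set(active)  # intent is not available in any active group
    enabled = {c.target_group for c in connectors if c.enabling_intent == intent}
    return set(active) | enabled


groups = {"browse": {"open_app", "search"}, "checkout": {"pay", "add_address"}}
connectors = [EnablingConnector("search", "checkout")]
print(build_association(groups, connectors))                             # {'browse': {'checkout'}}
print(sorted(execute_intent("search", {"browse"}, groups, connectors)))  # ['browse', 'checkout']
```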

    FILE OPERATION TASK OPTIMIZATION
    7.
    Patent application
    Status: Pending (published)

    Publication No.: US20170004010A1

    Publication date: 2017-01-05

    Application No.: US15266177

    Filing date: 2016-09-15

    Applicant: Google Inc.

    CPC classification number: G06F9/4881 G06F9/4887 G06F16/16 G06F16/182

    Abstract: A method includes receiving, by a data processing apparatus, a plurality of file operation requests, each file operation request including a priority, a deadline, and an operation type and representing a request to perform an operation on at least one file maintained in a distributed file system; identifying, by the data processing apparatus, a group of file operation requests to be executed together from the plurality of file operation requests, the identification based at least in part on at least one of: the file operations in the group of file operations being directed to a same storage system, or file operations in the group of file operations sharing a common operation type; and sending a request to execute the group of file operation requests to a system configured to perform the group of file operation requests.
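
    A minimal Python sketch of the grouping step, assuming requests are batched only when they share both the storage system and the operation type (the abstract requires at least one of these), and that batches are ordered by earliest deadline, then by highest priority. `FileOpRequest` and `group_requests` are hypothetical names, and sending each batch to the executing system is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileOpRequest:
    path: str
    op_type: str     # e.g. "delete", "copy"
    storage: str     # storage system holding the file
    priority: int    # higher value = more urgent
    deadline: float  # e.g. seconds since epoch


def group_requests(requests: List[FileOpRequest]) -> List[List[FileOpRequest]]:
    """Batch requests that target the same storage system with the same op type."""
    groups: Dict[Tuple[str, str], List[FileOpRequest]] = defaultdict(list)
    for req in requests:
        groups[(req.storage, req.op_type)].append(req)
    batches = list(groups.values())
    for batch in batches:
        # Within a batch: earliest deadline first, then highest priority.
        batch.sort(key=lambda r: (r.deadline, -r.priority))
    # Across batches: the batch holding the most urgent request goes first.
    batches.sort(key=lambda b: (b[0].deadline, -max(r.priority for r in b)))
    return batches


reqs = [FileOpRequest("/a/1", "delete", "cluster-x", 1, 200.0),
        FileOpRequest("/a/2", "delete", "cluster-x", 5, 100.0),
        FileOpRequest("/b/1", "copy", "cluster-y", 3, 150.0)]
for batch in group_requests(reqs):
    print([r.path for r in batch])
# ['/a/2', '/a/1']
# ['/b/1']
```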

    MANAGING METADATA FOR A DISTRIBUTED PROCESSING SYSTEM WITH MANAGER AGENTS AND WORKER AGENTS
    8.
    Patent application
    Status: Pending (published)

    Publication No.: US20160357613A1

    Publication date: 2016-12-08

    Application No.: US15240785

    Filing date: 2016-08-18

    Applicant: Google Inc.

    Abstract: A manager agent accesses a grain hierarchy that represents a state for a computation that is to be performed by a distributed processing system, wherein the grain hierarchy includes manager agent grains including metadata for manager agent processes that manage the performance of the computation by the distributed processing system, and worker agent grains including, for tasks to be performed by the distributed processing system, metadata for worker agents that each correspond to a subset of the plurality of data processors for performing the task. A manager agent performs processes defined by a manager agent grain to manage the computation by worker agents and stores, within the grain, metadata describing the manager agent process performed by the manager agent, and worker agents perform tasks assigned to the worker agents based on an assignment of a respective worker agent grain to the worker agent.
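
    The division of labor between manager and worker agents might be sketched as follows. This hypothetical Python model assumes the grain hierarchy is a dictionary keyed by grain name with a single grain named `"manager"`, that tasks are assigned round-robin, and that the manager records its actions as a `process_log` entry in its own grain; none of these choices come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Grain:
    kind: str  # "manager" or "worker"
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class WorkerAgent:
    name: str
    processors: List[str]  # subset of the data processors this agent drives
    done: List[str] = field(default_factory=list)

    def perform(self, grain: Grain) -> None:
        # Perform the task described by the assigned worker agent grain.
        self.done.append(str(grain.metadata["task"]))
        grain.metadata["status"] = "done"


class ManagerAgent:
    def __init__(self, hierarchy: Dict[str, Grain]) -> None:
        self.hierarchy = hierarchy

    def run(self, workers: List[WorkerAgent]) -> None:
        log: List[str] = []
        task_keys = sorted(k for k, g in self.hierarchy.items() if g.kind == "worker")
        for i, key in enumerate(task_keys):
            worker = workers[i % len(workers)]  # round-robin assignment
            grain = self.hierarchy[key]
            grain.metadata["assigned_to"] = worker.name
            log.append(f"{key} -> {worker.name}")
            worker.perform(grain)
        # Record what the manager process did inside the manager agent grain.
        self.hierarchy["manager"].metadata["process_log"] = log


hierarchy = {"manager": Grain("manager"),
             "task/0": Grain("worker", {"task": "sort-part-0"}),
             "task/1": Grain("worker", {"task": "sort-part-1"}),
             "task/2": Grain("worker", {"task": "sort-part-2"})}
w1 = WorkerAgent("w1", ["cpu0", "cpu1"])
w2 = WorkerAgent("w2", ["cpu2"])
ManagerAgent(hierarchy).run([w1, w2])
print(w1.done, w2.done)  # ['sort-part-0', 'sort-part-2'] ['sort-part-1']
```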

    File operation task optimization
    9.
    Granted patent
    Status: In force

    Publication No.: US09449018B1

    Publication date: 2016-09-20

    Application No.: US14089588

    Filing date: 2013-11-25

    Applicant: Google, Inc.

    CPC classification number: G06F9/4881 G06F9/4887 G06F17/30115 G06F17/30194

    Abstract: A method includes receiving, by a data processing apparatus, a plurality of file operation requests, each file operation request including a priority, a deadline, and an operation type and representing a request to perform an operation on at least one file maintained in a distributed file system; identifying, by the data processing apparatus, a group of file operation requests to be executed together from the plurality of file operation requests, the identification based at least in part on at least one of: the file operations in the group of file operations being directed to a same storage system, or file operations in the group of file operations sharing a common operation type; and sending a request to execute the group of file operation requests to a system configured to perform the group of file operation requests.

    MANAGING METADATA FOR A DISTRIBUTED PROCESSING SYSTEM
    10.
    Patent application
    Status: In force

    Publication No.: US20150261570A1

    Publication date: 2015-09-17

    Application No.: US14211660

    Filing date: 2014-03-14

    Applicant: Google Inc.

    Abstract: Methods, systems, and apparatus are described for managing metadata for a distributed processing system. In one aspect, a method includes receiving a computation specification that includes a set of grains that specifies an initial state for a computation that is to be performed by a distributed processing system, wherein each grain comprises metadata that specifies a portion of the initial state for the computation; storing a grain hierarchy that represents a state for the computation based on a grain type associated with each grain, the grain hierarchy comprising subscription grains for subscriptions for the grain hierarchy, each subscription corresponding to one or more grains included in the grain hierarchy, and each subscription specifying one or more actions to be performed by the hub device; and performing, during performance of the computation, at least one of the actions specified by at least one of the subscriptions.
