-
Publication No.: US20160342657A1
Publication date: 2016-11-24
Application No.: US15226795
Filing date: 2016-08-02
Applicant: GOOGLE INC.
Inventor: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
IPC: G06F17/30
CPC classification number: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937
Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script with information processing commands applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When the predefined threshold percentage of the data records are completed, the process assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
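Illustrative sketch (not part of the patent record): the flow described in the abstract can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `partition`, `process_group`, and `run` are hypothetical; the per-group "script" is assumed to be a simple word count; threads stand in for processes; the threshold is measured in completed groups rather than completed records; and backups are launched only for groups that have not yet finished. Keying the intermediate results by group id is one way to ensure each group contributes to the output only once, even if both the original and the backup task complete.

```python
from collections import Counter
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def partition(records, num_groups):
    """Split the records into num_groups groups (round-robin; an assumption)."""
    return [records[i::num_groups] for i in range(num_groups)]


def process_group(group):
    """Hypothetical per-group script: extract words and count them."""
    counts = Counter()
    for record in group:
        for word in record.split():
            counts[word] += 1
    return counts


def run(records, num_groups=4, backup_threshold=0.75, workers=4):
    groups = partition(records, num_groups)
    intermediate = {}  # group id -> intermediate values, kept once per group
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # First phase: one task per group, executed in parallel.
        pending = {pool.submit(process_group, g): gid
                   for gid, g in enumerate(groups)}
        backups_started = False
        while len(intermediate) < len(groups):
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                gid = pending.pop(fut)
                # setdefault keeps the first finished result for a group and
                # ignores a later duplicate from the original or the backup.
                intermediate.setdefault(gid, fut.result())
            enough_done = len(intermediate) >= backup_threshold * len(groups)
            if not backups_started and enough_done:
                backups_started = True
                # Launch a backup task for every group still unfinished.
                for gid, g in enumerate(groups):
                    if gid not in intermediate:
                        pending[pool.submit(process_group, g)] = gid
    # Second phase: aggregate the intermediate values, once per group.
    total = Counter()
    for counts in intermediate.values():
        total.update(counts)
    return total


if __name__ == "__main__":
    print(run(["a b", "b c", "a a", "c"]))
```

Because the aggregation reads from a dictionary keyed by group id, a straggler that finishes after its backup cannot be counted twice; the same idea applies whether the intermediate structure is a dictionary, a file per group, or a table row.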
-
Publication No.: US09830357B2
Publication date: 2017-11-28
Application No.: US15226795
Filing date: 2016-08-02
Applicant: GOOGLE INC.
Inventor: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
CPC classification number: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937
Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script with information processing commands applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When the predefined threshold percentage of the data records are completed, the process assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
-