Publication No.: US20100005080A1
Publication date: 2010-01-07
Application No.: US12533955
Filing date: 2009-07-31
Applicants: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
Inventors: Robert C. Pike, Sean Quinlan, Sean M. Dorward, Jeffrey Dean, Sanjay Ghemawat
IPC classification: G06F17/30
CPC classification: G06F17/30501, G06F11/1482, G06F17/30545, G06F17/30598, Y10S707/99933, Y10S707/99937
Abstract: A method and system for analyzing data records includes allocating groups of records to respective processes of a first plurality of processes executing in parallel. In each respective process of the first plurality of processes, for each record in the group of records allocated to the respective process, a query is applied to the record so as to produce zero or more values. Zero or more emit operators are applied to each of the zero or more produced values so as to add corresponding information to an intermediate data structure. Information from a plurality of the intermediate data structures is aggregated to produce output data.
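Illustrative sketch: The flow described in the abstract (allocate record groups to parallel workers, apply a query per record, emit values into an intermediate data structure, then aggregate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation claimed in the application: the names `query`, `process_group`, and `analyze` are hypothetical, the query is an arbitrary word-splitting example, a `Counter` stands in for the intermediate data structure, and a thread pool stands in for the parallel processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query(record):
    """Hypothetical query: produce zero or more values for one record."""
    for word in record.split():
        yield word.lower()


def process_group(group):
    """Apply the query to every record in one group of records."""
    intermediate = Counter()  # assumed intermediate data structure
    for record in group:
        for value in query(record):
            # "Emit" step: add information about the value to the table.
            intermediate[value] += 1
    return intermediate


def analyze(records, num_workers=2):
    """Allocate record groups to workers, then aggregate their tables."""
    # Round-robin allocation; the abstract does not fix an allocation policy.
    groups = [records[i::num_workers] for i in range(num_workers)]
    # Threads stand in for the parallel processes of the abstract.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        intermediates = list(pool.map(process_group, groups))
    output = Counter()
    for table in intermediates:
        output.update(table)  # sum counts across intermediate structures
    return output
```

Calling `analyze(["a b a", "", "B c"])` counts each lowercased word across all records; the empty record simply produces zero values, which matches the "zero or more" wording of the abstract. Counting is only one possible emit behavior, chosen here because summed counts aggregate the same way regardless of how records were grouped.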