System and Method for Large-Scale Data Processing Using an Application-Independent Framework

    Publication number: US20170206232A1

    Publication date: 2017-07-20

    Application number: US15479228

    Filing date: 2017-04-04

    Applicant: Google Inc.

    Abstract: A method performs large-scale data processing in a distributed and parallel processing environment. The method defines application-independent map and reduce operations, each invoking one or more library functions that automatically handle data partitioning, parallelization of computations, and fault tolerance. A user specifies a map operation, which calls one or more of the application-independent map operators to perform data read and write operations. A user also specifies a reduce operation, which calls one or more of the application-independent reduce operators to perform data read and write operations. The method executes application-independent map worker processes. Each map worker process executes the user-specified map operation to read designated portions of input files and store intermediate data values in intermediate data structures. The method also executes application-independent reduce worker processes. Each reduce worker process executes the user-specified reduce operation to read intermediate data values from the intermediate data structures and produce final output data.
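    The division of labor described in this abstract can be illustrated with a minimal single-process sketch. The framework function below plays the role of the application-independent part (partitioning intermediate data, driving the map and reduce phases), while the two word-count functions play the role of the user-specified operations. All names (`run_map_reduce`, `word_count_map`, `word_count_reduce`, `num_partitions`) are hypothetical and chosen for illustration; the sketch omits real parallelism and fault tolerance.

```python
from collections import defaultdict

def run_map_reduce(inputs, map_fn, reduce_fn, num_partitions=2):
    """Toy application-independent driver (hypothetical, single process)."""
    # Intermediate data structures: one key -> list-of-values table per partition.
    partitions = [defaultdict(list) for _ in range(num_partitions)]

    # Map phase: each input portion is handed to the user-specified map operation.
    for portion in inputs:
        for key, value in map_fn(portion):
            # Deterministic partitioning so a key always lands in the same table.
            index = sum(key.encode("utf-8")) % num_partitions
            partitions[index][key].append(value)

    # Reduce phase: each partition is processed independently of the others.
    output = {}
    for table in partitions:
        for key in sorted(table):
            output[key] = reduce_fn(key, table[key])
    return output

# User-specified, application-specific operations (word count as an example).
def word_count_map(portion):
    for word in portion.split():
        yield word.lower(), 1

def word_count_reduce(key, values):
    return sum(values)

counts = run_map_reduce(
    ["the quick fox", "the lazy dog", "The end"],
    word_count_map,
    word_count_reduce,
)
```

    Swapping in a different pair of map and reduce functions changes the application without touching the driver, which is the separation the abstract emphasizes.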

    Large language models in machine translation
    12.
    Type: Granted invention patent
    Status: In force

    Publication number: US08812291B2

    Publication date: 2014-08-19

    Application number: US13709125

    Filing date: 2012-12-10

    Applicant: Google Inc.

    CPC classification number: G06F17/2818 G06F17/2827 G06F17/2845

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n−1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
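    The abstract does not state the exact function that combines the backoff factor with the relative frequency of the backoff n-gram. The sketch below assumes the simplest reading, a recursive multiplication in the style the literature calls "stupid backoff": use the n-gram's own relative frequency when it was observed, otherwise multiply the score of the shortened (order n−1) n-gram by a fixed factor. The function names and the factor value 0.4 are assumptions made for illustration.

```python
from collections import Counter

def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(ngram, counts, total_tokens, backoff_factor=0.4):
    """Relative frequency if the n-gram was seen, else a scaled backoff score."""
    ngram = tuple(ngram)
    if len(ngram) == 1:
        # Base case: unigram relative frequency in the corpus.
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # Seen n-gram: frequency relative to its context (the first n-1 tokens).
        return counts[ngram] / counts[ngram[:-1]]
    # Unseen n-gram: drop the first token and scale by the backoff factor.
    return backoff_factor * score(ngram[1:], counts, total_tokens, backoff_factor)

tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=3)

seen = score(("the", "cat"), counts, len(tokens))            # 1 / 2
unseen = score(("on", "the", "cat"), counts, len(tokens))    # 0.4 * (1 / 2)
```

    Because the scores are not normalized, they are not probabilities; the sketch trades that property for a computation that needs only stored counts.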

    System and method for analyzing data records

    Publication number: US09830357B2

    Publication date: 2017-11-28

    Application number: US15226795

    Filing date: 2016-08-02

    Applicant: GOOGLE INC.

    Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script with information processing commands applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When the predefined threshold percentage of the data records are completed, the process assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
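    The backup-and-aggregate-once behavior in this abstract can be sketched as a deterministic simulation. The sketch assumes that backups are launched only for groups not yet completed once the threshold is reached, and that whichever process finishes a group first supplies its intermediate values; neither detail is spelled out in the abstract. The names `run_with_backups`, `extract`, and `slow_groups` are hypothetical, and `extract` stands in for the script of information processing commands.

```python
def run_with_backups(groups, extract, threshold=0.5, slow_groups=frozenset()):
    """Simulate primary and backup processes; count each group exactly once."""
    status = {gid: "pending" for gid in groups}
    intermediate = {}  # gid -> (worker label, intermediate values)

    def complete(gid, worker):
        if status[gid] == "done":
            return  # a later duplicate result is discarded
        intermediate[gid] = (worker, [extract(record) for record in groups[gid]])
        status[gid] = "done"

    # Primary pass: groups in slow_groups simulate stragglers still running.
    for gid in groups:
        if gid not in slow_groups:
            complete(gid, "primary")

    # Once enough groups are done, hand the unfinished ones to backup processes.
    done_fraction = sum(s == "done" for s in status.values()) / len(groups)
    if done_fraction >= threshold:
        for gid in groups:
            if status[gid] != "done":
                complete(gid, "backup")

    # Straggling primaries finish eventually; results for done groups are ignored.
    for gid in slow_groups:
        complete(gid, "primary")

    # Aggregation: each group's intermediate values are included only once.
    total = sum(sum(values) for _, values in intermediate.values())
    winners = {gid: worker for gid, (worker, _) in intermediate.items()}
    return total, winners

groups = {"a": ["1", "2"], "b": ["3"], "c": ["4", "5"]}
total, winners = run_with_backups(groups, extract=int, slow_groups={"c"})
```

    Keeping a per-group status and a single intermediate slot per group is what prevents a group finished by both its original and its backup process from being double counted.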

    System and Method For Large-Scale Data Processing Using an Application-Independent Framework
    14.
    Type: Invention application
    Status: In force

    Publication number: US20140096138A1

    Publication date: 2014-04-03

    Application number: US14099806

    Filing date: 2013-12-06

    Applicant: Google Inc.

    Abstract: A large-scale data processing system and method for processing data in a distributed and parallel processing environment is disclosed. The system comprises a set of interconnected computing systems, each having one or more processors and memory. The set of interconnected computing systems include: a set of application-independent map modules for reading portions of input files containing data, and for producing intermediate data values by applying at least one user-specified, application-specific map operation to the data; a set of intermediate data structures distributed among a plurality of the interconnected computing systems for storing the intermediate data values; and a set of application-independent reduce modules, distinct from the plurality of application-independent map modules, for producing final output data by applying at least one user-specified, application-specific reduce operation to the intermediate data values.

    LARGE LANGUAGE MODELS IN MACHINE TRANSLATION
    15.
    Type: Invention application
    Status: In force

    Publication number: US20130346059A1

    Publication date: 2013-12-26

    Application number: US13709125

    Filing date: 2012-12-10

    Applicant: GOOGLE INC.

    CPC classification number: G06F17/2818 G06F17/2827 G06F17/2845

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n−1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.

    System and Method For Analyzing Data Records
    16.
    Type: Invention application
    Status: In force

    Publication number: US20160342657A1

    Publication date: 2016-11-24

    Application number: US15226795

    Filing date: 2016-08-02

    Applicant: GOOGLE INC.

    Abstract: A method processes data records. The method partitions the data records into groups and assigns each group to a respective process of a first plurality of processes, which execute in parallel. For each group, the assigned process extracts information from the data records, applies a script with information processing commands applied sequentially to produce intermediate values, stores the intermediate values in a respective intermediate data structure, and updates the status of the group to indicate completion. When the predefined threshold percentage of the data records are completed, the process assigns each group to a respective second process as a backup. When each of the groups has been completed by at least one process (either the original or the backup), the method executes a second plurality of processes to aggregate intermediate values from the intermediate data structures to produce output data. The aggregation includes intermediate values only once for each group.
