Techniques for pruning phrase tables for statistical machine translation
    Patent type: Granted invention patent
    Legal status: In force

    Publication number: US08990069B1

    Publication date: 2015-03-24

    Application number: US13626982

    Filing date: 2012-09-26

    Applicant: Google Inc.

    CPC classification number: G06F17/2775 G06F17/2818

    Abstract: A computer-implemented technique includes receiving, at a server including one or more processors, a phrase table for statistical machine translation, the phrase table including a plurality of phrase pairs corresponding to one or more pairs of languages. The technique includes determining, at the server, a redundant set of phrase pairs from the plurality of phrase pairs and calculating first and second probabilities for each specific phrase pair of the redundant set. The second probability can be based on third probabilities for sub-phrases of each specific phrase pair. The technique includes determining, at the server, one or more selected phrase pairs based on whether a corresponding second probability for a specific phrase pair is within a probability threshold from its corresponding first probability. The technique also includes removing, at the server, the one or more selected phrase pairs from the phrase table to obtain a modified phrase table.
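The pruning flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the phrase table is a dictionary mapping `(source, target)` pairs to a probability; a pair counts as redundant when it can be split monotonically into two sub-phrase pairs that are both in the table; the second probability is the best product of the sub-phrase (third) probabilities; and "within a probability threshold" is read as an absolute difference. The names `best_split_probability` and `prune_phrase_table` and the toy probabilities are hypothetical.

```python
def best_split_probability(pair, table):
    """Second probability (assumed form): the highest product of the
    probabilities of two sub-phrase pairs that together rebuild `pair`.
    Returns None when no such split exists, i.e. the pair is not redundant."""
    src, tgt = pair[0].split(), pair[1].split()
    best = None
    for i in range(1, len(src)):
        for j in range(1, len(tgt)):
            left = (" ".join(src[:i]), " ".join(tgt[:j]))
            right = (" ".join(src[i:]), " ".join(tgt[j:]))
            if left in table and right in table:
                p = table[left] * table[right]  # product of third probabilities
                if best is None or p > best:
                    best = p
    return best


def prune_phrase_table(table, threshold):
    """Return a modified table without the redundant pairs whose second
    probability lies within `threshold` of their first probability."""
    selected = set()
    for pair, first_prob in table.items():
        second_prob = best_split_probability(pair, table)
        if second_prob is None:
            continue  # cannot be rebuilt from sub-phrases, so keep it
        if abs(first_prob - second_prob) <= threshold:
            selected.add(pair)
    return {pair: p for pair, p in table.items() if pair not in selected}


# Toy phrase table with made-up probabilities.
phrase_table = {
    ("the", "la"): 0.8,
    ("house", "maison"): 0.6,
    ("big", "grande"): 0.3,
    ("the house", "la maison"): 0.5,       # 0.8 * 0.6 = 0.48, close to 0.5
    ("big house", "grande maison"): 0.9,   # 0.3 * 0.6 = 0.18, far from 0.9
}

pruned = prune_phrase_table(phrase_table, threshold=0.05)
for pair in sorted(pruned):
    print(pair, pruned[pair])
```

In this toy table only `("the house", "la maison")` is removed, because its sub-phrases reproduce its probability to within the threshold. `("big house", "grande maison")` is kept because the composed probability is far from the stored one, so dropping it would change what the table expresses.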

