Techniques for distributed optical character recognition and distributed machine language translation

    Publication No.: US09514377B2

    Publication Date: 2016-12-06

    Application No.: US14264327

    Filing Date: 2014-04-29

    Applicant: Google Inc.

    CPC classification number: G06K9/18 G06F17/289 G06K9/22 G06K9/325 G06K2209/01

    Abstract: A technique for selectively distributing OCR and/or machine language translation tasks between a mobile computing device and server(s) includes receiving, at the mobile computing device, an image of an object comprising a text. The mobile computing device can determine a degree of optical character recognition (OCR) complexity for obtaining the text from the image. Based on this degree of OCR complexity, the mobile computing device and/or the server(s) can perform OCR to obtain an OCR text. The mobile computing device can then determine a degree of translation complexity for translating the OCR text from its source language to a target language. Based on this degree of translation complexity, the mobile computing device and/or the server(s) can perform machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text. The mobile computing device can then output the translated OCR text.
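
    The abstract does not say how either degree of complexity is measured or how it maps to a device/server split. The following is a minimal Python sketch under stated assumptions: complexity is a score in [0, 1], two hypothetical thresholds (`low`, `high`) choose between running on the device, on the server(s), or on both, and the heuristics `ocr_complexity` and `translation_complexity` (including every feature and weight they use) are invented for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Executor(Enum):
    DEVICE = "device"  # run on the mobile computing device
    SERVER = "server"  # offload to the server(s)
    BOTH = "both"      # split the task between device and server(s)


def choose_executor(complexity: float, low: float = 0.3, high: float = 0.7) -> Executor:
    """Map a complexity score in [0, 1] to where a task should run.

    The thresholds `low` and `high` are illustrative assumptions.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    if complexity < low:
        return Executor.DEVICE
    if complexity > high:
        return Executor.SERVER
    return Executor.BOTH


@dataclass
class ImageFeatures:
    # Hypothetical features; the abstract does not say how OCR complexity is measured.
    num_text_regions: int
    is_handwritten: bool
    blur: float  # 0.0 (sharp) to 1.0 (very blurry)


def ocr_complexity(f: ImageFeatures) -> float:
    """Hypothetical weighted heuristic for OCR complexity, clamped to at most 1.0."""
    score = 0.05 * f.num_text_regions + (0.4 if f.is_handwritten else 0.0) + 0.5 * f.blur
    return min(score, 1.0)


def translation_complexity(ocr_text: str, pair_on_device: bool) -> float:
    """Hypothetical heuristic: longer text is harder; unsupported pairs go to the server."""
    if not pair_on_device:
        return 1.0  # no on-device model for this language pair
    return min(len(ocr_text.split()) / 50.0, 1.0)


if __name__ == "__main__":
    photo = ImageFeatures(num_text_regions=2, is_handwritten=False, blur=0.1)
    ocr_where = choose_executor(ocr_complexity(photo))
    mt_where = choose_executor(translation_complexity("sortie de secours", pair_on_device=False))
    print(ocr_where, mt_where)
```

    In this sketch a sharp photo with two printed text regions scores about 0.15 and is recognized on the device, while a language pair with no on-device model forces the translation step to the server(s). The middle band (`BOTH`) stands in for the "and/or" case, where the work is shared.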

    TECHNIQUES FOR DISTRIBUTED OPTICAL CHARACTER RECOGNITION AND DISTRIBUTED MACHINE LANGUAGE TRANSLATION

    Publication No.: US20150310291A1

    Publication Date: 2015-10-29

    Application No.: US14264327

    Filing Date: 2014-04-29

    Applicant: GOOGLE INC.

    CPC classification number: G06K9/18 G06F17/289 G06K9/22 G06K9/325 G06K2209/01

    Abstract: A technique for selectively distributing OCR and/or machine language translation tasks between a mobile computing device and server(s) includes receiving, at the mobile computing device, an image of an object comprising a text. The mobile computing device can determine a degree of optical character recognition (OCR) complexity for obtaining the text from the image. Based on this degree of OCR complexity, the mobile computing device and/or the server(s) can perform OCR to obtain an OCR text. The mobile computing device can then determine a degree of translation complexity for translating the OCR text from its source language to a target language. Based on this degree of translation complexity, the mobile computing device and/or the server(s) can perform machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text. The mobile computing device can then output the translated OCR text.

    Techniques for pruning phrase tables for statistical machine translation
    Granted invention patent (legal status: in force)

    Publication No.: US08990069B1

    Publication Date: 2015-03-24

    Application No.: US13626982

    Filing Date: 2012-09-26

    Applicant: Google Inc.

    CPC classification number: G06F17/2775 G06F17/2818

    Abstract: A computer-implemented technique includes receiving, at a server including one or more processors, a phrase table for statistical machine translation, the phrase table including a plurality of phrase pairs corresponding to one or more pairs of languages. The technique includes determining, at the server, a redundant set of phrase pairs from the plurality of phrase pairs and calculating first and second probabilities for each specific phrase pair of the redundant set. The second probability can be based on third probabilities for sub-phrases of each specific phrase pair. The technique includes determining, at the server, one or more selected phrase pairs based on whether a corresponding second probability for a specific phrase pair is within a probability threshold from its corresponding first probability. The technique also includes removing, at the server, the one or more selected phrase pairs from the phrase table to obtain a modified phrase table.
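
    One plausible reading of this pruning test, sketched in Python under explicit assumptions: the "first probability" is the phrase pair's own table probability; a pair counts as "redundant" if it can be split monotonically into exactly two shorter pairs already in the table; and the "second probability" is the best product of those sub-pair ("third") probabilities. The split rule, the product, and the names `composed_probability` and `prune` are illustrative choices, not the patented method.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# (source words, target words)
PhrasePair = Tuple[Tuple[str, ...], Tuple[str, ...]]


def composed_probability(pair: PhrasePair, table: Dict[PhrasePair, float]) -> Optional[float]:
    """Best probability of rebuilding `pair` from two shorter pairs in `table`.

    Assumption: a monotone split into exactly two sub-phrase pairs. Returns None
    when no split exists, i.e. the pair is not redundant under this definition.
    """
    src, tgt = pair
    best: Optional[float] = None
    for i, j in product(range(1, len(src)), range(1, len(tgt))):
        left, right = (src[:i], tgt[:j]), (src[i:], tgt[j:])
        if left in table and right in table:
            p = table[left] * table[right]
            best = p if best is None else max(best, p)
    return best


def prune(table: Dict[PhrasePair, float], threshold: float) -> Dict[PhrasePair, float]:
    """Drop redundant pairs whose composed probability is within `threshold` of the direct one."""
    pruned = dict(table)
    for pair, p_direct in table.items():
        p_composed = composed_probability(pair, table)
        if p_composed is not None and abs(p_direct - p_composed) <= threshold:
            del pruned[pair]
    return pruned


if __name__ == "__main__":
    table = {
        (("the",), ("la",)): 0.6,
        (("house",), ("maison",)): 0.8,
        (("the", "house"), ("la", "maison")): 0.5,  # 0.6 * 0.8 = 0.48, close to 0.5
        (("green", "house"), ("serre",)): 0.7,      # cannot be split: kept
    }
    for pair in prune(table, threshold=0.05):
        print(pair)
```

    With a threshold of 0.05, "the house" → "la maison" is removed because composing "the" → "la" with "house" → "maison" gives 0.48, while the idiomatic "green house" → "serre" is kept. Redundancy is always checked against the unpruned table, a simplification that keeps the result independent of iteration order.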

    TECHNIQUES FOR DISTRIBUTED OPTICAL CHARACTER RECOGNITION AND DISTRIBUTED MACHINE LANGUAGE TRANSLATION
    Published invention application (legal status: in force)

    Publication No.: US20150310290A1

    Publication Date: 2015-10-29

    Application No.: US14264296

    Filing Date: 2014-04-29

    Applicant: Google Inc.

    CPC classification number: G06K9/00979 G06F17/289 G06K2209/01

    Abstract: A technique for selectively distributing OCR and/or machine language translation tasks between a mobile computing device and server(s) includes receiving, at the mobile computing device, an image of an object comprising a text. The mobile computing device can determine a degree of optical character recognition (OCR) complexity for obtaining the text from the image. Based on this degree of OCR complexity, the mobile computing device and/or the server(s) can perform OCR to obtain an OCR text. The mobile computing device can then determine a degree of translation complexity for translating the OCR text from its source language to a target language. Based on this degree of translation complexity, the mobile computing device and/or the server(s) can perform machine language translation of the OCR text from the source language to a target language to obtain a translated OCR text. The mobile computing device can then output the translated OCR text.

    Large language models in machine translation
    Granted invention patent (legal status: in force)

    Publication No.: US08812291B2

    Publication Date: 2014-08-19

    Application No.: US13709125

    Filing Date: 2012-12-10

    Applicant: Google Inc.

    CPC classification number: G06F17/2818 G06F17/2827 G06F17/2845

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n−1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
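
    The backoff scoring described here closely resembles the scheme widely known as "Stupid Backoff": use the n-gram's relative frequency if it was seen in the corpus, otherwise multiply a fixed backoff factor by the score of the order n−1 backoff n-gram (the n-gram without its first token). A minimal Python sketch, assuming a backoff factor of 0.4 and unigram scores equal to count divided by corpus size; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def count_ngrams(tokens: Sequence[str], max_order: int) -> Counter:
    """Count every n-gram of order 1..max_order in the token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram: Tuple[str, ...], counts: Counter, total_tokens: int,
                  alpha: float = 0.4) -> float:
    """Relative frequency if the n-gram was seen, else alpha times the backoff score.

    `alpha` (the backoff factor) defaults to an assumed 0.4.
    """
    if len(ngram) == 1:
        return counts[ngram] / total_tokens
    if counts[ngram] > 0:
        # relative frequency of the n-gram given its first n-1 tokens
        return counts[ngram] / counts[ngram[:-1]]
    # back off to the order n-1 n-gram that drops the first token
    return alpha * backoff_score(ngram[1:], counts, total_tokens, alpha)


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    counts = count_ngrams(tokens, max_order=3)
    print(backoff_score(("the", "cat"), counts, len(tokens)))        # seen bigram
    print(backoff_score(("cat", "on", "the"), counts, len(tokens)))  # backs off once
```

    These are scores rather than normalized probabilities, so they need not sum to one over the vocabulary. The trade-off is that scoring needs only raw counts and no normalization step, which keeps very large n-gram collections practical.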