1. Identifying matching canonical documents consistent with visual query structural information
    Granted invention patent (legal status: valid)

    Publication number: US09087235B2

    Publication date: 2015-07-21

    Application number: US14445420

    Filing date: 2014-07-29

    Applicant: Google Inc.

    Abstract: A server system receives a visual query from a client system and performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
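
    As an illustration of the scoring step, the following Python sketch assumes that each OCR character carries a confidence value in [0, 1]. It treats a "high quality textual string" as a maximal run of consecutive characters whose confidence meets a threshold. The `OcrChar` type, the `high_quality_strings` function, and the default threshold (0.8) and minimum length (3) are hypothetical choices made for this example. The abstract does not say how characters are scored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrChar:
    """One recognized character and its (assumed) OCR confidence in [0, 1]."""
    char: str
    confidence: float


def high_quality_strings(chars: List[OcrChar],
                         threshold: float = 0.8,
                         min_len: int = 3) -> List[str]:
    """Split a contiguous region into maximal runs of characters whose
    confidence is at least `threshold`; keep runs of `min_len` or more."""
    strings: List[str] = []
    run: List[str] = []
    for c in chars:
        if c.confidence >= threshold:
            run.append(c.char)  # high-quality character: extend the run
        else:
            if len(run) >= min_len:  # a low-quality character ends the run
                strings.append("".join(run))
            run = []
    if len(run) >= min_len:  # flush a run that reaches the end of the region
        strings.append("".join(run))
    return strings


if __name__ == "__main__":
    region = ([OcrChar(ch, 0.9) for ch in "abc"]
              + [OcrChar("?", 0.2)]
              + [OcrChar(ch, 0.95) for ch in "de"])
    print(high_quality_strings(region))
```

    In this example, the low-confidence `?` splits the region. Only `abc` survives, because `de` is shorter than the assumed minimum length.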

2. LARGE LANGUAGE MODELS IN MACHINE TRANSLATION
    Invention patent application (legal status: valid)

    Publication number: US20130346059A1

    Publication date: 2013-12-26

    Application number: US13709125

    Filing date: 2012-12-10

    Applicant: GOOGLE INC.

    CPC classification number: G06F17/2818 G06F17/2827 G06F17/2845

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n−1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
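
    The abstract states that a backoff score is a function of a backoff factor and the relative frequency of the corresponding lower-order (backoff) n-gram, but it does not give the formula. One scheme with that shape is a "stupid backoff"-style recursion, assumed here only for illustration. It uses a fixed backoff factor \(\alpha\), with \(f(\cdot)\) a corpus count and \(N\) the number of tokens in the corpus:

```latex
S(w_i \mid w_{i-n+1}^{i-1}) =
\begin{cases}
  \dfrac{f(w_{i-n+1}^{i})}{f(w_{i-n+1}^{i-1})} & \text{if } f(w_{i-n+1}^{i}) > 0, \\[6pt]
  \alpha \, S(w_i \mid w_{i-n+2}^{i-1})         & \text{otherwise,}
\end{cases}
\qquad
S(w_i) = \frac{f(w_i)}{N}
```

    The backoff factor is fixed rather than estimated, so these scores are not normalized probabilities. They are relative-frequency scores that can still be used to rank translation candidates.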

3. Large language models in machine translation
    Granted invention patent (legal status: valid)

    Publication number: US08812291B2

    Publication date: 2014-08-19

    Application number: US13709125

    Filing date: 2012-12-10

    Applicant: Google Inc.

    CPC classification number: G06F17/2818 G06F17/2827 G06F17/2845

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n−1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
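
    The following Python sketch shows one way such a model could be built and queried. It counts every n-gram up to a maximum order in a tokenized corpus. When an n-gram is unseen, it multiplies a fixed backoff factor by the score of the order n−1 backoff n-gram, which is the n-gram with its first token dropped. The function names, the backoff factor of 0.4, and the recursion down to unigram relative frequency are assumptions made for illustration. This is not the patented implementation.

```python
from collections import Counter
from typing import Sequence, Tuple

NGram = Tuple[str, ...]


def count_ngrams(tokens: Sequence[str], max_order: int) -> Counter:
    """Count every n-gram of order 1..max_order in the token sequence."""
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(ngram: NGram, counts: Counter, total_tokens: int,
                  alpha: float = 0.4) -> float:
    """Relative-frequency score with a fixed (assumed) backoff factor `alpha`.

    Seen n-grams score count(ngram) / count(context); unseen ones back off
    to alpha times the score of the order n-1 n-gram (first token dropped).
    """
    if len(ngram) == 1:
        return counts[ngram] / total_tokens  # unigram relative frequency
    if counts[ngram] > 0:
        return counts[ngram] / counts[ngram[:-1]]  # context count is > 0 here
    return alpha * backoff_score(ngram[1:], counts, total_tokens, alpha)


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    counts = count_ngrams(tokens, max_order=3)
    print(backoff_score(("the", "cat"), counts, len(tokens)))  # seen bigram
    print(backoff_score(("cat", "mat"), counts, len(tokens)))  # backs off
```

    In this corpus, `the` occurs twice and is followed by `cat` once, so `("the", "cat")` scores 1/2 = 0.5. The unseen bigram `("cat", "mat")` backs off to 0.4 × 1/6, because `mat` is one of six tokens.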

4. Identifying Matching Canonical Documents Consistent With Visual Query Structural Information
    Invention patent application (legal status: valid)

    Publication number: US20140334746A1

    Publication date: 2014-11-13

    Application number: US14445420

    Filing date: 2014-07-29

    Applicant: Google Inc.

    Abstract: A server system receives a visual query from a client system and performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.
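
    The retrieval step can be sketched as follows. The sketch assumes a toy in-memory index that maps hypothetical document ids to plain text. It uses the left-to-right order of the query strings as a stand-in for the structural information: a document is accepted only if every high quality string occurs in it, in the same order as in the visual query. `match_span` and `retrieve_canonical` are illustrative names. A real system would presumably consult a search index and richer layout information.

```python
from typing import Dict, Optional, Sequence, Tuple


def match_span(text: str, strings: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) covering all strings if each occurs in `text`
    after the previous one (the assumed structural constraint), else None."""
    pos = 0
    start: Optional[int] = None
    for s in strings:
        idx = text.find(s, pos)
        if idx < 0:
            return None  # string missing, or present only out of order
        if start is None:
            start = idx
        pos = idx + len(s)
    return None if start is None else (start, pos)


def retrieve_canonical(index: Dict[str, str],
                       strings: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (doc_id, excerpt) for the first document consistent with the
    query strings; the excerpt is the portion sent back to the client."""
    for doc_id, text in index.items():
        span = match_span(text, strings)
        if span is not None:
            start, end = span
            return doc_id, text[start:end]
    return None


if __name__ == "__main__":
    index = {
        "doc-a": "the fox jumps over the lazy dog",
        "doc-b": "lazy dog and the quick brown fox",
    }
    print(retrieve_canonical(index, ["lazy dog", "quick brown"]))
    print(retrieve_canonical(index, ["quick brown", "lazy dog"]))
```

    The first query returns `('doc-b', 'lazy dog and the quick brown')`. The reversed query matches nothing. `doc-a` contains no `quick brown`, and in `doc-b` the string `lazy dog` precedes `quick brown`, so that document is inconsistent with the assumed ordering constraint.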