    1. System and method for editing electronic images
    Granted patent (in force)

    Publication No.: US06903751B2

    Publication date: 2005-06-07

    Application No.: US10104805

    Filing date: 2002-03-22

    CPC classification: G06T11/60

    Abstract: A graphical input and display system for creating and manipulating electronic images includes input devices permitting a user to manipulate elements of electronic images received from various image input sources. A processor, connected to the system, receives requests for various image editing operations and also accesses a memory structure. The system memory structure includes a user interaction module, which allows a user to enter new image material or select and modify existing image material to form primary image objects, as well as a grouping module, which maintains an unrestricted grouping structure, an output module, and data memory.

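    The abstract does not say how the grouping module represents an "unrestricted" grouping structure. The following is a minimal sketch that assumes the term means many-to-many membership: a primary object or a group may belong to several overlapping or nested groups at once. The class and method names are hypothetical and do not come from the patent.

```python
from collections import defaultdict


class GroupingStructure:
    """Hypothetical many-to-many grouping of primary objects and groups."""

    def __init__(self):
        self.objects = set()              # primary image objects (e.g. strokes)
        self.members = defaultdict(set)   # group id -> direct member ids
        self.parents = defaultdict(set)   # member id -> groups that contain it

    def add_object(self, obj):
        self.objects.add(obj)

    def add(self, group, member):
        """Put `member` (an object or another group) into `group`."""
        self.members[group].add(member)
        self.parents[member].add(group)

    def remove(self, group, member):
        self.members[group].discard(member)
        self.parents[member].discard(group)

    def groups_of(self, member):
        """Every group that directly contains `member`; may be several."""
        return set(self.parents.get(member, ()))

    def leaves(self, group):
        """Primary objects reachable from `group`, flattening nested groups."""
        seen, out, stack = set(), set(), [group]
        while stack:
            node = stack.pop()
            if node in seen:
                continue          # guards against cycles in the structure
            seen.add(node)
            if node in self.objects:
                out.add(node)
            else:
                stack.extend(self.members.get(node, ()))
        return out


g = GroupingStructure()
for stroke in ("s1", "s2", "s3"):
    g.add_object(stroke)
g.add("word", "s1")
g.add("word", "s2")
g.add("underline", "s2")    # s2 belongs to two overlapping groups
g.add("underline", "s3")
g.add("sentence", "word")   # groups can nest
print(sorted(g.groups_of("s2")))     # ['underline', 'word']
print(sorted(g.leaves("sentence")))  # ['s1', 's2']
```

    Keeping both directions of the membership relation makes "which groups is this stroke in?" as cheap as "what does this group contain?", which a strict tree would not need.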

    2. Large Language Models in Machine Translation
    Patent application (in force)

    Publication No.: US20080243481A1

    Publication date: 2008-10-02

    Application No.: US11767436

    Filing date: 2007-06-22

    IPC classification: G06F17/27

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.

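    The backoff score described in this abstract can be illustrated with a scheme in the style of "Stupid Backoff": use the n-gram's relative frequency if the n-gram was observed, otherwise multiply a constant backoff factor by the score of the (n-1)-gram that drops the oldest context token. This sketch assumes a factor of 0.4 and whitespace tokenization; it illustrates the idea and is not the claimed system. The resulting scores are not normalized probabilities.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(counts, total, context, word, alpha=0.4):
    """Relative frequency of (context, word) if seen; otherwise alpha times
    the score of the backoff n-gram that drops the oldest context token.
    alpha=0.4 is an assumed backoff factor."""
    context = tuple(context)
    ngram = context + (word,)
    if not context:
        return counts[ngram] / total          # unigram relative frequency
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    return alpha * backoff_score(counts, total, context[1:], word, alpha)


tokens = "the cat sat on the mat the cat ran".split()
counts = count_ngrams(tokens, 3)
print(backoff_score(counts, len(tokens), ("the",), "cat"))       # about 0.667 (= 2/3)
print(backoff_score(counts, len(tokens), ("on", "the"), "cat"))  # about 0.267 (= 0.4 * 2/3)
```

    Because no normalization is computed, each lookup needs only raw counts, which is what makes this style of backoff attractive for very large corpora.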

    3. Identifying Matching Canonical Documents in Response to a Visual Query and in Accordance with Geographic Information
    Patent application (in force)

    Publication No.: US20120134590A1

    Publication date: 2012-05-31

    Application No.: US13309484

    Filing date: 2011-12-01

    IPC classification: G06K9/18

    Abstract: A server system receives a visual query from a client system distinct from the server system. The server system performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system scores each textual character in the plurality of textual characters in accordance with the geographic location of the client system. The server system identifies, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. Then the server system retrieves a canonical document having the one or more high quality textual strings and sends at least a portion of the canonical document to the client system.

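    The abstract does not specify how geographic location enters the character scores or how high-quality strings are delimited. The sketch below assumes a hypothetical per-region character prior multiplied into each OCR confidence, and treats a high-quality string as a maximal run of consecutive characters whose scores all reach a threshold. The function names, the threshold of 0.5 and the minimum length of 3 are illustrative assumptions.

```python
def score_characters(chars, confidences, location_prior, default_prior=1.0):
    """Hypothetical scoring: OCR confidence times a location-dependent prior.

    `location_prior` maps a character to a weight for the client's region;
    characters without an entry keep their OCR confidence unchanged.
    """
    return [conf * location_prior.get(ch, default_prior)
            for ch, conf in zip(chars, confidences)]


def high_quality_strings(chars, scores, threshold=0.5, min_len=3):
    """Maximal runs of consecutive characters whose scores all reach
    `threshold`, keeping runs of at least `min_len` characters."""
    runs, current = [], []
    for ch, s in zip(chars, scores):
        if s >= threshold:
            current.append(ch)
            continue
        if len(current) >= min_len:
            runs.append("".join(current))
        current = []
    if len(current) >= min_len:       # flush a run that ends the region
        runs.append("".join(current))
    return runs


chars = list("HOTEL#PARIS")
confidences = [0.9] * 5 + [0.3] + [0.9] * 5
scores = score_characters(chars, confidences, {"#": 0.2})
print(high_quality_strings(chars, scores))  # ['HOTEL', 'PARIS']
```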

    4. PARALLEL DOCUMENT MINING
    Patent application (under examination, published)

    Publication No.: US20120047172A1

    Publication date: 2012-02-23

    Application No.: US13214941

    Filing date: 2011-08-22

    IPC classification: G06F17/30

    CPC classification: G06F17/2827 G06F16/30

    Abstract: A technique includes providing a collection of documents in multiple languages, identifying, from the collection of documents, a group of candidate documents, where each candidate document in the group shares multiple corresponding rare features, evaluating pairs of candidate documents in the group using multiple common features present in the collection of documents, and determining, based on evaluating the pairs of candidate documents, whether each pair of candidate documents corresponds to a translated pair of documents.

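    The two-stage idea (cheap candidate grouping on rare features, then a finer pairwise check) can be sketched with an inverted index. This is a minimal sketch under stated assumptions: a "rare feature" is a token that occurs in at most two documents, a candidate pair must share at least two of them, and a plain Jaccard overlap of token sets stands in for the patent's evaluation with common features. All names and thresholds are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rare_features(docs, max_df=2):
    """Tokens that occur in at most `max_df` documents (assumed definition)."""
    df = defaultdict(int)
    for tokens in docs.values():
        for t in set(tokens):
            df[t] += 1
    return {t for t, n in df.items() if n <= max_df}


def candidate_pairs(docs, rare, min_shared=2):
    """Pairs of documents sharing at least `min_shared` rare features."""
    index = defaultdict(set)              # rare feature -> doc ids
    for doc_id, tokens in docs.items():
        for t in set(tokens) & rare:
            index[t].add(doc_id)
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return [pair for pair, n in shared.items() if n >= min_shared]


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def find_translation_pairs(docs, langs, threshold=0.3):
    """Candidate pairs in different languages whose overlap passes `threshold`."""
    rare = rare_features(docs)
    return sorted(
        (a, b) for a, b in candidate_pairs(docs, rare)
        if langs[a] != langs[b] and jaccard(docs[a], docs[b]) >= threshold
    )


docs = {
    "en1": "meeting 2011 geneva 42 delegates agreed".split(),
    "fr1": "réunion 2011 geneva 42 délégués accord".split(),
    "en2": "weather report sunny 2011".split(),
    "de1": "wetter bericht sonnig 2011".split(),
}
langs = {"en1": "en", "fr1": "fr", "en2": "en", "de1": "de"}
print(find_translation_pairs(docs, langs))  # [('en1', 'fr1')]
```

    The rare-feature index keeps the number of pairs that reach the more expensive comparison small: "en2" and "de1" share only "2011", which occurs everywhere and is therefore not rare.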

    5. Compound Splitting
    Patent application (in force)

    Publication No.: US20110202330A1

    Publication date: 2011-08-18

    Application No.: US13026936

    Filing date: 2011-02-14

    IPC classification: G06F17/28 G06F17/27

    CPC classification: G06F17/2755

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for decompounding compound words are disclosed. In one aspect, a method includes obtaining a token that includes a sequence of characters, identifying two or more candidate sub-words that are constituents of the token, and one or more morphological operations that are required to transform the sub-words into the token, where at least one of the morphological operations involves a use of a non-dictionary word, and determining a cost associated with each sub-word and a cost associated with each morphological operation.

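    Costs per sub-word and per morphological operation suggest a minimum-cost search over split points. The following memoized dynamic program is a sketch under assumptions: sub-word costs are given directly (for example negative log frequencies), and the only morphological operation modeled is inserting a linking morpheme such as the German "s", which is itself not a dictionary word. The function name and cost values are hypothetical.

```python
from functools import lru_cache


def split_compound(token, word_costs, linker_costs):
    """Cheapest split of `token` into at least two sub-words, or None.

    word_costs:   dictionary sub-word -> cost (e.g. negative log frequency).
    linker_costs: linking morpheme -> cost of the operation that inserts it
                  after a sub-word; the morpheme itself need not be a word.
    Returns (total_cost, ((sub_word, linker), ...)).
    """
    n = len(token)
    options = [("", 0.0)] + list(linker_costs.items())

    @lru_cache(maxsize=None)
    def best(i, need):
        # Cheapest cover of token[i:] by at least `need` sub-words.
        result = None
        for j in range(i + 1, n + 1):
            word = token[i:j]
            if word not in word_costs:
                continue
            for link, link_cost in options:
                k = j + len(link)
                if token[j:k] != link:
                    continue
                if k == n:
                    if link or need > 1:
                        continue  # cannot end on a linker or with too few parts
                    rest = (0.0, ())
                else:
                    rest = best(k, max(need - 1, 1))
                    if rest is None:
                        continue
                cost = word_costs[word] + link_cost + rest[0]
                if result is None or cost < result[0]:
                    result = (cost, ((word, link),) + rest[1])
        return result

    return best(0, 2)


costs = {"arbeit": 2.0, "markt": 2.5, "mark": 3.0}
print(split_compound("arbeitsmarkt", costs, {"s": 1.0}))
# (5.5, (('arbeit', 's'), ('markt', '')))
```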

    6. Document image decoding systems and methods using modified stack algorithm
    Granted patent (in force)

    Publication No.: US07167588B2

    Publication date: 2007-01-23

    Application No.: US10215041

    Filing date: 2002-08-09

    IPC classification: G06K9/72

    CPC classification: G06K9/6297

    Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.

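    The Stack algorithm is a best-first search over partial paths. The sketch below shows plain best-first decoding over per-position character candidates with a bigram model. Two details are assumptions standing in for, not reproducing, the patent's modifications: partial paths are ranked by their score plus an optimistic estimate of the remainder (which makes this an A*-style search), and hypotheses ending in the same (position, last character) state are treated as equivalent and merged. The candidate and bigram tables are illustrative.

```python
import heapq


def stack_decode(candidates, bigram, unseen=-5.0):
    """Best-first decoding of a character sequence (higher score is better).

    candidates: list over positions of {char: image log-score}.
    bigram:     {(prev_char, char): log-prob}; `unseen` is used otherwise.
    """
    n = len(candidates)
    best_lm = max([*bigram.values(), unseen])
    # Optimistic estimate of what positions i..n-1 can still add.
    future = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        future[i] = future[i + 1] + max(candidates[i].values()) + best_lm

    heap = [(-future[0], 0.0, 0, "<s>", "")]  # (-priority, score, pos, last, text)
    best_seen = {}  # (pos, last char) -> best score of any hypothesis there
    while heap:
        _, score, pos, last, text = heapq.heappop(heap)
        if score < best_seen.get((pos, last), float("-inf")):
            continue  # superseded by an equivalent, better hypothesis
        if pos == n:
            return text, score
        for ch, img in candidates[pos].items():
            new = score + img + bigram.get((last, ch), unseen)
            state = (pos + 1, ch)
            if new <= best_seen.get(state, float("-inf")):
                continue  # merge: keep only the better equivalent hypothesis
            best_seen[state] = new
            heapq.heappush(heap, (-(new + future[pos + 1]), new, pos + 1, ch, text + ch))
    return None


candidates = [{"c": -0.1, "e": -0.5}, {"a": -0.2, "o": -0.3}, {"t": -0.1, "l": -0.4}]
bigram = {("<s>", "c"): -1.0, ("c", "a"): -0.5, ("a", "t"): -0.5,
          ("c", "o"): -2.0, ("o", "t"): -2.0}
text, score = stack_decode(candidates, bigram)
print(text)  # cat
```

    Because the estimate never underestimates what the remaining positions can contribute, the first complete hypothesis popped from the heap is the best one.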

    7. Document image decoding systems and methods using modified stack algorithm
    Granted patent (in force)

    Publication No.: US07039240B2

    Publication date: 2006-05-02

    Application No.: US10215090

    Filing date: 2002-08-09

    IPC classification: G06K9/72

    CPC classification: G06K9/6297

    Abstract: Methods and systems for document image decoding incorporating a Stack algorithm improve document image decoding. The application of the Stack algorithm is iterated to improve decoding. A provisional weight is determined for a partial path to reduce template matching. In addition, semantically equivalent hypotheses are identified to reduce redundant hypotheses.


    8. Identifying matching canonical documents consistent with visual query structural information
    Granted patent (in force)

    Publication No.: US08811742B2

    Publication date: 2014-08-19

    Application No.: US13309471

    Filing date: 2011-12-01

    IPC classification: G06K9/62

    Abstract: A server system receives a visual query from a client system, performs optical character recognition (OCR) on the visual query to produce text recognition data representing textual characters, including a plurality of textual characters in a contiguous region of the visual query. The server system also produces structural information associated with the textual characters in the visual query. Textual characters in the plurality of textual characters are scored. The method further includes identifying, in accordance with the scoring, one or more high quality textual strings, each comprising a plurality of high quality textual characters from among the plurality of textual characters in the contiguous region of the visual query. A canonical document that includes the one or more high quality textual strings and that is consistent with the structural information is retrieved. At least a portion of the canonical document is sent to the client system.

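    The abstract leaves open what "structural information" is and how consistency is tested. As a minimal sketch, assume the structural information is the line on which OCR found each high-quality string, and that a canonical document is consistent when it contains every string in the same top-to-bottom order. The corpus layout and function names are hypothetical.

```python
def find_line(lines, text):
    """Index of the first line containing `text`, or None."""
    for i, line in enumerate(lines):
        if text in line:
            return i
    return None


def retrieve_canonical(strings_with_lines, corpus):
    """Ids of documents that contain every high-quality string, in a line
    order consistent with the order observed in the visual query.

    strings_with_lines: [(text, line index in the query image), ...]
    corpus:             {doc_id: [line, ...]}  (hypothetical layout)
    """
    ordered = [s for s, _ in sorted(strings_with_lines, key=lambda p: p[1])]
    hits = []
    for doc_id, lines in corpus.items():
        positions = [find_line(lines, s) for s in ordered]
        if None in positions:
            continue                   # a high-quality string is missing
        if positions == sorted(positions):
            hits.append(doc_id)        # line order matches the query layout
    return hits


query = [("It was a dark", 2), ("Chapter 1", 0)]
corpus = {
    "bookA": ["Chapter 1", "", "It was a dark and stormy night"],
    "bookB": ["It was a dark night", "Chapter 1"],
    "bookC": ["Chapter 2"],
}
print(retrieve_canonical(query, corpus))  # ['bookA']
```

    "bookB" contains both strings but in the opposite order, so the structural check rejects it even though a text-only match would accept it.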

    9. Large language models in machine translation
    Granted patent (in force)

    Publication No.: US08332207B2

    Publication date: 2012-12-11

    Application No.: US11767436

    Filing date: 2007-06-22

    IPC classification: G06F17/27

    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1 and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.


    10. Document image decoding using an integrated stochastic language model
    Granted patent (in force)

    Publication No.: US06678415B1

    Publication date: 2004-01-13

    Application No.: US09570730

    Filing date: 2000-05-12

    IPC classification: G06K9/62

    CPC classification: G06K9/72 G06K2209/01

    Abstract: A text recognition system represents the decoded message of a document image as a path through an image network. A method for integrating a language model into the network selectively expands the network to accommodate the language model only for certain ones of the paths in the network, effectively managing the memory storage requirements and computational complexities of integrating the language model efficiently into the network. The language model generates probability distributions indicating the probability of a certain character occurring in a string, given one or more previous characters in the string. Selectively expanding the image network is achieved by initially using upper bounds on the language model probabilities on the branches of an unexpanded image network. A best path search operation is then performed to determine an estimated best path through the image network using these upper bound scores. After decoding, only the nodes on the estimated best path are expanded with new nodes and with branches incoming to the new nodes that accommodate new language model scores reflecting actual character histories in place of the upper bound scores. Decoding and selectively expanding the image network are repeated until the final output transcription of the text image has been produced.

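    The decode, expand and repeat loop can be sketched on a toy network of character positions with a bigram model. One simplification is assumed: "expanding" here expands a whole position (every character there starts using its actual predecessor), whereas the abstract expands only the nodes on the estimated best path. The loop stops when the best path's upper-bound scores equal its actual scores; at that point no other path can beat it, because every path's bounded score is at least its true score. Function names and tables are illustrative.

```python
def viterbi(candidates, step_score):
    """Best path through a left-to-right network of character positions.

    step_score(i, prev, ch) scores `ch` at position i after `prev`
    (prev is "<s>" at position 0). Returns (score, [chars]).
    """
    layer = {"<s>": (0.0, [])}
    for i, cands in enumerate(candidates):
        layer = {
            ch: max((s + img + step_score(i, prev, ch), path + [ch])
                    for prev, (s, path) in layer.items())
            for ch, img in cands.items()
        }
    return max(layer.values())


def decode_with_upper_bounds(candidates, bigram, unseen=-5.0):
    """Score unexpanded positions with an upper bound on the LM score,
    then expand positions where that bound was loose on the best path."""
    chars = {c for cands in candidates for c in cands}

    def lm(prev, ch):
        return bigram.get((prev, ch), unseen)

    # Upper bound on the LM score of `ch` over every possible predecessor.
    upper = {ch: max(lm(p, ch) for p in chars | {"<s>"}) for ch in chars}
    expanded = {0}  # position 0 has a single predecessor, "<s>"

    def step(i, prev, ch):
        return lm(prev, ch) if i in expanded else upper[ch]

    while True:
        bound, path = viterbi(candidates, step)
        history = ["<s>"] + path
        loose = {i for i, ch in enumerate(path)
                 if i not in expanded and upper[ch] != lm(history[i], ch)}
        if not loose:
            return path, bound  # bound is exact on this path, so it is optimal
        expanded |= loose       # expand those positions and decode again


candidates = [{"c": -0.1, "e": -0.5}, {"a": -0.2, "o": -0.3}, {"t": -0.1, "l": -0.4}]
bigram = {("<s>", "c"): -1.0, ("c", "a"): -0.5, ("a", "t"): -0.5,
          ("c", "o"): -2.0, ("o", "t"): -2.0, ("e", "l"): -0.1}
path, score = decode_with_upper_bounds(candidates, bigram)
print("".join(path))  # cat
```

    In this example the first pass prefers "cal", because the bound for "l" comes from the unrelated history "e"; after position 2 is expanded, the second pass returns "cat" with exact scores.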