COMPACT SIGNATURE FOR UNORDERED VECTOR SETS WITH APPLICATION TO IMAGE RETRIEVAL
    1.
    Patent application
    Legal status: in force

    Publication number: US20110026831A1

    Publication date: 2011-02-03

    Application number: US12512209

    Filing date: 2009-07-30

    IPC classification: G06K9/48 G06F17/30 G06F7/10

    Abstract: To compute a signature for an object comprising or represented by a set of vectors in a vector space of dimensionality D, statistics are computed that are indicative of distribution of the vectors of the set of vectors amongst a set of regions R_i, i=1, ..., N of the vector space, at least some statistics associated with each region are binarized to generate sets of binary values a_i, i=1, ..., N indicative of statistics of the vectors of the set of vectors belonging to the respective regions R_i, i=1, ..., N; and a vector set signature is defined for the set of vectors including the sets of binary values a_i, i=1, ..., N. The computing, binarizing, and defining operations may be repeated for two sets of vectors, and a quantitative comparison of the two sets of vectors determined based on the corresponding vector set signatures.
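The abstract does not fix the regions, the statistics, or the comparison measure, so the following is only a minimal sketch under stated assumptions: regions are the nearest-centroid cells of a hypothetical list `centroids`, the per-region statistic is the mean of the assigned vectors, each dimension is binarized by testing whether that mean exceeds the centroid coordinate, and two signatures are compared by Hamming distance. All function names are illustrative and not taken from the patent.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_region(v: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to v (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])),
    )


def signature(vectors: Sequence[Vector], centroids: Sequence[Vector]) -> List[List[int]]:
    """Return one list of D bits per region (assumed statistic: per-region mean)."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for v in vectors:
        i = nearest_region(v, centroids)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += v[d]
    sig = []
    for i, c in enumerate(centroids):
        if counts[i] == 0:
            # Assumption: an empty region contributes all-zero bits.
            sig.append([0] * dim)
        else:
            sig.append([1 if sums[i][d] / counts[i] > c[d] else 0 for d in range(dim)])
    return sig


def hamming(sig_a: List[List[int]], sig_b: List[List[int]]) -> int:
    """Number of differing bits between two signatures built on the same regions."""
    return sum(x != y for ra, rb in zip(sig_a, sig_b) for x, y in zip(ra, rb))


centroids = [(0.0, 0.0), (10.0, 10.0)]
set_a = [(1.0, -1.0), (0.5, -2.0), (11.0, 12.0)]
set_b = [(2.0, -0.5), (9.0, 11.0)]
distance = hamming(signature(set_a, centroids), signature(set_b, centroids))
```

Because the signature depends only on per-region aggregates, reordering the input vectors leaves it unchanged, which is the property an unordered vector set needs.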


    Decision criteria for automated form population
    2.
    Patent application
    Legal status: in force

    Publication number: US20080267505A1

    Publication date: 2008-10-30

    Application number: US11789823

    Filing date: 2007-04-26

    Abstract: A method is provided for selecting fields of an electronic form for automatic population with candidate text segments. The candidate text segments can be obtained by capturing an image of a document, applying optical character recognition to the captured image to identify textual content, and tagging candidate text segments in the textual content for fields of the form. The method includes, for each of a plurality of fields of the form, computing a field exclusion function based on at least one parameter selected from a text length parameter, an optical character recognition error rate, a tagging error rate, and a field relevance parameter; and determining whether to select the field for automatic population based on the computed field exclusion function.
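The abstract names the four parameters but not the form of the field exclusion function. The sketch below assumes one plausible form: the expected effort of reviewing and retyping a wrongly populated field is weighed against the typing saved, scaled by relevance. The formula, the `review_penalty` constant, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldStats:
    name: str
    text_length: int           # typical number of characters in the field
    ocr_error_rate: float      # assumed per-character OCR error probability
    tagging_error_rate: float  # probability the segment is tagged for the wrong field
    relevance: float           # 0..1, how often the field matters to the user


def exclusion_score(f: FieldStats, review_penalty: float = 5.0) -> float:
    """Hypothetical exclusion function: positive means 'leave for manual entry'."""
    # Probability that every character is right and the tag is right.
    p_ok = (1.0 - f.ocr_error_rate) ** f.text_length * (1.0 - f.tagging_error_rate)
    expected_fix_cost = (1.0 - p_ok) * (f.text_length + review_penalty)
    expected_saving = f.relevance * f.text_length
    return expected_fix_cost - expected_saving


def select_for_autopopulation(f: FieldStats) -> bool:
    """Select the field when the exclusion score is not positive."""
    return exclusion_score(f) <= 0.0


company = FieldStats("company", text_length=20, ocr_error_rate=0.01,
                     tagging_error_rate=0.05, relevance=1.0)
suffix = FieldStats("suffix", text_length=3, ocr_error_rate=0.10,
                    tagging_error_rate=0.40, relevance=0.2)
```

Under these assumptions a long, reliably recognized, relevant field is selected, while a short, error-prone, rarely needed one is excluded.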


    Compact signature for unordered vector sets with application to image retrieval
    3.
    Granted patent
    Legal status: in force

    Publication number: US08644622B2

    Publication date: 2014-02-04

    Application number: US12512209

    Filing date: 2009-07-30

    Abstract: To compute a signature for an object comprising or represented by a set of vectors in a vector space of dimensionality D, statistics are computed that are indicative of distribution of the vectors of the set of vectors amongst a set of regions R_i, i=1, ..., N of the vector space, at least some statistics associated with each region are binarized to generate sets of binary values a_i, i=1, ..., N indicative of statistics of the vectors of the set of vectors belonging to the respective regions R_i, i=1, ..., N; and a vector set signature is defined for the set of vectors including the sets of binary values a_i, i=1, ..., N. The computing, binarizing, and defining operations may be repeated for two sets of vectors, and a quantitative comparison of the two sets of vectors determined based on the corresponding vector set signatures.


    Method and apparatus for recognizing multiword expressions
    4.
    Granted patent
    Legal status: in force

    Publication number: US07346511B2

    Publication date: 2008-03-18

    Application number: US10248057

    Filing date: 2002-12-13

    IPC classification: G10L15/04 G06F17/27

    CPC classification: G06F17/271 G06F17/2755

    Abstract: Words of an input string are morphologically analyzed to identify their alternative base forms and parts of speech. The analyzed words of the input string are used to compile the input string into a first finite-state network. The first finite-state network is matched with a second finite-state network of multiword expressions to identify all subpaths of the first finite-state network that match one or more complete paths in the second finite-state network. Each matching subpath of the first finite-state network and path of the second finite-state network identify a multiword expression in the input string. The morphological analysis is performed without disambiguating words and without segmenting the input string into sentences in the input string to compile the first finite-state network with at least one path that identifies alternative base forms or parts of speech of a word in the input string.
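The matching step can be illustrated with a much simplified stand-in: a lattice of alternative base forms plays the role of the first network, and a trie of base-form sequences plays the role of the multiword-expression network. This is a sketch, not the patented finite-state construction; the toy dictionary `ANALYSES`, the expression list `MWES`, and the function names are assumptions, and parts of speech are omitted.

```python
# Hypothetical toy morphological dictionary: surface word -> alternative base forms.
ANALYSES = {
    "kicked": {"kick"},
    "left": {"leave", "left"},   # ambiguous on purpose; no disambiguation is done
    "took": {"take"},
}

# Hypothetical multiword expressions, written as sequences of base forms.
MWES = [("kick", "the", "bucket"), ("take", "off"), ("leave", "off")]


def build_trie(expressions):
    """Trie over base forms; the key None marks the end of a complete path."""
    root = {}
    for expr in expressions:
        node = root
        for base in expr:
            node = node.setdefault(base, {})
        node[None] = expr
    return root


def find_mwes(words, trie):
    """Return (start, end, expression) for every subpath matching a complete path."""
    # Each position keeps all alternative base forms (unknown words map to themselves).
    lattice = [ANALYSES.get(w.lower(), {w.lower()}) for w in words]
    matches = []
    for start in range(len(lattice)):
        frontier = [trie]
        for end in range(start, len(lattice)):
            frontier = [node[b] for node in frontier for b in lattice[end] if b in node]
            if not frontier:
                break
            for node in frontier:
                if None in node:
                    matches.append((start, end + 1, node[None]))
    return matches


trie = build_trie(MWES)
found = find_mwes("He kicked the bucket and left off".split(), trie)
```

Keeping every alternative analysis in the lattice is what lets "left" match the expression stored under the base form "leave" without a prior disambiguation pass.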


    Method and apparatus for mapping multiword expressions to identifiers using finite-state networks
    5.
    Granted patent
    Legal status: in force

    Publication number: US07552051B2

    Publication date: 2009-06-23

    Application number: US10248058

    Filing date: 2002-12-13

    IPC classification: G10L15/04 G06F17/27

    CPC classification: G06F17/2775

    Abstract: Multiword expressions are mapped to identifiers using finite-state networks. Each of a plurality of multiword expressions is encoded into a regular expression. Each regular expression encodes a base form common to a plurality of derivative forms defined by ones of the multiword expressions. Each of the plurality of regular expressions is compiled with factorization into a set of finite-state networks. A union of the finite-state networks in the set of finite-state networks is performed to define a multiword finite-state network and a set of subnets. The multiword finite-state network and the set of subnets are traversed to identify a path corresponding to one of the plurality of multiword expressions, wherein only transitions originating from the multiword finite-state network are accounted for to ascertain a path number identifying a base form of the one of the plurality of multiword expressions.
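As a loose analogy only, the idea that every derivative form resolves to the identifier of its base form can be shown with Python's `re` module: one pattern per base form, a union of the patterns, and the index of the matching top-level alternative as the identifier. This does not reproduce the factorized networks, subnets, or path numbering of the patent; the patterns and names below are hypothetical.

```python
import re

# Hypothetical patterns: one regular expression per base form, covering its
# derivative forms. Inner groups are non-capturing so that only the top-level
# alternative determines the identifier.
BASE_FORM_PATTERNS = [
    r"kick(?:s|ed|ing)? the bucket",
    r"(?:take|takes|took|taken|taking) off",
    r"parts? of speech",
]

# Union of all patterns, each wrapped in a named group id0, id1, ...
UNION = re.compile(
    "|".join(f"(?P<id{n}>{p})" for n, p in enumerate(BASE_FORM_PATTERNS))
)


def expression_id(text):
    """Return the index of the base form matched by text, or None."""
    m = UNION.fullmatch(text.lower())
    if m is None:
        return None
    # lastgroup is the name of the top-level alternative that matched.
    return int(m.lastgroup[2:])
```

Here "kicks the bucket" and "kicked the bucket" both yield the same identifier, mirroring the point that variation inside a sub-pattern does not change which base form is reported.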


    Executable for requesting a linguistic service
    6.
    Granted patent
    Legal status: in force

    Publication number: US06321372B1

    Publication date: 2001-11-20

    Application number: US09221232

    Filing date: 1998-12-23

    IPC classification: G06F9/45

    CPC classification: G06F8/30 G06F17/289

    Abstract: An executable for a new linguistic service is produced using preexisting source code for an ancestor service that is a less specified ancestor of the new linguistic service in a hierarchy. The preexisting source code is modified, such as by further specifying it, to produce modified source code for responding to requests for the new linguistic service, where each request identifies the new linguistic service and indicates linguistic data on which it is to be performed. The modified source code is then used to produce the executable for the new linguistic service. The preexisting source code can, for example, define a top-level class in an object-oriented programming language, with common parameters including input parameters with information for obtaining the linguistic data and result parameters with information for returning results of the new linguistic service.
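The hierarchy described in the abstract can be pictured as ordinary subclassing: a less specified ancestor holds the common input and result handling, and a new service is produced by further specifying it. The sketch below is an illustration under assumptions; the class names, the request dictionary layout, and the `REGISTRY` lookup are hypothetical and not taken from the patent.

```python
class LinguisticService:
    """Less specified ancestor: common parameters, no linguistic logic of its own."""

    name = "generic"

    def __init__(self, input_params):
        self.input_params = input_params  # information for obtaining the data

    def fetch_data(self):
        return self.input_params["text"]

    def process(self, data):
        raise NotImplementedError("further specified by descendant services")

    def run(self):
        # Result parameters: which service ran and what it returned.
        return {"service": self.name, "result": self.process(self.fetch_data())}


class Tokenization(LinguisticService):
    """First level of specification: whitespace tokenization."""

    name = "tokenization"

    def process(self, data):
        return data.split()


class LowercaseTokenization(Tokenization):
    """New service produced by further specifying its ancestor."""

    name = "lowercase-tokenization"

    def process(self, data):
        return [token.lower() for token in super().process(data)]


REGISTRY = {cls.name: cls for cls in (Tokenization, LowercaseTokenization)}


def handle_request(request):
    """Each request identifies the service and indicates the linguistic data."""
    service_cls = REGISTRY[request["service"]]
    return service_cls(request["input"]).run()
```

A request such as `{"service": "lowercase-tokenization", "input": {"text": "Hello World"}}` is dispatched to the most specific class, which reuses the ancestor's request handling unchanged.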


    Method and computer system for part-of-speech tagging of incomplete sentences
    7.
    Granted patent
    Legal status: lapsed

    Publication number: US06910004B2

    Publication date: 2005-06-21

    Application number: US09738987

    Filing date: 2000-12-19

    IPC classification: G06F17/28 G06F17/27

    Abstract: The invention relates to a method and a computer system for enhanced part-of-speech (POS-) tagging as well as grammatically disambiguating a phrase. A phrase is usually a short multiword expression that may be ambiguous. By introducing grammatical constraints the invention supports POS-tagging as well as grammatically disambiguating the phrase. According to an identifier for the phrase, the phrase is supplemented with artificial context information. The supplemented phrase is then POS-tagged or grammatically disambiguated. Important applications are POS-tagging, Automatic Term Encoding, Headword Detection and Information Retrieval.
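The effect of supplementing a phrase with artificial context can be shown with a toy tagger. Everything below is an assumption made for illustration: the tiny `LEXICON`, the two context-based rules, and the mapping from a phrase identifier to a context prefix are hypothetical and far simpler than a real tagger.

```python
# Hypothetical lexicon: word -> possible tags, most frequent first.
LEXICON = {
    "the": ["DET"],
    "to": ["PART"],
    "record": ["NOUN", "VERB"],
    "sales": ["NOUN"],
}

# Hypothetical artificial context chosen from the phrase identifier.
CONTEXT = {"noun_phrase": ["the"], "verb_phrase": ["to"]}


def tag(tokens):
    """Toy tagger: resolves NOUN/VERB ambiguity from the previous tag only."""
    tags = []
    for token in tokens:
        options = LEXICON.get(token, ["NOUN"])
        previous = tags[-1] if tags else None
        if len(options) == 1:
            choice = options[0]
        elif previous == "DET" and "NOUN" in options:
            choice = "NOUN"
        elif previous == "PART" and "VERB" in options:
            choice = "VERB"
        else:
            choice = options[0]
        tags.append(choice)
    return tags


def tag_phrase(tokens, phrase_type):
    """Add artificial context, tag the supplemented phrase, then drop the context."""
    prefix = CONTEXT[phrase_type]
    tags = tag(prefix + list(tokens))
    return list(zip(tokens, tags[len(prefix):]))
```

With this sketch the isolated phrase "record sales" is tagged with "record" as a noun when identified as a noun phrase and as a verb when identified as a verb phrase, because the added context word supplies the constraint the bare phrase lacks.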
