Hierarchical clustering with real-time updating
    1.
    Invention application
    Hierarchical clustering with real-time updating (status: in force)

    Publication number: US20070239745A1

    Publication date: 2007-10-11

    Application number: US11391864

    Filing date: 2006-03-29

    IPC classification: G06F7/00

    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.
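
    The local update described in this abstract can be illustrated with a minimal sketch. It assumes a flat set of classes, each summarized only by word counts; the names `move_document` and `word_probability` are hypothetical and are not taken from the patent, and a hierarchical system would presumably also need to update any ancestor classes affected by the move.

```python
from collections import Counter


def move_document(class_counts, doc_words, source, destination):
    """Move one document between classes, touching only those two classes.

    class_counts maps a class name to a Counter of word counts (assumed layout).
    """
    doc = Counter(doc_words)
    # Counter subtraction drops words whose count falls to zero or below.
    class_counts[source] = class_counts[source] - doc
    class_counts[destination] = class_counts[destination] + doc
    return class_counts


def word_probability(class_counts, cls, word):
    """Relative frequency of a word within one class (0.0 for an empty class)."""
    total = sum(class_counts[cls].values())
    return class_counts[cls][word] / total if total else 0.0


counts = {
    "a": Counter({"x": 2, "y": 1}),
    "b": Counter({"z": 1}),
    "c": Counter({"q": 3}),
}
untouched = counts["c"]
move_document(counts, ["x", "y"], "a", "b")
```

    Only the entries for classes "a" and "b" are rebuilt; the Counter for class "c" is the same object as before the move, which is the point of a local update.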

    Hierarchical clustering with real-time updating
    2.
    Granted invention patent
    Hierarchical clustering with real-time updating (status: in force)

    Publication number: US07720848B2

    Publication date: 2010-05-18

    Application number: US11391864

    Filing date: 2006-03-29

    IPC classification: G06F7/00 G06F17/00

    Abstract: A probabilistic clustering system is defined at least in part by probabilistic model parameters indicative of word counts, ratios, or frequencies characterizing classes of the clustering system. An association of one or more documents in the probabilistic clustering system is changed from one or more source classes to one or more destination classes. Probabilistic model parameters characterizing classes affected by the changed association are locally updated without updating probabilistic model parameters characterizing classes not affected by the changed association.

    Categorization including dependencies between different category systems
    3.
    Granted invention patent
    Categorization including dependencies between different category systems (status: in force)

    Publication number: US07630977B2

    Publication date: 2009-12-08

    Application number: US11170033

    Filing date: 2005-06-29

    IPC classification: G06F17/30

    Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.
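
    One way to picture label selection that depends on a second categorization dimension is the sketch below. The abstract does not say how the two sets of probabilities are combined; the compatibility weights `compat`, the function name `select_labels`, and the multiplicative scoring rule are all assumptions made purely for illustration.

```python
def select_labels(p_a, p_b, compat):
    """Pick one label per dimension, letting each dimension inform the other.

    p_a, p_b: dicts mapping category -> probability for dimensions A and B.
    compat:   dict mapping (a, b) -> non-negative compatibility weight
              (hypothetical; missing pairs count as 0.0).
    """
    # A category's score is its own probability times the compatibility-weighted
    # probability mass of the other dimension.
    score_a = {
        a: p_a[a] * sum(compat.get((a, b), 0.0) * p_b[b] for b in p_b)
        for a in p_a
    }
    score_b = {
        b: p_b[b] * sum(compat.get((a, b), 0.0) * p_a[a] for a in p_a)
        for b in p_b
    }
    return max(score_a, key=score_a.get), max(score_b, key=score_b.get)


p_type = {"invoice": 0.55, "memo": 0.45}
p_dept = {"finance": 0.2, "hr": 0.8}
compat = {
    ("invoice", "finance"): 1.0,
    ("invoice", "hr"): 0.1,
    ("memo", "finance"): 0.1,
    ("memo", "hr"): 1.0,
}
labels = select_labels(p_type, p_dept, compat)
```

    In this toy input "invoice" has the higher probability on its own dimension, yet "memo" is selected because the other dimension strongly favors "hr", which the assumed weights tie to "memo".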

    Categorization including dependencies between different category systems
    4.
    Invention application
    Categorization including dependencies between different category systems (status: in force)

    Publication number: US20070005639A1

    Publication date: 2007-01-04

    Application number: US11170033

    Filing date: 2005-06-29

    IPC classification: G06F7/00

    Abstract: In categorizing an object respective to at least two categorization dimensions each defined by a plurality of categories, a probability value indicative of the object is determined for each category of each categorization dimension. A categorization label for the object is selected respective to each categorization dimension based on (i) the determined probability values of the categories of that categorization dimension and (ii) the determined probability values of categories of at least one other of the at least two categorization dimensions.

    Incremental training for probabilistic categorizer
    7.
    Invention application
    Incremental training for probabilistic categorizer (status: in force)

    Publication number: US20070005340A1

    Publication date: 2007-01-04

    Application number: US11170019

    Filing date: 2005-06-29

    IPC classification: G06F17/27

    CPC classification: G06F17/277 G06F17/3071

    Abstract: A probabilistic document categorizer has an associated vocabulary of words and an associated plurality of probabilistic categorizer parameters derived from a collection of documents. A new document is received. The probabilistic categorizer parameters are updated to reflect addition of the new document to the collection of documents based on vocabulary words contained in the new document, a category of the new document, and a collection size parameter indicative of an effective total number of instances of vocabulary words in the collection of documents.
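
    A minimal sketch of such an incremental update follows, assuming the parameters for the new document's category are per-word probabilities and that the collection size parameter is the effective number of vocabulary-word instances behind them. The function name `incremental_update` and the weighted-average formula are illustrative assumptions, not the update rule claimed in the patent.

```python
from collections import Counter


def incremental_update(word_probs, size, doc_words, vocabulary):
    """Fold one new document into the word probabilities of its category.

    word_probs: dict word -> probability within the category (assumed layout).
    size:       effective number of vocabulary-word instances seen so far.
    Returns the updated (word_probs, size) without revisiting old documents.
    """
    doc = Counter(w for w in doc_words if w in vocabulary)  # ignore unknown words
    added = sum(doc.values())
    if added == 0:
        return dict(word_probs), size
    new_size = size + added
    # size * probability recovers the effective old count of each word.
    new_probs = {
        w: (size * word_probs.get(w, 0.0) + doc[w]) / new_size
        for w in vocabulary
    }
    return new_probs, new_size


vocab = {"a", "b"}
probs, size = incremental_update({"a": 0.5, "b": 0.5}, 4, ["a", "a", "c"], vocab)
```

    Because the old counts are recovered from the stored probabilities and the size parameter, the earlier documents never have to be re-read.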

    Apparatus and methods for aligning words in bilingual sentences
    8.
    Invention application
    Apparatus and methods for aligning words in bilingual sentences (status: lapsed)

    Publication number: US20060190241A1

    Publication date: 2006-08-24

    Application number: US11137590

    Filing date: 2005-05-26

    IPC classification: G06F17/28

    CPC classification: G06F17/2827

    Abstract: Methods are disclosed for performing proper word alignment that satisfies constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.
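
    The second method (threshold, then close by transitivity) can be sketched as follows. This is an illustration under stated assumptions: the association matrix is a plain list of rows (source words) by columns (target words), the closure is computed with a union-find structure, and the name `align_by_closure` is hypothetical. A word with no association above the threshold stays unaligned here, so the sketch does not by itself enforce the coverage constraint.

```python
def align_by_closure(assoc, threshold):
    """Threshold an association matrix and close the result by transitivity.

    assoc[i][j] is the association between source word i and target word j.
    Returns a 0/1 alignment matrix in which words linked directly or through
    a chain of links end up aligned to each other.
    """
    n_src, n_tgt = len(assoc), len(assoc[0])
    # Union-find over all words: source words first, then target words.
    parent = list(range(n_src + n_tgt))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n_src):
        for j in range(n_tgt):
            if assoc[i][j] >= threshold:
                parent[find(i)] = find(n_src + j)

    return [
        [1 if find(i) == find(n_src + j) else 0 for j in range(n_tgt)]
        for i in range(n_src)
    ]


assoc = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.0],
    [0.0, 0.0, 0.6],
]
alignment = align_by_closure(assoc, 0.5)
```

    Source word 0 and target word 1 have a weak direct association (0.1) but end up aligned because both are linked to source word 1 and target word 0; each connected group plays the role of one cept.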

    Apparatus and methods for aligning words in bilingual sentences
    9.
    Granted invention patent
    Apparatus and methods for aligning words in bilingual sentences (status: lapsed)

    Publication number: US07672830B2

    Publication date: 2010-03-02

    Application number: US11137590

    Filing date: 2005-05-26

    IPC classification: G06F17/28 G06F17/27

    CPC classification: G06F17/2827

    Abstract: Methods are disclosed for performing proper word alignment that satisfies constraints of coverage and transitive closure. Initially, a translation matrix which defines word association measures between source and target words of a corpus of bilingual translations of source and target sentences is computed. Subsequently, in a first method, the association measures in the translation matrix are factorized and orthogonalized to produce cepts for the source and target words, which resulting matrix factors may then be, optionally, multiplied to produce an alignment matrix. In a second method, the association measures in the translation matrix are thresholded, and then closed by transitivity, to produce an alignment matrix, which may then be, optionally, factorized to produce cepts. The resulting cepts or alignment matrices may then be used by any number of natural language applications for identifying words that are properly aligned.

    Machine translation using elastic chunks
    10.
    Granted invention patent
    Machine translation using elastic chunks (status: lapsed)

    Publication number: US07542893B2

    Publication date: 2009-06-02

    Application number: US11431393

    Filing date: 2006-05-10

    IPC classification: G06F17/28

    CPC classification: G06F17/2818

    Abstract: A machine translation method includes receiving source text in a first language and retrieving text fragments in a target language from a library of bi-fragments to generate a target hypothesis. Each bi-fragment includes a text fragment from the first language and a corresponding text fragment from the second language. Some of the bi-fragments are modeled as elastic bi-fragments where a gap between words is able to assume a variable size corresponding to a number of other words to occupy the gap. The target hypothesis is evaluated with a translation scoring function which scores the target hypothesis according to a plurality of feature functions, at least one of the feature functions comprising a gap size scoring feature which favors hypotheses with statistically more probable gap sizes over hypotheses with statistically less probable gap sizes.
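
    A gap size scoring feature of this kind could, for example, be a smoothed log-probability estimated from gap sizes observed in training data. The sketch below assumes exactly that; the name `make_gap_size_feature`, the add-one smoothing, and the `max_gap` cap are illustrative choices that the abstract does not specify.

```python
import math
from collections import Counter


def make_gap_size_feature(observed_gap_sizes, max_gap):
    """Build a feature that returns the log-probability of a gap size.

    observed_gap_sizes: gap sizes seen for one elastic bi-fragment (assumed input).
    max_gap:            largest gap size the model allows (assumed cap).
    """
    in_range = [g for g in observed_gap_sizes if 0 <= g <= max_gap]
    counts = Counter(in_range)
    # Add-one smoothing over the sizes 0..max_gap keeps unseen sizes possible.
    total = len(in_range) + (max_gap + 1)

    def feature(gap_size):
        if not 0 <= gap_size <= max_gap:
            return float("-inf")  # sizes outside the allowed range are ruled out
        return math.log((counts[gap_size] + 1) / total)

    return feature


gap_feature = make_gap_size_feature([1, 1, 1, 2, 0], max_gap=3)
```

    Here `gap_feature(1)` is larger than `gap_feature(3)`, so a scoring function that adds this feature would favor hypotheses that fill the gap with one word over those that fill it with three.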
