-
Publication No.: US20100076972A1
Publication date: 2010-03-25
Application No.: US12344871
Filing date: 2008-12-29
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/278
Abstract: The invention relates to cross-document entity co-reference systems in which naturally occurring entity mentions in a document corpus are analyzed and transformed into name clusters that represent global entities. In a first aspect of the invention, a name variation module analyzes naturally occurring names of entities extracted from the document corpus and provides an initial set of equivalent names that could refer to the same real-world entity. In a second aspect of the invention, a disambiguation module takes the initial set of equivalent names and uses an agglomerative clustering algorithm to disambiguate the potentially co-referent named entities.
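The disambiguation step named in the abstract above (agglomerative clustering over potentially co-referent mentions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the mention format, the Jaccard similarity over context words, the average-link merge criterion, and the `threshold` value.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two sets of context words."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_similarity(c1, c2):
    """Average-link similarity: mean pairwise Jaccard between two clusters."""
    scores = [jaccard(m1["context"], m2["context"]) for m1 in c1 for m2 in c2]
    return sum(scores) / len(scores)


def agglomerative_cluster(mentions, threshold=0.25):
    """Start with one cluster per mention and repeatedly merge the most
    similar pair until no pair reaches the (hypothetical) threshold."""
    clusters = [[m] for m in mentions]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            score = cluster_similarity(clusters[i], clusters[j])
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical mentions that a name variation step judged possibly equivalent.
mentions = [
    {"doc": "d1", "name": "John Smith", "context": {"senator", "ohio", "vote"}},
    {"doc": "d2", "name": "J. Smith", "context": {"senator", "ohio", "bill"}},
    {"doc": "d3", "name": "John Smith", "context": {"guitar", "band", "tour"}},
]
clusters = agglomerative_cluster(mentions)
print([[m["doc"] for m in c] for c in clusters])
```

With this toy data the two mentions that share political context merge into one cluster, while the musician mention stays separate even though its surface name is identical.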
-
Publication No.: US08527522B2
Publication date: 2013-09-03
Application No.: US12344871
Filing date: 2008-12-29
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/278
Abstract: The invention relates to cross-document entity co-reference systems in which naturally occurring entity mentions in a document corpus are analyzed and transformed into name clusters that represent global entities. In a first aspect of the invention, a name variation module analyzes naturally occurring names of entities extracted from the document corpus and provides an initial set of equivalent names that could refer to the same real-world entity. In a second aspect of the invention, a disambiguation module takes the initial set of equivalent names and uses an agglomerative clustering algorithm to disambiguate the potentially co-referent named entities.
-
Publication No.: US08131536B2
Publication date: 2012-03-06
Application No.: US11998663
Filing date: 2007-11-30
Applicant: Ralph M. Weischedel, Jinxi Xu, Michael R. Kayser
Inventor: Ralph M. Weischedel, Jinxi Xu, Michael R. Kayser
IPC classification: G06F17/28
CPC classification: G06F17/289, G06F17/2229, G06F17/278
Abstract: The invention relates to systems and methods for automatically translating documents from a first language to a second language. To carry out the translation of a document, elements of information are extracted from the document and are translated using one or more specialized translation processes. The remainder of the document is separately translated by a statistical translation process. The translated elements of information and the translated remainder are then merged into a final translated document.
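The extract, translate separately, and merge flow described in the abstract above can be sketched as a small pipeline. This is only an illustration under stated assumptions: dates stand in for the "elements of information", the placeholder format `__E0__` is invented, and a toy word lookup (`TOY_LEXICON`) stands in for the statistical translation process. None of these details come from the patent.

```python
import re

# Assumption: ISO dates play the role of the extracted information elements.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

# Assumption: a toy Spanish-to-English lexicon replaces a real statistical model.
TOY_LEXICON = {"la": "the", "reunión": "meeting", "es": "is", "el": "on"}


def translate_date(match):
    """Hypothetical specialized process: ISO date -> day/month/year."""
    year, month, day = match.groups()
    return f"{day}/{month}/{year}"


def extract_elements(text):
    """Swap each date for a placeholder; return the remainder and the
    list of already-translated elements."""
    translated = []

    def stash(match):
        translated.append(translate_date(match))
        return f"__E{len(translated) - 1}__"

    remainder = DATE_PATTERN.sub(stash, text)
    return remainder, translated


def translate_remainder(text):
    """Stand-in for statistical translation: word-by-word lookup that
    leaves unknown tokens (including placeholders) untouched."""
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())


def merge(translated_remainder, translated_elements):
    """Put each translated element back where its placeholder sits."""
    return re.sub(
        r"__E(\d+)__",
        lambda m: translated_elements[int(m.group(1))],
        translated_remainder,
    )


def translate_document(text):
    remainder, elements = extract_elements(text)
    return merge(translate_remainder(remainder), elements)


print(translate_document("La reunión es el 2008-03-20"))
```

Keeping the placeholders opaque to the general translator is the design point of the sketch: the specialized output cannot be altered by the process that handles the remainder.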
-
Publication No.: US08249856B2
Publication date: 2012-08-21
Application No.: US12052555
Filing date: 2008-03-20
Applicant: Libin Shen, Jinxi Xu, Ralph M. Weischedel
Inventor: Libin Shen, Jinxi Xu, Ralph M. Weischedel
IPC classification: G06F17/28
CPC classification: G06F17/2872
Abstract: A method for computer-assisted translation from a source language to a target language makes use of a number of rules. Each rule forms an association between a representation of a sequence of source language tokens and a corresponding tree-based structure in the target language. The tree-based structure for each of at least some of the rules represents one or more asymmetrical relations within a number of target tokens associated with the tree-based structure and provides an association of the target tokens with the sequence of source language tokens of the rule. An input sequence of source tokens is decoded according to the rules to generate a representation of one or more output sequences of target language tokens. Decoding includes, for each of at least some sub-sequences of the input sequence of source tokens, determining a tree-based structure associated with the sub-sequence according to a match to one of the plurality of rules.
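The rule-matching step in the abstract above (finding, for sub-sequences of the input, a rule whose source side matches and which carries a target-side tree) can be sketched as follows. The `Rule` layout, the use of head indices to encode the asymmetrical head/dependent relations, and the sample French-to-English rules are all assumptions made for illustration. The sketch covers only span matching, not scoring or assembling a full output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: tuple  # sequence of source-language tokens
    target: tuple  # target-language tokens
    heads: tuple   # heads[i] = index of the head of target[i]; -1 marks the root


def match_spans(tokens, rules):
    """Return {(start, end): rule} for every contiguous sub-sequence of
    `tokens` whose tokens equal the source side of some rule."""
    index = {rule.source: rule for rule in rules}
    matches = {}
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            rule = index.get(tuple(tokens[start:end]))
            if rule is not None:
                matches[(start, end)] = rule
    return matches


# Hypothetical rules: in "the cat", "the" depends on "cat" (head index 1).
RULES = [
    Rule(source=("le", "chat"), target=("the", "cat"), heads=(1, -1)),
    Rule(source=("dort",), target=("sleeps",), heads=(-1,)),
]

for span, rule in match_spans(["le", "chat", "dort"], RULES).items():
    print(span, rule.target, rule.heads)
```

A full decoder would go on to combine the tree fragments found for adjacent spans; that part is deliberately left out here.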
-
Publication No.: US20090240487A1
Publication date: 2009-09-24
Application No.: US12052555
Filing date: 2008-03-20
Applicant: Libin Shen, Jinxi Xu, Ralph M. Weischedel
Inventor: Libin Shen, Jinxi Xu, Ralph M. Weischedel
CPC classification: G06F17/2872
Abstract: A method for computer-assisted translation from a source language to a target language makes use of a number of rules. Each rule forms an association between a representation of a sequence of source language tokens and a corresponding tree-based structure in the target language. The tree-based structure for each of at least some of the rules represents one or more asymmetrical relations within a number of target tokens associated with the tree-based structure and provides an association of the target tokens with the sequence of source language tokens of the rule. An input sequence of source tokens is decoded according to the rules to generate a representation of one or more output sequences of target language tokens. Decoding includes, for each of at least some sub-sequences of the input sequence of source tokens, determining a tree-based structure associated with the sub-sequence according to a match to one of the plurality of rules.
-
Publication No.: US20080215309A1
Publication date: 2008-09-04
Application No.: US11998663
Filing date: 2007-11-30
Applicant: Ralph M. Weischedel, Jinxi Xu, Michael R. Kayser
Inventor: Ralph M. Weischedel, Jinxi Xu, Michael R. Kayser
IPC classification: G06F17/28
CPC classification: G06F17/289, G06F17/2229, G06F17/278
Abstract: The invention relates to systems and methods for automatically translating documents from a first language to a second language. To carry out the translation of a document, elements of information are extracted from the document and are translated using one or more specialized translation processes. The remainder of the document is separately translated by a statistical translation process. The translated elements of information and the translated remainder are then merged into a final translated document.