-
Publication No.: US08260817B2
Publication date: 2012-09-04
Application No.: US13012225
Filing date: 2011-01-24
CPC classification: G06F17/30705, G06F17/30675
Abstract: The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.
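The matching idea in this abstract can be illustrated with a small sketch. This is not the patented method: the `Node` class, the `match_cost` function, the depth-based `decay` weighting, and the `paraphrase_penalty` value are all hypothetical choices made only for illustration. The sketch scores how cheaply a query tree can be transformed into a candidate tree, charging more for mismatches near the root (more important words), a small amount for known paraphrases, and a depth-weighted amount for omitted arguments. Additions in the candidate are simply ignored here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical proposition-tree node: a word plus role-labeled children."""
    word: str
    children: dict = field(default_factory=dict)  # role name -> Node


def match_cost(query, candidate, weight=1.0, decay=0.5,
               paraphrases=frozenset(), paraphrase_penalty=0.25):
    """Cost of transforming the query tree into the candidate tree.

    Assumed scoring (illustrative only): an exact word match is free, a known
    paraphrase costs a fraction of the node weight, and a mismatch or an
    omitted node costs the full node weight. Weights shrink with depth.
    """
    if candidate is None or (
        query.word != candidate.word
        and frozenset((query.word, candidate.word)) not in paraphrases
    ):
        cost = weight                        # omission or outright mismatch
    elif query.word == candidate.word:
        cost = 0.0                           # exact match
    else:
        cost = paraphrase_penalty * weight   # re-wording
    for role, child in query.children.items():
        cand_child = candidate.children.get(role) if candidate else None
        cost += match_cost(child, cand_child, weight * decay, decay,
                           paraphrases, paraphrase_penalty)
    return cost


# Query: attack(agent: rebels, target: convoy)
query = Node("attack", {"agent": Node("rebels"), "target": Node("convoy")})
reworded = Node("assault", {"agent": Node("rebels"), "target": Node("convoy")})
omission = Node("attack", {"agent": Node("rebels")})
off_topic = Node("visit", {"agent": Node("minister"), "target": Node("convoy")})

known = frozenset({frozenset(("attack", "assault"))})
costs = {
    "reworded": match_cost(query, reworded, paraphrases=known),
    "omission": match_cost(query, omission, paraphrases=known),
    "off_topic": match_cost(query, off_topic, paraphrases=known),
}
print(costs)
```

A lower cost means a closer match, so under these assumed weights the re-worded candidate ranks ahead of the one that omits the target, and both rank ahead of the off-topic one.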
-
Publication No.: US20110153673A1
Publication date: 2011-06-23
Application No.: US13012225
Filing date: 2011-01-24
CPC classification: G06F17/30705, G06F17/30675
Abstract: The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.
-
Publication No.: US07890539B2
Publication date: 2011-02-15
Application No.: US11974022
Filing date: 2007-10-10
IPC classification: G06F7/02
CPC classification: G06F17/30705, G06F17/30675
Abstract: The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.
-
Publication No.: US08527522B2
Publication date: 2013-09-03
Application No.: US12344871
Filing date: 2008-12-29
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/278
Abstract: The invention relates to cross-document entity co-reference systems in which naturally occurring entity mentions in a document corpus are analyzed and transformed into name clusters that represent global entities. In a first aspect of the invention, a name variation module analyzes naturally occurring names of entities extracted from the document corpus and provides an initial set of equivalent names that could refer to the same real world entity. In a second aspect of the invention, a disambiguation module takes the initial set of equivalent names and uses an agglomerative clustering algorithm to disambiguate the potentially co-referent named entities.
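The disambiguation step named in this abstract, agglomerative clustering over mentions whose names are already considered equivalent, can be sketched as follows. The abstract does not say which features or linkage the patent uses, so everything here is an assumption for illustration: each mention carries a hypothetical set of context words, similarity is Jaccard overlap of those sets, clusters are merged by average linkage, and merging stops at an arbitrary `threshold`.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets of context words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link(c1, c2, contexts):
    """Mean pairwise similarity between the mentions of two clusters."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(jaccard(contexts[i], contexts[j]) for i, j in pairs) / len(pairs)


def agglomerate(contexts, threshold=0.3):
    """Bottom-up clustering: repeatedly merge the two most similar clusters
    until the best remaining pair falls below the threshold."""
    clusters = [[i] for i in range(len(contexts))]
    while len(clusters) > 1:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_link(clusters[p[0]], clusters[p[1]], contexts),
        )
        if average_link(clusters[a], clusters[b], contexts) < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index b stays valid
        del clusters[b]
    return clusters


# Hypothetical mentions whose names a name-variation step treated as equivalent.
mentions = [
    ("John Smith", {"senator", "ohio", "vote"}),
    ("J. Smith", {"senator", "ohio", "bill"}),
    ("John Smith", {"guitar", "album", "tour"}),
    ("J. Smith", {"album", "tour", "band"}),
]
clusters = agglomerate([context for _, context in mentions])
print(clusters)
```

With these made-up contexts the four mentions separate into two clusters, one for each real-world person, even though the surface names are interchangeable.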
-
Publication No.: US20100076972A1
Publication date: 2010-03-25
Application No.: US12344871
Filing date: 2008-12-29
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/278
Abstract: The invention relates to cross-document entity co-reference systems in which naturally occurring entity mentions in a document corpus are analyzed and transformed into name clusters that represent global entities. In a first aspect of the invention, a name variation module analyzes naturally occurring names of entities extracted from the document corpus and provides an initial set of equivalent names that could refer to the same real world entity. In a second aspect of the invention, a disambiguation module takes the initial set of equivalent names and uses an agglomerative clustering algorithm to disambiguate the potentially co-referent named entities.
-
Publication No.: US20090100053A1
Publication date: 2009-04-16
Application No.: US11974022
Filing date: 2007-10-10
IPC classification: G06F7/02
CPC classification: G06F17/30705, G06F17/30675
Abstract: The invention relates to topic classification systems in which text intervals are represented as proposition trees. Free-text queries and candidate responses are transformed into proposition trees, and a particular candidate response can be matched to a free-text query by transforming the proposition trees of the free-text query into the proposition trees of the candidate responses. Because proposition trees are able to capture semantic information of text intervals, the topic classification system accounts for the relative importance of topic words, for paraphrases and re-wordings, and for omissions and additions. Redundancy of two text intervals can also be identified.