-
1.
Publication No.: US20070162447A1
Publication date: 2007-07-12
Application No.: US11321177
Filing date: 2005-12-29
Applicant: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott Holmes
Inventor: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott Holmes
IPC classification: G06F7/00
CPC classification: G06F17/30864, G06F17/30705
Abstract: A method (400) of extracting factoids from text repositories is disclosed, the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognise factoids relevant to that given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association with the given factoid category are extracted (420) from the documents or document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. The extracted snippets are the factoids associated with the given factoid category.
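The four steps named in the abstract (train a classifier, collect documents, extract candidate sentences, classify them into snippets) can be sketched roughly as follows. This is only an illustrative sketch and not the patented implementation: the abstract does not say which classifier is used, so a small bag-of-words naive Bayes model stands in for the classifier (230), and the "predetermined association" is assumed here to be a simple cue-word match. All names (`NaiveBayes`, `split_sentences`, `extract_factoids`, `cue_words`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Stand-in for the classifier (230); the abstract does not name a model."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.label_counts = Counter()
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (sentence, is_factoid) pairs
        for text, label in examples:
            self.label_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def _log_score(self, text, label):
        total = sum(self.word_counts[label].values())
        score = math.log(self.label_counts[label] / sum(self.label_counts.values()))
        for tok in tokenize(text):
            # Laplace smoothing keeps unseen words from zeroing the score.
            count = self.word_counts[label][tok] + 1
            score += math.log(count / (total + len(self.vocab)))
        return score

    def is_factoid(self, text):
        return self._log_score(text, True) > self._log_score(text, False)


def split_sentences(document):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def extract_factoids(documents, cue_words, classifier):
    """Steps 420 and 440: keep cue-word sentences the classifier accepts."""
    snippets = []
    for document in documents:
        for sentence in split_sentences(document):
            # Assumed form of the "predetermined association": a cue-word hit.
            if cue_words & set(tokenize(sentence)):
                if classifier.is_factoid(sentence):
                    snippets.append(sentence)
    return snippets
```

In this sketch the cue-word filter is deliberately loose, so that noisy sentences which merely mention a category word still reach the classifier and are rejected there. A caller would train the model on labelled sentences for one category, then pass the collected documents and that category's cue words to `extract_factoids`.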
-
2.
Publication No.: US08706730B2
Publication date: 2014-04-22
Application No.: US11321177
Filing date: 2005-12-29
Applicant: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott R Holmes
Inventor: Sachindra Joshi, Raghuram Krishnapuram, Nimit Kumar, Kiran Mehta, Sumit Negi, Ganesh Ramakrishnan, Scott R Holmes
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30705
Abstract: A method (400) of extracting factoids from text repositories is disclosed, the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognize factoids relevant to that given factoid category. Documents or document summaries relevant to the given factoid category are next collected (410) from the text repositories. Sentences having a predetermined association with the given factoid category are extracted (420) from the documents or document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the given factoid category. The extracted snippets are the factoids associated with the given factoid category.
-