Distinguishing facts from opinions using a multi-stage approach
    1.
    Granted invention patent
    Distinguishing facts from opinions using a multi-stage approach (status: in force)

    Publication No.: US07668791B2

    Publication date: 2010-02-23

    Application No.: US11496650

    Filing date: 2006-07-31

    IPC classification: G06N5/00

    CPC classification: G06F17/2785 G06F17/30719

    Abstract: Facts are extracted from electronic documents by recognizing factual descriptions using a fact-word table to match to words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions, rather than on the entire electronic document, and particularly to the text in the neighborhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining the role as either subject or object. Exclusion rules may be applied to eliminate those phrases unlikely to be part of facts, the exclusion rules being based in part on the linguistic constituents. Scoring rules may be applied to remaining phrases, and for those phrases having a score in excess of a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.
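The stages named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fact-word table, the toy part-of-speech lexicon, the window size, the exclusion rule, the scoring rules, and the threshold are all hypothetical values chosen for illustration, and the "subject" and "object" are approximated as the words just before and just after a fact-word match.

```python
import re

# Hypothetical fact-word table: words whose presence suggests a factual description.
FACT_WORDS = {"acquired", "founded", "filed", "announced"}

# Toy part-of-speech lexicon (an assumption; a real system would use a trained tagger).
POS_LEXICON = {
    "i": "PRON", "we": "PRON", "it": "PRON", "they": "PRON",
    "the": "DET", "a": "DET", "an": "DET",
    "think": "OPINION", "believe": "OPINION", "probably": "OPINION",
}

THRESHOLD = 2  # hypothetical score cutoff


def split_sentences(text):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag(words):
    """Tag each word; unknown capitalized words are treated as proper nouns."""
    tags = []
    for w in words:
        lw = w.lower()
        if lw in FACT_WORDS:
            tags.append("FACT")
        elif lw in POS_LEXICON:
            tags.append(POS_LEXICON[lw])
        elif w[0].isupper():
            tags.append("PROPN")
        else:
            tags.append("OTHER")
    return tags


def score(subj, obj):
    """Illustrative scoring rules over lists of (word, tag) pairs."""
    points = 0
    points += any(t == "PROPN" for _, t in subj)  # named subject
    points += any(t == "PROPN" for _, t in obj)   # named object
    points += any(w.isdigit() for w, _ in obj)    # concrete number or year
    return points


def extract_facts(text, window=4):
    """Return sentences whose fact-word neighborhood scores at or above THRESHOLD."""
    facts = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z0-9']+", sentence)
        # Stage 1: cheap fact-word match; sentences without a match are skipped.
        hits = [i for i, w in enumerate(words) if w.lower() in FACT_WORDS]
        if not hits:
            continue
        # Stage 2: tag only the candidate sentence, not the whole document.
        tagged = list(zip(words, tag(words)))
        for i in hits:
            # Stage 3: look only at the neighborhood of the match.
            subj = tagged[max(0, i - window):i]
            obj = tagged[i + 1:i + 1 + window]
            # Stage 4: exclusion rule (no subject, or opinion wording nearby).
            if not subj or any(t == "OPINION" for _, t in subj + obj):
                continue
            # Stage 5: scoring rules and threshold.
            if score(subj, obj) >= THRESHOLD:
                facts.append(sentence)
                break
    return facts
```

For example, `extract_facts("Acme Corp acquired Widget Inc in 2006. I think they probably announced something great. The weather was lovely.")` returns only the first sentence: the second is dropped by the exclusion rule and the third never matches the fact-word table. The cheap table lookup runs first so that tagging and phrase analysis are spent only on candidate sentences, which is the optimization the abstract describes.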

    Optimization of fact extraction using a multi-stage approach
    2.
    Invention patent application
    Optimization of fact extraction using a multi-stage approach (status: in force)

    Publication No.: US20080027888A1

    Publication date: 2008-01-31

    Application No.: US11496650

    Filing date: 2006-07-31

    IPC classification: G06N5/00

    CPC classification: G06F17/2785 G06F17/30719

    Abstract: Facts are extracted from electronic documents by recognizing factual descriptions using a fact-word table to match to words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions, rather than on the entire electronic document, and particularly to the text in the neighborhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining the role as either subject or object. Exclusion rules may be applied to eliminate those phrases unlikely to be part of facts, the exclusion rules being based in part on the linguistic constituents. Scoring rules may be applied to remaining phrases, and for those phrases having a score in excess of a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.
