Information extraction from question and answer websites

    Publication number: US09875296B2

    Publication date: 2018-01-23

    Application number: US14667792

    Filing date: 2015-03-25

    Applicant: Google Inc.

    CPC classification number: G06F17/3064 G06F17/2705 G06F17/2785

    Abstract: Methods, systems, and apparatus for obtaining a resource, identifying a first portion of text of the resource that is characterized as a question, and a second part of text of the resource that is characterized as an answer to the question, identifying an entity that is referenced by one or more terms of the text that is characterized as the question, a relationship type that is referenced by one or more other terms of the text that is characterized as the question, and an entity that is referenced by the text that is characterized as the answer to the question, and adjusting a score for a relationship of the relationship type for the entity that is referenced by the one or more terms of the text that is characterized as the question and the entity that is referenced by the text that is characterized as the answer to the question.
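
    The flow in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the lexicons `ENTITIES` and `RELATION_CUES`, the function names, and the fixed score increment are all hypothetical, and simple substring matching stands in for whatever entity and relationship recognition an actual system would use.

```python
from collections import defaultdict

# Hypothetical toy lexicons (assumptions for illustration only); a real system
# would use an entity linker and a learned relationship detector.
ENTITIES = {"barack obama": "Barack Obama", "honolulu": "Honolulu", "hawaii": "Hawaii"}
RELATION_CUES = {"born": "place_of_birth", "married": "spouse"}


def find_entities(text):
    """Return canonical names of entities whose surface form appears in text."""
    lowered = text.lower()
    return [name for surface, name in ENTITIES.items() if surface in lowered]


def find_relation(text):
    """Return the first relationship type cued by a term in text, or None."""
    for token in text.lower().replace("?", " ").split():
        if token in RELATION_CUES:
            return RELATION_CUES[token]
    return None


def update_scores(scores, question, answer, increment=1.0):
    """Adjust the score of each (subject, relation, object) triple that a
    question/answer pair supports. The fixed increment is an assumption."""
    relation = find_relation(question)
    if relation is None:
        return scores
    for subj in find_entities(question):
        for obj in find_entities(answer):
            if subj != obj:
                scores[(subj, relation, obj)] += increment
    return scores


scores = defaultdict(float)
update_scores(scores, "Where was Barack Obama born?", "He was born in Honolulu.")
update_scores(scores, "Where was Barack Obama born?", "Honolulu, I think.")
```

    Two independent answers naming the same entity raise the score of the same triple, which is the sense in which repeated question-and-answer evidence adjusts a relationship score in this sketch.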

    Distantly supervised wrapper induction for semi-structured documents

    Publication number: US10977573B1

    Publication date: 2021-04-13

    Application number: US15130089

    Filing date: 2016-04-15

    Applicant: Google Inc.

    Abstract: Systems and methods provide distantly supervised wrapper induction for semi-structured documents, including automatically generating and annotating training documents for the wrapper. Training of the wrapper may occur in two phases using the training documents. An example method includes identifying a training set of semi-structured web pages having a subject entity that exists in a knowledge base and, for each training page, identifying target objects, identifying predicates in the knowledge base that connect the subject entity to the target objects identified in the training page, and annotating the training page. Annotating a training page includes generating a feature set for a mention of the target object, generating predicate-target object pairs for the mention, and labeling each predicate-target object pair with a corresponding example type and weight. The annotated training pages are used to train the wrapper to extract new subject entities and new facts from the set of semi-structured web pages.
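
    The annotation step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the toy knowledge base `KB`, the feature names, and in particular the weighting rule (splitting a weight of 1.0 across all predicates that match a mention) are hypothetical choices made only to show how predicate-target pairs might receive an example type and a weight.

```python
# Hypothetical toy knowledge base: subject -> predicate -> set of known objects.
KB = {
    "Inception": {
        "director": {"Christopher Nolan"},
        "writer": {"Christopher Nolan"},
        "release_year": {"2010"},
    }
}


def annotate_page(subject, mentions, kb=KB):
    """Label every (predicate, target) pair for each mention on a training page.

    mentions is a list of (target_object, features) tuples, where features is
    a dict describing where the mention sits on the page.
    """
    predicates = kb.get(subject, {})
    examples = []
    for target, features in mentions:
        # Predicates that connect the subject entity to this target in the KB.
        matching = {p for p, objs in predicates.items() if target in objs}
        for predicate in predicates:
            if predicate in matching:
                # Assumed heuristic: an ambiguous mention shares its weight
                # across all predicates it could express.
                example_type, weight = "positive", 1.0 / len(matching)
            else:
                example_type, weight = "negative", 1.0
            examples.append({
                "predicate": predicate,
                "target": target,
                "features": features,
                "type": example_type,
                "weight": weight,
            })
    return examples


page_mentions = [
    ("Christopher Nolan", {"dom_path": "div.credits > span", "label_text": "Directed by"}),
    ("2010", {"dom_path": "div.info > span", "label_text": "Year"}),
]
annotated = annotate_page("Inception", page_mentions)
```

    No page is labeled by hand here: the labels come entirely from facts already in the knowledge base, which is what makes the supervision "distant". A wrapper trained on such examples would then be applied to pages whose subject entities are not yet in the knowledge base.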

    Information Extraction from Question And Answer Websites

    Type: Patent application

    Legal status: In force

    Publication number: US20160283491A1

    Publication date: 2016-09-29

    Application number: US14667792

    Filing date: 2015-03-25

    Applicant: Google Inc.

    CPC classification number: G06F17/3064 G06F17/2705 G06F17/2785

    Abstract: Methods, systems, and apparatus for obtaining a resource, identifying a first portion of text of the resource that is characterized as a question, and a second part of text of the resource that is characterized as an answer to the question, identifying an entity that is referenced by one or more terms of the text that is characterized as the question, a relationship type that is referenced by one or more other terms of the text that is characterized as the question, and an entity that is referenced by the text that is characterized as the answer to the question, and adjusting a score for a relationship of the relationship type for the entity that is referenced by the one or more terms of the text that is characterized as the question and the entity that is referenced by the text that is characterized as the answer to the question.

