Invention Grant
- Patent Title: Distantly supervised wrapper induction for semi-structured documents
- Application No.: US15130089
- Application Date: 2016-04-15
- Publication No.: US10977573B1
- Publication Date: 2021-04-13
- Inventor: Jeffrey Dalton, Karthik Raman, Evgeniy Gabrilovich, Kevin Patrick Murphy, Wei Zhang
- Applicant: Google Inc.
- Applicant Address: US CA Mountain View
- Assignee: Google Inc.
- Current Assignee: Google Inc.
- Current Assignee Address: US CA Mountain View
- Agency: Brake Hughes Bellermann LLP
- Main IPC: G06N20/00
- IPC: G06N20/00; G06F16/80; G06F40/169

Abstract:
Systems and methods provide distantly supervised wrapper induction for semi-structured documents, including automatically generating and annotating training documents for the wrapper. Training of the wrapper may occur in two phases using the training documents. An example method includes identifying a training set of semi-structured web pages having a subject entity that exists in a knowledge base and, for each training page, identifying target objects, identifying predicates in the knowledge base that connect the subject entity to the target objects identified in the training page, and annotating the training page. Annotating a training page includes generating a feature set for a mention of the target object, generating predicate-target object pairs for the mention, and labeling each predicate-target object pair with a corresponding example type and weight. The annotated training pages are used to train the wrapper to extract new subject entities and new facts from the set of semi-structured web pages.
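
The annotation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy knowledge base `KB`, the `Example` record, the function `annotate_page`, the two features, and the weighting rule (an ambiguous mention splits its positive weight evenly across the matching predicates) are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: (subject entity, predicate) -> set of objects.
KB = {
    ("Inception", "director"): {"Christopher Nolan"},
    ("Inception", "producer"): {"Christopher Nolan", "Emma Thomas"},
    ("Inception", "actor"): {"Leonardo DiCaprio"},
}
PREDICATES = sorted({predicate for _, predicate in KB})


@dataclass
class Example:
    mention_id: int      # index of the mention on the page
    features: dict       # feature set generated for the mention
    predicate: str
    target: str          # target object text
    example_type: str    # "positive" or "negative"
    weight: float


def annotate_page(subject, mentions):
    """Label predicate-target object pairs for one training page.

    `mentions` is a list of (xpath, text) pairs found on the page.
    """
    examples = []
    for mention_id, (xpath, text) in enumerate(mentions):
        # Assumed feature set: structural position plus a trivial text feature.
        features = {"xpath": xpath, "text_len": len(text)}
        matching = [p for p in PREDICATES if text in KB.get((subject, p), set())]
        if not matching:
            continue  # the mention is not a target object known to the KB
        for predicate in PREDICATES:
            if predicate in matching:
                # Assumed rule: ambiguous mentions share the positive weight.
                label, weight = "positive", 1.0 / len(matching)
            else:
                label, weight = "negative", 1.0
            examples.append(
                Example(mention_id, features, predicate, text, label, weight)
            )
    return examples


page_mentions = [
    ("/html/body/div[1]/span", "Christopher Nolan"),
    ("/html/body/div[2]/span", "Leonardo DiCaprio"),
    ("/html/body/p", "Buy tickets"),
]
for example in annotate_page("Inception", page_mentions):
    print(example.mention_id, example.predicate, example.example_type, example.weight)
```

In this sketch the mention "Christopher Nolan" matches two predicates for the same subject entity, so the annotator cannot tell which one the page position expresses; lowering the weight of each positive pair is one simple way to reflect that uncertainty in the training data.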