Invention Application
- Patent Title: SYSTEMS AND METHODS FOR FIELD EXTRACTION FROM UNLABELED DATA
- Application No.: US17484623
- Application Date: 2021-09-24
- Publication No.: US20220366317A1
- Publication Date: 2022-11-17
- Inventor: Mingfei Gao, Zeyuan Chen, Ran Xu
- Applicant: salesforce.com, inc.
- Applicant Address: US CA San Francisco
- Assignee: salesforce.com, inc.
- Current Assignee: salesforce.com, inc.
- Current Assignee Address: US CA San Francisco
- Main IPC: G06N20/20
- IPC: G06N20/20; G06N5/00; G06N5/04

Abstract:
Embodiments described herein provide a field extraction system that does not require field-level annotations for training. Training is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. A transformer-based model then captures interactions between the text tokens of the input form and predicts a field tag for each token, with the pseudo-labels supervising the transformer training. Because the pseudo-labels are noisy, a refinement module containing a sequence of branches is used to refine them. Each refinement branch performs field tagging and generates refined labels. At each stage, a branch is optimized using labels ensembled from all previous branches, which reduces label noise.
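
The staged refinement described in the abstract can be illustrated with a minimal sketch. The abstract does not say how labels are ensembled or how a branch is trained, so everything below is an assumption made for illustration: `ensemble_labels` uses a per-token majority vote (ties go to the earliest label set), and each "branch" is a hypothetical callable that stands in for training a tagging head on the given targets and returning its predicted tags. The rule-mined pseudo-labels are treated as the first entry in the ensemble.

```python
from collections import Counter
from typing import Callable, List, Sequence

Labels = List[str]                    # one field tag per text token
Branch = Callable[[Labels], Labels]   # hypothetical: train on targets, return tags


def ensemble_labels(label_sets: Sequence[Labels]) -> Labels:
    """Per-token majority vote (an assumed ensembling rule).

    Ties are resolved in favour of the earliest label set.
    """
    merged = []
    for token_labels in zip(*label_sets):
        counts = Counter(token_labels)
        best = max(counts.values())
        # Pick the first label, in label-set order, that reaches the top count.
        merged.append(next(tag for tag in token_labels if counts[tag] == best))
    return merged


def progressive_refinement(pseudo_labels: Labels, branches: Sequence[Branch]) -> Labels:
    """Run the branches in order; each one is supervised by the ensemble
    of the initial pseudo-labels and the outputs of all earlier branches."""
    history = [pseudo_labels]
    for train_and_tag in branches:
        targets = ensemble_labels(history)
        history.append(train_and_tag(targets))
    return history[-1]


# Toy usage: a stand-in branch that always tags the second token as "total".
def toy_branch(targets: Labels) -> Labels:
    return [targets[0], "total", targets[2]]


refined = progressive_refinement(["date", "O", "total"], [toy_branch, toy_branch])
```

In a real system each branch would be a trainable tagging head on top of the transformer and the vote would likely operate on predicted probabilities rather than hard tags; the sketch only shows the data flow between stages.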