Invention Publication
- Patent Title: PRE-TRAINING TECHNIQUES FOR ENTITY EXTRACTION IN LOW RESOURCE DOMAINS
- Application No.: US17525311
- Application Date: 2021-11-12
- Publication No.: US20230153533A1
- Publication Date: 2023-05-18
- Inventor: Aniruddha Mahapatra, Sharmila Reddy Nangi, Aparna Garimella, Anandha velu Natarajan
- Applicant: ADOBE INC.
- Applicant Address: US CA SAN JOSE
- Assignee: ADOBE INC.
- Current Assignee: ADOBE INC.
- Current Assignee Address: US CA SAN JOSE
- Main IPC: G06F40/289
- IPC: G06F40/289; G06F40/211; G06F40/42

Abstract:
Embodiments of the present invention provide systems, methods, and computer storage media for pre-training entity extraction models to facilitate domain adaptation in resource-constrained domains. In an example embodiment, a first machine learning model is used to encode sentences of a source domain corpus and a target domain corpus into sentence embeddings. The sentence embeddings of the target domain corpus are combined into a target corpus embedding. Training sentences from the source domain corpus within a threshold of similarity to the target corpus embedding are selected. A second machine learning model is trained on the training sentences selected from the source domain corpus.
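The selection step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the method claimed in the publication: `encode` is a hypothetical bag-of-words stand-in for the first machine learning model, the target corpus embedding is taken to be the mean of the target sentence embeddings, "within a threshold of similarity" is read as cosine similarity at or above a cutoff, and the function names are invented. Training the second model on the selected sentences is not shown.

```python
import math
from collections import Counter


def encode(sentence):
    # Hypothetical stand-in for the first model: a sparse bag-of-words
    # vector. A real system would use a learned sentence encoder.
    return Counter(sentence.lower().split())


def mean_embedding(embeddings):
    # Assumption: the target embeddings are "combined" by averaging.
    if not embeddings:
        raise ValueError("need at least one embedding")
    total = Counter()
    for emb in embeddings:
        total.update(emb)
    n = len(embeddings)
    return {token: value / n for token, value in total.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_training_sentences(source_corpus, target_corpus, threshold):
    # Keep source sentences whose similarity to the target corpus
    # embedding is at or above the (assumed) cosine threshold.
    target_vec = mean_embedding([encode(s) for s in target_corpus])
    return [s for s in source_corpus
            if cosine(encode(s), target_vec) >= threshold]


# Toy corpora, invented for this example.
source = ["the patient was given aspirin", "stocks fell sharply on monday"]
target = ["the patient received aspirin", "the doctor prescribed aspirin"]
selected = select_training_sentences(source, target, threshold=0.5)
```

With these toy corpora only the first source sentence shares vocabulary with the target domain, so it alone is kept for training the second model. The threshold of 0.5 is arbitrary and would need tuning in practice.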
Public/Granted literature
- US12159109B2: Pre-training techniques for entity extraction in low resource domains. Public/Granted day: 2024-12-03