Systems and methods for field extraction from unlabeled data

    Publication number: US12086698B2

    Publication date: 2024-09-10

    Application number: US17484618

    Filing date: 2021-09-24

    Abstract: A field extraction system that does not require field-level annotations for training is provided. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. A transformer-based structure is then used to model interactions between text tokens in the input form and to predict a field tag for each token. The pseudo-labels supervise the transformer training. Because the pseudo-labels are noisy, a refinement module containing a sequence of branches is used to refine them. Each refinement branch conducts field tagging and generates refined labels. At each stage, a branch is optimized with the labels ensembled from all previous branches to reduce label noise.
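The staged refinement described in this abstract can be illustrated with a minimal sketch. It assumes majority voting as the ensembling rule and breaks ties in favor of the earliest label source; the abstract does not specify either choice. The names `ensemble_labels`, `refine`, and `branch_predict_fns` are hypothetical, and each "branch" is a plain callable standing in for training a tagging branch and returning its predictions.

```python
from collections import Counter


def ensemble_labels(label_sets):
    """Per-token majority vote over several label sequences.

    Assumption: ties go to the earliest label set in the list.
    """
    n = len(label_sets[0])
    assert all(len(s) == n for s in label_sets), "sequences must align"
    ensembled = []
    for i in range(n):
        votes = [s[i] for s in label_sets]
        counts = Counter(votes)
        best = max(counts.values())
        # Pick the first vote (in source order) that reaches the top count.
        ensembled.append(next(v for v in votes if counts[v] == best))
    return ensembled


def refine(pseudo_labels, branch_predict_fns):
    """Run branches in sequence; each one is supervised by the ensemble
    of the rule-mined pseudo-labels and all earlier branch outputs."""
    history = [pseudo_labels]
    for predict in branch_predict_fns:
        target = ensemble_labels(history)   # labels this branch trains on
        history.append(predict(target))     # hypothetical train-and-tag step
    return ensemble_labels(history)
```

For example, `ensemble_labels([["A", "B", "O"], ["A", "O", "O"], ["B", "O", "O"]])` keeps the tag that most sources agree on at each token position.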

    Systems and methods for online adaptation for cross-domain streaming data

    Publication number: US12235850B2

    Publication date: 2025-02-25

    Application number: US17588022

    Filing date: 2022-01-28

    Abstract: Embodiments described herein provide an online domain adaptation framework based on cross-domain bootstrapping, in which target domain streaming data is deleted immediately after it has been used for adaptation. At each online query, data diversity is increased across domains by bootstrapping the source domain to form diverse combinations with the current target query. To take full advantage of the discrepancies among these combinations, a set of independent learners is trained to preserve the differences. The knowledge of the learners is then integrated by exchanging their predicted pseudo-labels on the current target query to co-supervise learning on the target domain, without sharing weights so that the learners remain divergent.
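The two data-handling steps in this abstract, bootstrapping the source domain per learner and exchanging pseudo-labels between learners, can be sketched as follows. This is an illustration under stated assumptions: bootstrapping is taken to mean resampling the source set with replacement, and the exchange is a simple rotation so that each learner is supervised by another learner's predictions. The function names are hypothetical, and no model training is shown.

```python
import random


def bootstrap_combinations(source, target_query, n_learners, seed=0):
    """Build one training set per learner: a bootstrap resample of the
    source data combined with the current target query batch."""
    rng = random.Random(seed)
    combos = []
    for _ in range(n_learners):
        # Sampling with replacement gives each learner a different source view.
        resampled = rng.choices(source, k=len(source))
        combos.append(resampled + list(target_query))
    return combos


def exchange_pseudo_labels(predictions):
    """Give learner i the pseudo-labels predicted by learner i+1 (cyclic).

    Only labels are exchanged; weights are never shared between learners.
    """
    n = len(predictions)
    return [predictions[(i + 1) % n] for i in range(n)]
```

With two learners, the exchange reduces to a swap: each learner is trained on the target query using the other's pseudo-labels, after which the query batch can be discarded.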

    Systems and methods for a distributed training framework using uniform class prototypes

    Publication number: US20240054350A1

    Publication date: 2024-02-15

    Application number: US18064122

    Filing date: 2022-12-09

    CPC classification number: G06N3/098

    Abstract: Embodiments described herein provide systems and methods for federated learning. A central system may store a neural network model whose body comprises a number of layers, together with a classification layer comprising class prototypes, which classifies the latent representations output by the body of the model. The central system may initialize the class prototypes so that they are uniformly distributed in the representation space. The model and class prototypes may be broadcast to a number of client systems, which update the body of the model locally while keeping the class prototypes fixed. The clients may return information to the central system, including updated local model parameters and a local representation of the classes based on the latent representations of items in the local training data. Based on the information from the clients, the neural network model may be updated. This process may be repeated iteratively.
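A toy sketch of the main ingredients in this abstract follows. It assumes a 2-D representation space, where prototypes spaced at equal angles on the unit circle are exactly uniform; higher-dimensional spaces would need a different construction, which the abstract does not describe. It also assumes dot-product scoring against the fixed prototypes and plain parameter averaging on the server. All function names are hypothetical.

```python
import math


def uniform_prototypes_2d(num_classes):
    """Place one unit-norm prototype per class at equal angles on a circle."""
    return [
        (math.cos(2 * math.pi * k / num_classes),
         math.sin(2 * math.pi * k / num_classes))
        for k in range(num_classes)
    ]


def classify(latent, prototypes):
    """Return the index of the prototype with the largest dot product."""
    scores = [latent[0] * p[0] + latent[1] * p[1] for p in prototypes]
    return scores.index(max(scores))


def average_parameters(client_params):
    """Server step (assumed): element-wise mean of the clients' body
    parameters. The prototypes are fixed and are not averaged."""
    n = len(client_params)
    return [sum(values) / n for values in zip(*client_params)]
```

In this sketch the clients would only ever change the body parameters passed to `average_parameters`, while every client classifies against the same list returned by `uniform_prototypes_2d`.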
