Techniques for generating a training data set for a machine-learning model

    Publication (Announcement) No.: US12299581B1

    Publication (Announcement) Date: 2025-05-13

    Application No.: US17214448

    Filing Date: 2021-03-26

    Abstract: Systems and methods are provided herein for generating a synthetic training data set that can be used to train a machine-learning model to identify when two addresses match (e.g., when a user-defined address and an authoritative address match). The addresses may each be tokenized. Each candidate address can be scored based on the number of tokens it shares with the user-defined address. The highest-scored candidate address may be selected as the matching address for the user-defined address. In some embodiments, a number of the remaining candidate addresses can be selected as negative examples (e.g., candidate addresses that do not match the user-defined address) based on, for example, historical delivery information associated with the corresponding addresses. In this manner, an expansive training data set may be generated using addresses associated with user profiles of an online service provider and a set of authoritative addresses obtained from an authoritative source.
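
The scoring step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`tokenize`, `score_candidates`, `build_examples`), the regex-based tokenizer, and the `num_negatives` parameter are all hypothetical. The abstract mentions choosing negatives from historical delivery information; no such data is available here, so the sketch simply takes the next-best-scored candidates as negatives, which is an assumption made for illustration only.

```python
import re


def tokenize(address):
    """Lowercase an address and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", address.lower()))


def score_candidates(user_address, candidates):
    """Score each candidate by the number of tokens shared with the user address.

    Returns (score, candidate) pairs sorted from highest to lowest score.
    """
    user_tokens = tokenize(user_address)
    scored = [(len(user_tokens & tokenize(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def build_examples(user_address, candidates, num_negatives=2):
    """Build (user_address, candidate, label) training triples.

    The top-scored candidate is labeled 1 (match). As a stand-in for the
    delivery-history criterion (an assumption of this sketch), the next
    `num_negatives` candidates are labeled 0 (non-match).
    """
    ranked = score_candidates(user_address, candidates)
    if not ranked:
        return []
    positive = ranked[0][1]
    examples = [(user_address, positive, 1)]
    examples += [(user_address, c, 0) for _, c in ranked[1 : 1 + num_negatives]]
    return examples


user = "123 Main St Apt 4, Springfield"
authoritative = [
    "123 Main Street, Springfield",
    "123 Main St Apt 4 Springfield IL",
    "500 Oak Ave, Shelbyville",
]
examples = build_examples(user, authoritative)
```

Here the second authoritative address shares the most tokens with the user-defined address, so it becomes the positive example, and the other two become negatives.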

    Machine learned system for predicting item package quantity relationship between item descriptions

    Publication (Announcement) No.: US11461829B1

    Publication (Announcement) Date: 2022-10-04

    Application No.: US16455601

    Filing Date: 2019-06-27

    Abstract: Systems and methods are disclosed to implement a machine learned system to determine the comparative relationship between item package quantity (IPQ) information indicated in two item descriptions. In embodiments, the system employs a neural network that includes a token encoding layer, an attribute summarizing layer, and a comparison layer. The token encoding layer accepts an item description as a token sequence and encodes the tokens with token attributes that are relevant to IPQ extraction. The attribute summarizing layer uses a convolutional neural network to generate a set of fixed-size feature vectors for each encoded token sequence. All feature vectors for both item descriptions are then provided to the comparison layer to generate the IPQ comparison result. Advantageously, the disclosed neural network model can be trained to make accurate predictions about the IPQ relationship of the two item descriptions using a small set of token-level attributes as input signals.
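
The three-layer structure described in this abstract can be sketched in plain Python to show how data flows through it. This is an untrained toy, not the patented model. The token attributes (`is_number`, `is_unit_word`, `is_mixed`), the `UNIT_WORDS` vocabulary, the filter width, the three output classes, and the random weights are all assumptions. The abstract mentions a set of feature vectors per sequence; this sketch produces a single one for brevity.

```python
import math
import random
import re

UNIT_WORDS = {"pack", "count", "ct", "pcs", "pieces", "set"}  # hypothetical vocabulary
N_ATTRS, WIDTH, N_FILTERS, N_CLASSES = 3, 2, 4, 3  # assumed sizes


def encode_tokens(description):
    """Token encoding layer: map each token to [is_number, is_unit_word, is_mixed]."""
    tokens = re.findall(r"[a-z0-9]+", description.lower())
    encoded = []
    for t in tokens:
        has_digit = any(ch.isdigit() for ch in t)
        has_alpha = any(ch.isalpha() for ch in t)
        encoded.append([float(t.isdigit()), float(t in UNIT_WORDS),
                        float(has_digit and has_alpha)])
    return encoded


def summarize(encoded, filters, width=WIDTH):
    """Attribute summarizing layer: 1-D convolution, ReLU, max-over-time pooling.

    Emits one value per filter, so the output size does not depend on
    the length of the token sequence.
    """
    seq = encoded + [[0.0] * N_ATTRS] * (width - 1)  # zero-pad the tail
    features = []
    for f in filters:
        best = 0.0  # starting at zero acts as the ReLU floor
        for i in range(len(seq) - width + 1):
            window = [x for tok in seq[i:i + width] for x in tok]
            best = max(best, sum(w * x for w, x in zip(f, window)))
        features.append(best)
    return features


def compare(feat_a, feat_b, weights):
    """Comparison layer: linear map over both feature vectors, then softmax."""
    x = feat_a + feat_b
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)  # fixed seed; the weights are random, not trained
filters = [[rng.uniform(-1, 1) for _ in range(N_ATTRS * WIDTH)] for _ in range(N_FILTERS)]
weights = [[rng.uniform(-1, 1) for _ in range(2 * N_FILTERS)] for _ in range(N_CLASSES)]

feat_a = summarize(encode_tokens("Cola 12 pack"), filters)
feat_b = summarize(encode_tokens("Cola cans value bundle 24 count"), filters)
probs = compare(feat_a, feat_b, weights)  # assumed classes: fewer / same / more
```

Because the weights are random, `probs` carries no meaning here. The sketch only shows that descriptions of different lengths reduce to feature vectors of the same size before comparison.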
