-
Publication No.: US11625555B1
Publication Date: 2023-04-11
Application No.: US16817218
Filing Date: 2020-03-12
Applicant: Amazon Technologies, Inc.
Inventor: Dmitry Vladimir Zhiyanov, Lichao Wang, Archiman Dutta
Abstract: Respective labels are generated automatically for a plurality of record pairs, with a label for a given pair indicating a relationship detected between the records of the pair. One or more machine learning models are trained using the labeled record pairs. The trained versions of the models are stored.
-
Publication No.: US12299581B1
Publication Date: 2025-05-13
Application No.: US17214448
Filing Date: 2021-03-26
Applicant: Amazon Technologies, Inc.
Inventor: Patrick Ian Wilson, Dmitry Vladimir Zhiyanov, Lichao Wang, Jong Wan Kim, Srinivas K Yellala
Abstract: Systems and methods are provided herein for generating a synthetic training data set that can be used to train a machine-learning model to identify when two addresses match (e.g., when a user-defined address and an authoritative address match). The addresses may each be tokenized. Each candidate address can be scored based on a number of common tokens it shares with the user-defined address. The highest-scored candidate address may be selected as a matching address for the user-defined address. In some embodiments, a number of the remaining candidate addresses can be selected as negative examples (e.g., candidate addresses that do not match the user-defined address) based on, for example, historical delivery information associated with the corresponding addresses. In this manner, an expansive training data set may be generated using addresses associated with user profiles of an online service provider and a set of authoritative addresses obtained from an authoritative source.
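Illustrative sketch (not part of the patent text): the token-overlap scoring described in the abstract above could look roughly like the following Python. The function names, the regular-expression tokenizer, and the use of distinct shared tokens as the score are all assumptions made for illustration. The abstract selects negative examples using historical delivery information, which this sketch does not model; it simply labels every remaining candidate as a negative.

```python
import re


def tokenize(address):
    """Lowercase an address and split it into a set of alphanumeric tokens.

    Assumption: the patent does not specify a tokenizer; this one is hypothetical.
    """
    return set(re.findall(r"[a-z0-9]+", address.lower()))


def common_token_score(user_address, candidate):
    """Count the distinct tokens shared by the two addresses."""
    return len(tokenize(user_address) & tokenize(candidate))


def build_training_examples(user_address, candidates):
    """Return (user_address, candidate, label) tuples.

    The highest-scored candidate gets label 1 (match). All other candidates
    get label 0 here, which is a simplification: the abstract filters
    negatives using historical delivery information.
    """
    if not candidates:
        return []
    # sorted() is stable, so ties keep their original candidate order.
    ranked = sorted(
        candidates,
        key=lambda c: common_token_score(user_address, c),
        reverse=True,
    )
    examples = [(user_address, ranked[0], 1)]
    examples += [(user_address, c, 0) for c in ranked[1:]]
    return examples


if __name__ == "__main__":
    user = "123 Main St Apt 4, Seattle WA"
    authoritative = [
        "125 Main St, Seattle WA 98101",
        "123 Main St, Seattle WA 98101",
        "400 Pine Ave, Portland OR",
    ]
    for example in build_training_examples(user, authoritative):
        print(example)
```

In this hypothetical usage, the candidate sharing the most tokens with the user-defined address becomes the positive example, and the rest become negatives.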
-
Publication No.: US10565498B1
Publication Date: 2020-02-18
Application No.: US15445698
Filing Date: 2017-02-28
Applicant: Amazon Technologies, Inc.
Inventor: Dmitry Vladimir Zhiyanov
Abstract: A data set whose records include respective pairs of entity descriptors with at least some text and a representation of a relationship such as similarity between the entities of the pair is obtained. Using the data set, a neural network model is trained to generate relationship indicators for pairs of entity descriptors. In an extensible token model of the neural network model, a text token of a first attribute of a particular entity descriptor is represented by a plurality of features including a first feature which was added to the token model as a result of a programmatic request. A particular relationship indicator corresponding to a source entity descriptor and a target entity descriptor is generated using the trained neural network model.
-