-
Publication No.: US12299581B1
Publication Date: 2025-05-13
Application No.: US17214448
Filing Date: 2021-03-26
Applicant: Amazon Technologies, Inc.
Inventor: Patrick Ian Wilson, Dmitry Vladimir Zhiyanov, Lichao Wang, Jong Wan Kim, Srinivas K Yellala
Abstract: Systems and methods are provided herein for generating a synthetic training data set that can be used to train a machine-learning model to identify when two addresses match (e.g., when a user-defined address and an authoritative address match). The addresses may each be tokenized. Each candidate address can be scored based on the number of tokens it shares with the user-defined address. The highest-scored candidate address may be selected as the matching address for the user-defined address. In some embodiments, a number of the remaining candidate addresses can be selected as negative examples (e.g., candidate addresses that do not match the user-defined address) based on, for example, historical delivery information associated with the corresponding addresses. In this manner, an expansive training data set may be generated using addresses associated with user profiles of an online service provider and a set of authoritative addresses obtained from an authoritative source.
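The token-overlap scoring described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function names (`tokenize`, `score_candidates`, `build_examples`), the alphanumeric tokenization rule, and the sample addresses are all assumptions made for illustration. In particular, the abstract selects negative examples using historical delivery information, which this sketch does not model; it simply takes the next-ranked candidates as negatives.

```python
import re


def tokenize(address: str) -> set[str]:
    # Assumption: lowercase alphanumeric runs are the tokens.
    return set(re.findall(r"[a-z0-9]+", address.lower()))


def score_candidates(user_address: str, candidates: list[str]) -> list[tuple[int, str]]:
    # Score each candidate by the number of tokens it shares with the
    # user-defined address, highest score first (stable for ties).
    user_tokens = tokenize(user_address)
    scored = [(len(user_tokens & tokenize(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def build_examples(user_address: str, candidates: list[str], num_negatives: int = 2):
    # Label the top-scored candidate as a match (1) and a few of the
    # remaining candidates as non-matches (0).
    # Assumption: negatives are taken by rank, not by delivery history.
    ranked = score_candidates(user_address, candidates)
    if not ranked:
        return []
    positive = ranked[0][1]
    negatives = [c for _, c in ranked[1:1 + num_negatives]]
    return [(user_address, positive, 1)] + [(user_address, n, 0) for n in negatives]


user = "123 Main St Apt 4, Seattle WA"
authoritative = [
    "123 Main St, Seattle WA",
    "125 Main St, Seattle WA",
    "9 Pine Ave, Portland OR",
]
examples = build_examples(user, authoritative)
```

Each element of `examples` is a `(user_address, candidate_address, label)` triple that could feed a pairwise address-matching model.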
-
Publication No.: US11461829B1
Publication Date: 2022-10-04
Application No.: US16455601
Filing Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Lichao Wang, Kai Liu, Archi Dutta, Dmitry Zhiyanov
IPC: G06Q30/06, G06N3/08, G06F40/30, G06N3/04, G06F40/284
Abstract: Systems and methods are disclosed to implement a machine learned system to determine the comparative relationship between item package quantity (IPQ) information indicated in two item descriptions. In embodiments, the system employs a neural network that includes a token encoding layer, an attribute summarizing layer, and a comparison layer. The token encoding layer accepts an item description as a token sequence and encodes the tokens with token attributes that are relevant to IPQ extraction. The attribute summarizing layer uses a convolutional neural network to generate a set of fixed-size feature vectors for each encoded token sequence. All feature vectors for both item descriptions are then provided to the comparison layer to generate the IPQ comparison result. Advantageously, the disclosed neural network model can be trained to make accurate predictions about the IPQ relationship of the two item descriptions using a small set of token-level attributes as input signals.
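As a rough illustration of the token encoding step only, the Python sketch below attaches a small attribute vector to each token of an item description. The three attributes (pure number, quantity word, mixed alphanumeric), the `QUANTITY_WORDS` set, and the function name `encode_tokens` are hypothetical choices, since the abstract does not list the actual attributes. The convolutional summarizing layer and the comparison layer are not modeled here.

```python
# Hypothetical set of words that often signal a package quantity.
QUANTITY_WORDS = {"pack", "packs", "count", "ct", "pcs", "pieces", "set"}


def encode_tokens(description: str) -> list[tuple[str, list[int]]]:
    # Turn an item description into a token sequence, pairing each token
    # with a small vector of token-level attributes.
    encoded = []
    for raw in description.lower().split():
        token = raw.strip(",.()")
        if not token:
            continue
        is_number = token.isdigit()
        is_quantity_word = token in QUANTITY_WORDS
        is_mixed = any(ch.isdigit() for ch in token) and not is_number
        encoded.append((token, [int(is_number), int(is_quantity_word), int(is_mixed)]))
    return encoded


encoded = encode_tokens("Soap, 12 pack (3oz)")
```

Under these assumptions, a downstream network would consume the attribute vectors rather than the raw token text.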
-
Publication No.: US11423072B1
Publication Date: 2022-08-23
Application No.: US16945572
Filing Date: 2020-07-31
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Lichao Wang, Archiman Dutta
Abstract: Respective text feature sets and non-text feature sets are generated corresponding to individual pairs of a plurality of record pairs. At least one text feature is based on whether a text token exists in both records of a pair. Perceptual hash values are used for non-text feature sets. A machine learning model is trained, using the text and non-text feature sets, to generate relationship scores for record pairs. The model includes a text sub-model and a non-text sub-model.
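The two kinds of features mentioned in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented feature design: the function names, whitespace tokenization, and fixed vocabulary are hypothetical, the perceptual hash values are assumed to be precomputed integers, and comparing them by Hamming distance is a common convention that the abstract does not itself specify.

```python
def shared_token_features(text_a: str, text_b: str, vocabulary: list[str]) -> list[int]:
    # One binary feature per vocabulary token: 1 if the token appears
    # in both records of the pair, otherwise 0.
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    return [1 if (t in tokens_a and t in tokens_b) else 0 for t in vocabulary]


def hash_distance(hash_a: int, hash_b: int) -> int:
    # Hamming distance between two integer hash values
    # (assumption: the hashes were computed elsewhere, e.g. from images).
    return bin(hash_a ^ hash_b).count("1")


text_features = shared_token_features(
    "Red cotton shirt", "red wool shirt", ["red", "cotton", "shirt"]
)
non_text_feature = hash_distance(0b1011, 0b0010)
```

In a two-part model of the kind the abstract describes, `text_features` would go to the text sub-model and `non_text_feature` to the non-text sub-model.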
-
Publication No.: US11625555B1
Publication Date: 2023-04-11
Application No.: US16817218
Filing Date: 2020-03-12
Applicant: Amazon Technologies, Inc.
Inventor: Dmitry Vladimir Zhiyanov, Lichao Wang, Archiman Dutta
Abstract: Respective labels are generated automatically for a plurality of record pairs, with a label for a given pair indicating a relationship detected between the records of the pair. One or more machine learning models are trained using the labeled record pairs. The trained versions of the models are stored.