-
1.
Publication No.: US11423072B1
Publication Date: 2022-08-23
Application No.: US16945572
Application Date: 2020-07-31
Applicant: Amazon Technologies, Inc.
Inventors: Xianshun Chen, Lichao Wang, Archiman Dutta
Abstract: Respective text feature sets and non-text feature sets are generated corresponding to individual pairs of a plurality of record pairs. At least one text feature is based on whether a text token exists in both records of a pair. Perceptual hash values are used for non-text feature sets. A machine learning model is trained, using the text and non-text feature sets, to generate relationship scores for record pairs. The model includes a text sub-model and a non-text sub-model.
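The two feature types named in this abstract can be illustrated with a minimal Python sketch. It assumes lowercase whitespace tokenization, a fixed token vocabulary, and an average hash (aHash) as the perceptual hash over an already-downscaled grayscale grid; the function names are hypothetical, and the trained model with its text and non-text sub-models is not shown. This is not the patented implementation.

```python
from typing import List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    # Lowercased whitespace tokenization (a simplifying assumption).
    return set(text.lower().split())


def token_overlap_features(a: str, b: str, vocab: Sequence[str]) -> List[int]:
    # One binary feature per vocabulary token: 1 if the token occurs in BOTH records.
    ta, tb = tokenize(a), tokenize(b)
    return [1 if tok in ta and tok in tb else 0 for tok in vocab]


def average_hash(pixels: List[List[float]]) -> int:
    # Average hash (aHash): one bit per pixel, set when the pixel exceeds the mean.
    # Assumes the image was already downscaled to a small grayscale grid (e.g. 8x8).
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hash_similarity(h1: int, h2: int, n_bits: int) -> float:
    # 1.0 for identical hashes, 0.0 when every bit differs (normalized Hamming distance).
    return 1.0 - bin(h1 ^ h2).count("1") / n_bits
```

For example, `token_overlap_features("Acme stainless steel water bottle", "ACME steel bottle 1L", ["acme", "steel", "bottle", "water"])` yields `[1, 1, 1, 0]`, since "water" appears in only one record. Perceptual hashes are used instead of cryptographic ones because visually similar images produce hashes that differ in only a few bits.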
-
2.
Publication No.: US12112252B1
Publication Date: 2024-10-08
Application No.: US17327422
Application Date: 2021-05-21
Applicant: Amazon Technologies, Inc.
Inventors: Xianshun Chen, Archiman Dutta
CPC classification number: G06N3/045, G06F18/22, G06V30/19013
Abstract: Devices, systems, and methods are provided for brand matching using multi-layer machine learning. A method may include generating, based on a first embedding vector and a second embedding vector as inputs to a twin neural network, a third embedding vector and a fourth embedding vector; generating, based on the first embedding vector and the second embedding vector as inputs to a difference neural network, a difference vector indicative of a difference between the first embedding vector and the second embedding vector; generating a concatenated vector by concatenating the third embedding vector with the fourth embedding vector and the difference vector; generating, based on the concatenated vector as an input to a feedforward neural network (FFN), a score between zero and one, the score indicative of a relationship between a first entity and a second entity.
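The data flow in this abstract (twin network, difference network, concatenation, FFN producing a score between zero and one) can be traced with a dependency-free Python sketch. It assumes a single dense layer per sub-network, ReLU activations, `|e1 - e2|` as the difference network's input, and a sigmoid output; `BrandMatcher` and its layer sizes are hypothetical and the weights are random rather than trained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector, activation: str = "relu") -> Vector:
    # y = act(W x + b); W has one row per output unit.
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    if activation == "relu":
        return [max(0.0, v) for v in out]
    return out  # linear activation


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class BrandMatcher:
    def __init__(self, dim: int, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.twin = random_layer(dim, hidden, rng)  # shared weights for both inputs
        self.diff = random_layer(dim, hidden, rng)  # applied to |e1 - e2| (assumption)
        self.ffn1 = random_layer(3 * hidden, hidden, rng)
        self.ffn2 = random_layer(hidden, 1, rng)

    def score(self, e1: Vector, e2: Vector) -> float:
        t3 = dense(e1, *self.twin)  # third embedding vector
        t4 = dense(e2, *self.twin)  # fourth embedding vector (same weights: "twin")
        d = dense([abs(a - b) for a, b in zip(e1, e2)], *self.diff)  # difference vector
        concat = t3 + t4 + d  # list concatenation, length 3 * hidden
        h = dense(concat, *self.ffn1)
        logit = dense(h, *self.ffn2, activation="linear")[0]
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid keeps the score in (0, 1)
```

Sharing the twin weights means both entities are embedded into the same space, while the separate difference branch gives the FFN an explicit signal about where the two inputs disagree.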
-
3.
Publication No.: US11675766B1
Publication Date: 2023-06-13
Application No.: US16808162
Application Date: 2020-03-03
Applicant: Amazon Technologies, Inc.
Inventors: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
IPC: G06F16/22, G06F16/28, G06F16/901
CPC classification number: G06F16/2246, G06F16/285, G06F16/9024
Abstract: A hierarchical representation of an input data set comprising similarity scores for respective entity pairs is generated iteratively. In a particular iteration, clusters are obtained from a subset of the iteration's input entity pairs which satisfy a similarity criterion, and then spanning trees are generated for at least some of the clusters. An indication of at least a representative pair of one or more of the clusters is added to the hierarchical representation in the iteration. The hierarchical representation is used to respond to clustering requests.
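A minimal Python sketch of the iteration described in this abstract follows. It assumes the similarity criterion is a score threshold (one iteration per threshold, strictest first), clusters are connected components found with union-find, the spanning tree is a maximum spanning forest built with Kruskal's algorithm, and the representative pair is the cluster's highest-scoring tree edge. `build_hierarchy` and `clusters_at` are hypothetical names; this is not the patented implementation.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str, float]  # (entity_a, entity_b, similarity score)


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # edge would close a cycle
        self.parent[rb] = ra
        return True


def build_hierarchy(pairs: List[Pair], thresholds: List[float]):
    """One level per threshold (strictest first). Each level maps a cluster
    (frozenset of entities) to its representative pair. Entities with no
    qualifying pair at a level are omitted rather than kept as singletons."""
    hierarchy = []
    for t in sorted(thresholds, reverse=True):
        kept = sorted((p for p in pairs if p[2] >= t), key=lambda p: -p[2])
        uf = UnionFind()
        tree_edges = []
        for a, b, s in kept:  # Kruskal on descending scores: maximum spanning forest
            if uf.union(a, b):
                tree_edges.append((a, b, s))
        members: Dict[str, set] = defaultdict(set)
        best: Dict[str, Pair] = {}
        for a, b, s in tree_edges:
            root = uf.find(a)
            members[root].update((a, b))
            if root not in best or s > best[root][2]:
                best[root] = (a, b, s)
        hierarchy.append((t, {frozenset(m): best[r] for r, m in members.items()}))
    return hierarchy


def clusters_at(hierarchy, threshold: float) -> List[FrozenSet[str]]:
    # Serve a clustering request from the least strict precomputed level
    # whose threshold is still >= the requested one.
    candidates = [(t, level) for t, level in hierarchy if t >= threshold]
    if not candidates:
        return []
    _, level = min(candidates, key=lambda c: c[0])
    return list(level.keys())
```

Precomputing the levels once means later clustering requests at different strictness levels become lookups rather than fresh clustering runs.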
-
4.
Publication No.: US11514321B1
Publication Date: 2022-11-29
Application No.: US16900620
Application Date: 2020-06-12
Applicant: Amazon Technologies, Inc.
Inventors: Xianshun Chen, Zahed Patel, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Entity record pairs are extracted from a selected cluster of entity records. Attribute value pairs are obtained from the entity record pairs. Labels are assigned to the attribute value pairs based at least in part on entity-level similarity scores of the entity records from which the attribute value pairs were obtained. A machine learning model is trained, using a data set which includes at least some attribute value pairs to which the labels are assigned, to generate attribute similarity scores for pairs of attribute values.
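The labeling step in this abstract can be sketched in Python as follows, under loudly stated assumptions: records are attribute-to-value dicts, an attribute value pair inherits a positive label when its records' entity-level score is at or above a hypothetical `pos_threshold` and a negative label at or below a hypothetical `neg_threshold`, and pairs in between are left unlabeled. The resulting rows would then feed an attribute similarity model (training not shown).

```python
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]  # attribute name -> attribute value


def label_attribute_pairs(
    cluster: Dict[str, Record],
    entity_scores: Dict[Tuple[str, str], float],
    pos_threshold: float = 0.9,  # illustrative assumption
    neg_threshold: float = 0.3,  # illustrative assumption
) -> List[Tuple[str, str, str, int]]:
    """Return (attribute, value_a, value_b, label) training rows derived
    from every record pair in one cluster."""
    rows = []
    for id_a, id_b in combinations(sorted(cluster), 2):
        score = entity_scores.get((id_a, id_b), entity_scores.get((id_b, id_a)))
        if score is None:
            continue  # no entity-level score for this record pair
        if score >= pos_threshold:
            label = 1
        elif score <= neg_threshold:
            label = 0
        else:
            continue  # ambiguous entity score: leave these attribute pairs unlabeled
        rec_a, rec_b = cluster[id_a], cluster[id_b]
        for attr in sorted(rec_a.keys() & rec_b.keys()):  # attributes both records share
            if rec_a[attr] != rec_b[attr]:  # identical values teach the model little
                rows.append((attr, rec_a[attr], rec_b[attr], label))
    return rows
```

With records `r1 = {"brand": "Acme"}` and `r2 = {"brand": "ACME Inc."}` scored 0.95 at the entity level, the sketch emits `("brand", "Acme", "ACME Inc.", 1)`: a labeled example of two differently written brand values that refer to the same thing, obtained without labeling attributes by hand.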
-
5.
Publication No.: US12242928B1
Publication Date: 2025-03-04
Application No.: US16824480
Application Date: 2020-03-19
Applicant: Amazon Technologies, Inc.
Inventors: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Multiple distinct control descriptors, each specifying an algorithm and values of one or more parameters of the algorithm, are created. A plurality of tuples, each indicating a respective record of a data set and a respective descriptor, are generated. The tuples are distributed among a plurality of compute resources such that the number of distinct descriptors indicated in the tuples received at a given resource is below a threshold. The algorithm is executed in accordance with the descriptors' parameters at individual compute resources.
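The distribution constraint in this abstract can be illustrated with a short Python sketch. It assumes one tuple per (record, descriptor) combination, models the abstract's threshold as a hypothetical cap `max_distinct` (at most that many distinct descriptors per resource), and places each descriptor's whole group of tuples on the least-loaded resource that still has room; `ControlDescriptor`, `distribute`, and `run_resource` are hypothetical names, and greedy placement is an assumed strategy, not the patented one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ControlDescriptor:
    # An algorithm name plus parameter values, stored as (name, value) pairs
    # so the frozen dataclass is hashable.
    algorithm: str
    params: Tuple[Tuple[str, float], ...]


Task = Tuple[str, ControlDescriptor]  # (record id, descriptor)


def distribute(records: List[str], descriptors: List[ControlDescriptor],
               n_resources: int, max_distinct: int) -> Dict[int, List[Task]]:
    unique = list(dict.fromkeys(descriptors))  # drop duplicate descriptors, keep order
    if len(unique) > n_resources * max_distinct:
        raise ValueError("too few compute resources for the per-resource descriptor cap")
    assignment: Dict[int, List[Task]] = {i: [] for i in range(n_resources)}
    distinct: Dict[int, set] = {i: set() for i in range(n_resources)}
    for d in unique:
        # Resources that can still accept another distinct descriptor; the
        # capacity check above guarantees this list is never empty.
        open_resources = [i for i in range(n_resources) if len(distinct[i]) < max_distinct]
        target = min(open_resources, key=lambda i: len(assignment[i]))
        assignment[target].extend((r, d) for r in records)
        distinct[target].add(d)
    return assignment


def run_resource(tasks: List[Task], algorithms: Dict[str, Callable]) -> list:
    # Execute each tuple's algorithm with its descriptor's parameter values.
    return [algorithms[d.algorithm](rec, **dict(d.params)) for rec, d in tasks]
```

Keeping the number of distinct descriptors per resource small lets each resource reuse per-configuration setup (loaded models, compiled parameters) across many records instead of switching configurations constantly.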