-
1.
Publication No.: US12242928B1
Publication Date: 2025-03-04
Application No.: US16824480
Filing Date: 2020-03-19
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Multiple distinct control descriptors, each specifying an algorithm and values of one or more parameters of the algorithm, are created. A plurality of tuples, each indicating a respective record of a data set and a respective descriptor, are generated. The tuples are distributed among a plurality of compute resources such that the number of distinct descriptors indicated in the tuples received at a given resource is below a threshold. The algorithm is executed in accordance with the descriptors' parameters at individual compute resources.
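The distribution step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function `distribute`, the descriptor contents, and the inclusive limit `max_distinct` (standing in for the abstract's "below a threshold") are all assumptions made for illustration. Descriptors are grouped into batches, each batch is mapped to one worker, and every tuple follows its descriptor.

```python
from collections import defaultdict
from itertools import product


def distribute(tuples, max_distinct):
    """Route (record_id, descriptor_id) tuples to workers so that each
    worker receives at most `max_distinct` distinct descriptors."""
    if max_distinct < 1:
        raise ValueError("max_distinct must be at least 1")
    descriptor_ids = sorted({d for _, d in tuples})
    # Consecutive descriptors share a worker; a new worker starts every
    # `max_distinct` descriptors.
    worker_of = {d: i // max_distinct for i, d in enumerate(descriptor_ids)}
    plan = defaultdict(list)
    for record_id, d in tuples:
        plan[worker_of[d]].append((record_id, d))
    return dict(plan)


# Hypothetical control descriptors: one algorithm name plus parameter values.
descriptors = {
    "d0": {"algorithm": "kmeans", "k": 2},
    "d1": {"algorithm": "kmeans", "k": 3},
    "d2": {"algorithm": "kmeans", "k": 4},
    "d3": {"algorithm": "kmeans", "k": 5},
    "d4": {"algorithm": "kmeans", "k": 6},
}
records = ["r0", "r1", "r2"]

# One tuple per (record, descriptor) combination.
tuples = list(product(records, descriptors))
plan = distribute(tuples, max_distinct=2)
```

Each worker would then run the algorithm once per descriptor it holds, using that descriptor's parameter values on the records routed to it.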
-
2.
Publication No.: US11675766B1
Publication Date: 2023-06-13
Application No.: US16808162
Filing Date: 2020-03-03
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
IPC: G06F16/22, G06F16/28, G06F16/901
CPC classification number: G06F16/2246, G06F16/285, G06F16/9024
Abstract: A hierarchical representation of an input data set comprising similarity scores for respective entity pairs is generated iteratively. In a particular iteration, clusters are obtained from a subset of the iteration's input entity pairs which satisfy a similarity criterion, and then spanning trees are generated for at least some of the clusters. An indication of at least a representative pair of one or more of the clusters is added to the hierarchical representation in the iteration. The hierarchical representation is used to respond to clustering requests.
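One way to picture the iterative construction is the sketch below. It is a simplified illustration under stated assumptions, not the claimed procedure: the function `build_hierarchy`, the descending list of similarity cutoffs, the use of a union-find structure with Kruskal-style edge selection for the spanning trees, and the choice of the highest-scoring tree edge as each cluster's representative pair are all hypothetical.

```python
def build_hierarchy(pairs, thresholds):
    """pairs: iterable of (entity_a, entity_b, similarity_score).
    thresholds: similarity cutoffs, processed from highest to lowest.
    Returns a list of (threshold, representative_pairs) levels."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = sorted(pairs, key=lambda p: -p[2])
    hierarchy = []
    for cutoff in sorted(thresholds, reverse=True):
        # This iteration's input pairs that satisfy the similarity criterion.
        selected = [p for p in remaining if p[2] >= cutoff]
        remaining = [p for p in remaining if p[2] < cutoff]

        # An edge joining two different components is a spanning-tree edge.
        tree_edges = []
        for a, b, score in selected:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b
                tree_edges.append((a, b, score))

        # Group the new tree edges by the cluster they ended up in.
        by_cluster = {}
        for edge in tree_edges:
            by_cluster.setdefault(find(edge[0]), []).append(edge)

        # Assumed rule: the best-scoring tree edge represents the cluster.
        representatives = sorted(
            max(edges, key=lambda e: e[2]) for edges in by_cluster.values()
        )
        hierarchy.append((cutoff, representatives))
    return hierarchy


pairs = [
    ("a", "b", 0.9), ("b", "c", 0.85), ("a", "c", 0.8),
    ("c", "d", 0.6), ("e", "f", 0.95),
]
hierarchy = build_hierarchy(pairs, thresholds=[0.8, 0.5])
```

In this toy input, the first level groups `a`, `b`, `c` and separately `e`, `f`; the second level attaches `d` to the first group. A clustering request for a given similarity level could then be answered from the stored levels instead of reprocessing every pair.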
-
3.
Publication No.: US11514321B1
Publication Date: 2022-11-29
Application No.: US16900620
Filing Date: 2020-06-12
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Zahed Patel, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Entity record pairs are extracted from a selected cluster of entity records. Attribute value pairs are obtained from the entity record pairs. Labels are assigned to the attribute value pairs based at least in part on entity-level similarity scores of the entity records from which the attribute value pairs were obtained. A machine learning model is trained, using a data set which includes at least some attribute value pairs to which the labels are assigned, to generate attribute similarity scores for pairs of attribute values.
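The labeling step described in this abstract might look roughly like the following sketch. Everything named here is an assumption for illustration: the function `label_attribute_pairs`, the toy token-overlap `toy_entity_score`, the `positive_cutoff` value, and the sample records. The model training step is omitted; the labeled examples would be passed to whatever binary classifier is chosen.

```python
from itertools import combinations


def toy_entity_score(left, right):
    """Hypothetical entity-level similarity: Jaccard overlap of the
    lowercase tokens found in all attribute values of each record."""
    def tokens(record):
        return {t for value in record.values() for t in value.lower().split()}

    a, b = tokens(left), tokens(right)
    return len(a & b) / len(a | b) if a | b else 0.0


def label_attribute_pairs(cluster, entity_score, positive_cutoff=0.5):
    """Extract attribute value pairs from every record pair in a cluster
    and label them using the entity-level score of the record pair."""
    examples = []
    for left, right in combinations(cluster, 2):
        # The label comes from the entity pair, not from the attribute values.
        label = 1 if entity_score(left, right) >= positive_cutoff else 0
        for attr in sorted(left.keys() & right.keys()):
            examples.append((attr, left[attr], right[attr], label))
    return examples


cluster = [
    {"name": "Acme Corp", "city": "Seattle"},
    {"name": "ACME Corporation", "city": "Seattle"},
    {"name": "Zenith Ltd", "city": "Austin"},
]
examples = label_attribute_pairs(cluster, toy_entity_score)
```

Note that the pair ("Acme Corp", "ACME Corporation") receives a positive label here because its parent records are similar overall, which is the kind of training signal an attribute-level similarity model can learn from.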
-