-
Publication No.: US11675766B1
Publication date: 2023-06-13
Application No.: US16808162
Filing date: 2020-03-03
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
IPC: G06F16/22, G06F16/28, G06F16/901
CPC classification number: G06F16/2246, G06F16/285, G06F16/9024
Abstract: A hierarchical representation of an input data set comprising similarity scores for respective entity pairs is generated iteratively. In a particular iteration, clusters are obtained from a subset of the iteration's input entity pairs which satisfy a similarity criterion, and then spanning trees are generated for at least some of the clusters. An indication of at least a representative pair of one or more of the clusters is added to the hierarchical representation in the iteration. The hierarchical representation is used to respond to clustering requests.
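The iteration described in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes each iteration reuses the full pair list with a looser threshold, uses union-find to form clusters, keeps the accepted edges as each cluster's spanning tree, and takes the highest-scoring edge as the representative pair. The names `build_hierarchy` and `thresholds` are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the root of x in a union-find forest, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def build_hierarchy(pairs, thresholds):
    """pairs: (entity_a, entity_b, score) tuples; thresholds: one cutoff per iteration.

    Assumption: every iteration filters the same input pairs by its own cutoff.
    """
    hierarchy = []
    for level, threshold in enumerate(thresholds):
        # Subset of pairs that satisfy this iteration's similarity criterion,
        # visited from most to least similar.
        selected = sorted((p for p in pairs if p[2] >= threshold),
                          key=lambda p: -p[2])
        parent = {}
        tree_edges = []
        for a, b, score in selected:
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                # The edge joins two components, so it belongs to a spanning tree.
                parent[root_a] = root_b
                tree_edges.append((a, b, score))
        # Group the spanning-tree edges by the final cluster they ended up in.
        clusters = defaultdict(list)
        for edge in tree_edges:
            clusters[find(parent, edge[0])].append(edge)
        for edges in clusters.values():
            members = sorted({e for a, b, _ in edges for e in (a, b)})
            hierarchy.append({
                "level": level,
                "threshold": threshold,
                "members": members,
                # Edges are in descending score order, so the first is the strongest.
                "representative_pair": edges[0][:2],
                "spanning_tree": edges,
            })
    return hierarchy
```

A clustering request at a given similarity level could then be answered by reading the hierarchy entries recorded for that level rather than re-clustering the raw pairs.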
-
2.
Publication No.: US11514321B1
Publication date: 2022-11-29
Application No.: US16900620
Filing date: 2020-06-12
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Zahed Patel, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Entity record pairs are extracted from a selected cluster of entity records. Attribute value pairs are obtained from the entity record pairs. Labels are assigned to the attribute value pairs based at least in part on entity-level similarity scores of the entity records from which the attribute value pairs were obtained. A machine learning model is trained, using a data set which includes at least some attribute value pairs to which the labels are assigned, to generate attribute similarity scores for pairs of attribute values.
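The labeling step in this abstract might look roughly like the sketch below. It assumes records are dictionaries with an `id` field, that entity-level scores are available in a lookup keyed by id pair, and that two hypothetical cutoffs (`positive_cutoff`, `negative_cutoff`) turn a score into a label. None of these names or values come from the patent.

```python
from itertools import combinations


def label_attribute_pairs(cluster, entity_scores,
                          positive_cutoff=0.9, negative_cutoff=0.3):
    """Derive labeled attribute value pairs from entity record pairs.

    cluster: list of record dicts, each with an "id" key (assumed layout).
    entity_scores: dict mapping (id_a, id_b) to an entity-level similarity score.
    Returns (attribute, value_a, value_b, label) tuples.
    """
    examples = []
    for rec_a, rec_b in combinations(cluster, 2):
        score = entity_scores[(rec_a["id"], rec_b["id"])]
        if score >= positive_cutoff:
            label = 1  # records very likely describe the same entity
        elif score <= negative_cutoff:
            label = 0  # records very likely describe different entities
        else:
            continue   # ambiguous entity score: skip rather than guess a label
        # Every shared attribute inherits the label of its record pair.
        for attr in sorted(rec_a.keys() & rec_b.keys()):
            if attr == "id":
                continue
            examples.append((attr, rec_a[attr], rec_b[attr], label))
    return examples
```

The resulting tuples would serve as the training set for whatever attribute-similarity model is chosen; the model itself is left out of this sketch.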
-
Publication No.: US11929070B1
Publication date: 2024-03-12
Application No.: US17461124
Filing date: 2021-08-30
Applicant: Amazon Technologies, Inc.
Inventor: Ruhi Sarikaya, Zheng Du, Xiaohu Liu, Kai Liu, Sriharsha Venkata Chintalapati, Chenlei Guo, Hung Tuan Pham, Joe Pemberton, Zhenyu Yao, Bigyan Rajbhandari
CPC classification number: G10L15/22, G06N20/20, G10L15/02, G10L15/063, G10L2015/225
Abstract: Techniques for performing centralized unsupervised learning in a multi-domain system are described. A user may request labeled data for an ML task, where the request includes a prompt for obtaining relevant explicit user feedback. The system may use the prompt to collect explicit user feedback for relevant runtime user inputs. After a duration of time (in the user's request for labeled data) has elapsed, the system determines whether collected user feedback indicates processing of the user input was defective and, if so, determines a cause of the defective processing. The system then uses one or more label generators to generate labeled data using the collected user feedback, whether the processing was defective, and the determined defect cause.
-
4.
Publication No.: US11461829B1
Publication date: 2022-10-04
Application No.: US16455601
Filing date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Lichao Wang, Kai Liu, Archi Dutta, Dmitry Zhiyanov
IPC: G06Q30/06, G06N3/08, G06F40/30, G06N3/04, G06F40/284
Abstract: Systems and methods are disclosed to implement a machine learned system to determine the comparative relationship between item package quantity (IPQ) information indicated in two item descriptions. In embodiments, the system employs a neural network that includes a token encoding layer, an attribute summarizing layer, and a comparison layer. The token encoding layer accepts an item description as a token sequence and encodes the tokens with token attributes that are relevant to IPQ extraction. The attribute summarizing layer uses a convolutional neural network to generate a set of fixed-size feature vectors for each encoded token sequence. All feature vectors for both item descriptions are then provided to the comparison layer to generate the IPQ comparison result. Advantageously, the disclosed neural network model can be trained to make accurate predictions about the IPQ relationship of the two item descriptions using a small set of token-level attributes as input signals.
-
5.
Publication No.: US12242928B1
Publication date: 2025-03-04
Application No.: US16824480
Filing date: 2020-03-19
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Multiple distinct control descriptors, each specifying an algorithm and values of one or more parameters of the algorithm, are created. A plurality of tuples, each indicating a respective record of a data set and a respective descriptor, are generated. The tuples are distributed among a plurality of compute resources such that the number of distinct descriptors indicated in the tuples received at a given resource is below a threshold. The algorithm is executed in accordance with the descriptors' parameters at individual compute resources.
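One simple way to satisfy the distribution constraint in this abstract is sketched below. It assumes descriptors are identified by name, that each descriptor is pinned to a single compute resource in round-robin order, and that "below a threshold" means strictly fewer distinct descriptors than `threshold`. The functions `make_tuples` and `distribute` are hypothetical, and the patent may use a different placement strategy.

```python
def make_tuples(records, descriptors):
    """Pair every record with every control descriptor name."""
    return [(record, name) for name in descriptors for record in records]


def distribute(tuples, num_resources, threshold):
    """Split tuples into one bucket per compute resource.

    Each descriptor is assigned to exactly one resource, so a resource only
    sees the descriptors that were pinned to it.
    """
    names = sorted({name for _, name in tuples})
    # Round-robin gives each resource at most ceil(len(names) / num_resources).
    per_resource = -(-len(names) // num_resources)
    if per_resource >= threshold:
        raise ValueError("not enough resources to stay below the threshold")
    home = {name: i % num_resources for i, name in enumerate(names)}
    buckets = [[] for _ in range(num_resources)]
    for record, name in tuples:
        buckets[home[name]].append((record, name))
    return buckets
```

Each resource would then look up the algorithm and parameter values for the descriptors in its bucket and run them over the accompanying records. Keeping the distinct descriptor count per resource small limits how many algorithm configurations a single resource has to set up.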