-
Publication No.: US12112252B1
Publication Date: 2024-10-08
Application No.: US17327422
Filing Date: 2021-05-21
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Archiman Dutta
CPC classification number: G06N3/045, G06F18/22, G06V30/19013
Abstract: Devices, systems, and methods are provided for brand matching using multi-layer machine learning. A method may include generating, based on a first embedding vector and a second embedding vector as inputs to a twin neural network, a third embedding vector and a fourth embedding vector; generating, based on the first embedding vector and the second embedding vector as inputs to a difference neural network, a difference vector indicative of a difference between the first embedding vector and the second embedding vector; generating a concatenated vector by concatenating the third embedding vector with the fourth embedding vector and the difference vector; generating, based on the concatenated vector as an input to a feedforward neural network (FFN), a score between zero and one, the score indicative of a relationship between a first entity and a second entity.
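The scoring pipeline in this abstract can be illustrated with a minimal, untrained forward pass in plain Python. This is a sketch under stated assumptions, not the patented implementation. The following are illustrative choices that the abstract does not specify:

- the hypothetical `PairScorer` class;
- the single dense layer per sub-network;
- the layer sizes and the random initialization;
- feeding the difference network the element-wise absolute difference of the two input embeddings.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Random dense layer: n_out weight rows (each of length n_in) plus zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Dense layer forward pass: one dot product per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


def sigmoid(z):
    """Numerically stable logistic function, mapping a real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class PairScorer:
    """Hypothetical, untrained sketch: twin encoder + difference network + FFN head."""

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.twin = init_layer(rng, dim, hidden)       # one weight set shared by both inputs
        self.diff = init_layer(rng, dim, hidden)       # difference network
        self.ffn_hidden = init_layer(rng, 3 * hidden, hidden)
        self.ffn_out = init_layer(rng, hidden, 1)

    def score(self, e1, e2):
        e3 = relu(linear(*self.twin, e1))              # third embedding vector
        e4 = relu(linear(*self.twin, e2))              # fourth embedding vector
        # Assumption: the difference network receives the element-wise |e1 - e2|.
        d = relu(linear(*self.diff, [abs(a - b) for a, b in zip(e1, e2)]))
        concatenated = e3 + e4 + d                     # list concatenation
        h = relu(linear(*self.ffn_hidden, concatenated))
        return sigmoid(linear(*self.ffn_out, h)[0])    # score between zero and one
```

For example, `PairScorer(dim=4, hidden=3, seed=42).score(e1, e2)` returns a float between 0 and 1 for two 4-dimensional embeddings. The encoder is a twin (Siamese) network because both inputs reuse one weight set, `self.twin`. In practice the weights would have to be trained on labeled entity pairs; training is not shown.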
-
Publication No.: US11423072B1
Publication Date: 2022-08-23
Application No.: US16945572
Filing Date: 2020-07-31
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Lichao Wang, Archiman Dutta
Abstract: Respective text feature sets and non-text feature sets are generated corresponding to individual pairs of a plurality of record pairs. At least one text feature is based on whether a text token exists in both records of a pair. Perceptual hash values are used for non-text feature sets. A machine learning model is trained, using the text and non-text feature sets, to generate relationship scores for record pairs. The model includes a text sub-model and a non-text sub-model.
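A minimal feature-extraction sketch for one record pair follows. It assumes each record has a text field and a small grayscale image. The abstract names neither the exact text features nor the hash, so two choices here are assumptions:

- the token-overlap features;
- the average hash (aHash), which is one common perceptual hash.

The helper names are hypothetical. The text and non-text sub-models and their training are omitted.

```python
def tokens(text):
    """Lower-cased whitespace tokens (a deliberately simple tokenizer)."""
    return set(text.lower().split())


def text_features(text_a, text_b):
    """Features based on whether tokens occur in both records of the pair."""
    ta, tb = tokens(text_a), tokens(text_b)
    shared, union = ta & tb, ta | tb
    return {
        "shared_token_count": len(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


def average_hash(pixels):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean.

    `pixels` is assumed to be an already downscaled grayscale grid (list of rows).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def non_text_features(img_a, img_b):
    """Image similarity as 1 - (Hamming distance / number of hash bits)."""
    n_bits = sum(len(row) for row in img_a)
    distance = bin(average_hash(img_a) ^ average_hash(img_b)).count("1")
    return {"phash_similarity": 1.0 - distance / n_bits}
```

`text_features("red cotton shirt", "Red Shirt")` reports two shared tokens and a Jaccard overlap of 2/3. Real aHash implementations first downscale the image, commonly to 8x8. Here the grid is assumed to be downscaled already.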
-
Publication No.: US10565385B1
Publication Date: 2020-02-18
Application No.: US15688772
Filing Date: 2017-08-28
Applicant: Amazon Technologies, Inc.
Inventor: Lohith Ravi, Archiman Dutta
Abstract: Online service providers may operate a rendering service for generating and providing substitute web content information for rendering substitute web content instead of authentic web content. The rendering service may obtain web content information for the authentic web content in response to receiving a request for web content. The rendering service may use the web content information to generate the substitute web content information. The substitute web content information is useable by the computing device to generate substitute web content that includes one or more visual elements resembling resource objects of the authentic web content. The visual elements are rendered, as a result of processing by the computing device, as image content instead of interactive objects.
-
Publication No.: US08977622B1
Publication Date: 2015-03-10
Application No.: US13621550
Filing Date: 2012-09-17
Applicant: Amazon Technologies, Inc.
Inventor: Archiman Dutta
CPC classification number: G06F17/30327, G06F17/30598
Abstract: Disclosed are various embodiments for assessing the quality of a node that comprises a collection of items containing textual data. The homogeneity of the node can be related to its quality. Highly ranked descriptive terms used in the node are identified, and a quality score that measures the quality of the node is calculated. Additionally, a node can be examined for outliers to improve node quality.
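One way to turn the homogeneity idea into a number is sketched below. The abstract does not specify any of these formulas; the following are assumptions:

- terms are ranked by how many items in the node contain them;
- a node's quality is the average fraction of its top-k terms that each item contains;
- an item is flagged as an outlier when its coverage falls below a hypothetical `min_coverage` cut-off.

```python
from collections import Counter


def top_terms(items, k):
    """Rank terms by how many items in the node contain them (ties broken alphabetically)."""
    doc_freq = Counter()
    for text in items:
        doc_freq.update(set(text.lower().split()))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def coverage(text, terms):
    """Fraction of the top terms that appear in one item's text."""
    words = set(text.lower().split())
    return sum(1 for t in terms if t in words) / len(terms)


def node_quality(items, k=3):
    """Mean coverage of the node's top-k terms across its items (1.0 = fully homogeneous)."""
    terms = top_terms(items, k)
    return sum(coverage(text, terms) for text in items) / len(items)


def outliers(items, k=3, min_coverage=0.5):
    """Items that share too few of the node's top terms (hypothetical cut-off)."""
    terms = top_terms(items, k)
    return [text for text in items if coverage(text, terms) < min_coverage]
```

Consider the items `"usb c cable"`, `"usb c charger cable"`, `"usb cable braided"`, and `"garden hose"` with `k=2`. The top terms are `cable` and `usb`, the quality score is 0.75, and `"garden hose"` is the only outlier.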
-
Publication No.: US12242928B1
Publication Date: 2025-03-04
Application No.: US16824480
Filing Date: 2020-03-19
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Multiple distinct control descriptors, each specifying an algorithm and values of one or more parameters of the algorithm, are created. A plurality of tuples, each indicating a respective record of a data set and a respective descriptor, are generated. The tuples are distributed among a plurality of compute resources such that the number of distinct descriptors indicated in the tuples received at a given resource is below a threshold. The algorithm is executed in accordance with the descriptors' parameters at individual compute resources.
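The descriptor-aware distribution step can be sketched as a greedy assignment. The abstract only requires that each resource receive fewer than a threshold number of distinct descriptors. The following are illustrative and hypothetical:

- the `Descriptor` dataclass;
- the toy `ALGORITHMS` registry;
- the policy of sending every tuple for a given descriptor to a single resource, taking the largest groups first and giving each to the least-loaded eligible resource.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """A control descriptor: an algorithm name plus its parameter values."""
    algorithm: str
    params: tuple  # (name, value) pairs, kept as a tuple so the descriptor is hashable


# Hypothetical algorithm registry used only for this illustration.
ALGORITHMS = {"scale": lambda record, factor: record * factor}


def make_tuples(records, descriptors):
    """One (record, descriptor) tuple per combination."""
    return [(rec, desc) for desc in descriptors for rec in records]


def distribute(tuples, n_resources, threshold):
    """Assign tuples so each resource sees fewer than `threshold` distinct descriptors."""
    by_desc = defaultdict(list)
    for rec, desc in tuples:
        by_desc[desc].append((rec, desc))
    resources = [[] for _ in range(n_resources)]
    distinct = [set() for _ in range(n_resources)]
    # Largest descriptor groups first; each goes to the least-loaded eligible resource.
    for desc, group in sorted(by_desc.items(), key=lambda kv: -len(kv[1])):
        eligible = [i for i in range(n_resources) if len(distinct[i]) + 1 < threshold]
        if not eligible:
            raise ValueError("too few resources for this descriptor threshold")
        target = min(eligible, key=lambda i: len(resources[i]))
        resources[target].extend(group)
        distinct[target].add(desc)
    return resources


def run(resource_tuples):
    """Execute each tuple's algorithm with that tuple's descriptor parameters."""
    return [ALGORITHMS[d.algorithm](rec, **dict(d.params)) for rec, d in resource_tuples]
```

Take records `[1, 2, 3]`, three `scale` descriptors (factors 2, 10, 100), three resources, and a threshold of 2. Each resource ends up with exactly one descriptor. Keeping a descriptor's tuples together lets each resource set up each parameterized algorithm only once. A production scheduler might instead split large groups to balance load.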
-
Publication No.: US11625555B1
Publication Date: 2023-04-11
Application No.: US16817218
Filing Date: 2020-03-12
Applicant: Amazon Technologies, Inc.
Inventor: Dmitry Vladimir Zhiyanov, Lichao Wang, Archiman Dutta
Abstract: Respective labels are generated automatically for a plurality of record pairs, with a label for a given pair indicating a relationship detected between the records of the pair. One or more machine learning models are trained using the labeled record pairs. The trained versions of the models are stored.
-
Publication No.: US10909144B1
Publication Date: 2021-02-02
Application No.: US14640974
Filing Date: 2015-03-06
Applicant: Amazon Technologies, Inc.
Inventor: Archiman Dutta, Shoubhik Bhattacharya, Deepak Kumar Nayak, Avik Sinha
IPC: G06F16/28, G06F16/242
Abstract: Methods, systems, and computer-readable media for taxonomy generation with automated analysis and auditing are disclosed. A primary classification is determined for a hierarchical taxonomy of items in a marketplace. The primary classification is selected from a plurality of terms describing items in the marketplace, and the primary classification is selected based at least in part on automated analysis of the terms. A plurality of secondary classifications are determined for the hierarchical taxonomy. The secondary classifications are selected from the terms describing the items in the marketplace, and the secondary classifications are selected based at least in part on automated analysis of the terms. The hierarchical taxonomy is modified based at least in part on feedback from a plurality of users. The feedback comprises one or more terms entered by one or more of the users to filter a set of items.
-
Publication No.: US11675766B1
Publication Date: 2023-06-13
Application No.: US16808162
Filing Date: 2020-03-03
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Kai Liu, Nikhil Anand Navali, Archiman Dutta
IPC: G06F16/22, G06F16/28, G06F16/901
CPC classification number: G06F16/2246, G06F16/285, G06F16/9024
Abstract: A hierarchical representation of an input data set comprising similarity scores for respective entity pairs is generated iteratively. In a particular iteration, clusters are obtained from a subset of the iteration's input entity pairs which satisfy a similarity criterion, and then spanning trees are generated for at least some of the clusters. An indication of at least a representative pair of one or more of the clusters is added to the hierarchical representation in the iteration. The hierarchical representation is used to respond to clustering requests.
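The iterative construction can be sketched with union-find and Kruskal's algorithm. None of the following assumptions come from the abstract:

- the similarity criterion is a score cut-off that decreases from one iteration to the next;
- clusters are connected components of the retained pairs;
- each cluster's spanning tree is a maximum spanning tree;
- the representative pair is the highest-scoring tree edge;
- the lookup policy in the hypothetical `clusters_at` function is illustrative.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over hashable entity ids."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def build_hierarchy(pairs, thresholds):
    """pairs: (a, b, score) triples; thresholds: similarity cut-offs, highest first."""
    hierarchy, current = [], list(pairs)
    for t in thresholds:
        kept = sorted((p for p in current if p[2] >= t), key=lambda p: -p[2])
        dsu = DisjointSet()
        # Kruskal on descending scores keeps a maximum spanning tree per cluster.
        tree_edges = [p for p in kept if dsu.union(p[0], p[1])]
        clusters = defaultdict(list)
        for edge in tree_edges:
            clusters[dsu.find(edge[0])].append(edge)
        level = [
            {
                "members": sorted({x for a, b, _ in edges for x in (a, b)}),
                "representative": max(edges, key=lambda e: e[2]),
            }
            for edges in clusters.values()
        ]
        hierarchy.append((t, level))
        # Non-tree edges join already-connected entities, so they cannot change
        # clusters at lower cut-offs; carry forward only tree edges and unused pairs.
        current = tree_edges + [p for p in current if p[2] < t]
    return hierarchy


def clusters_at(hierarchy, t):
    """Answer a clustering request from the lowest stored cut-off that is still >= t."""
    eligible = [(th, level) for th, level in hierarchy if th >= t]
    if not eligible:
        return []
    _, level = min(eligible, key=lambda x: x[0])
    return sorted(c["members"] for c in level)
```

Take the pairs a-b (0.95), b-c (0.9), a-c (0.85), c-d (0.6), and e-f (0.7) with cut-offs `[0.9, 0.5]`:

- the first level holds the cluster `["a", "b", "c"]`;
- the second level holds `["a", "b", "c", "d"]` and `["e", "f"]`.

Carrying only tree edges forward shrinks each iteration's input without changing the clusters. Entities that appear in no retained pair are not reported.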
-
Publication No.: US11514321B1
Publication Date: 2022-11-29
Application No.: US16900620
Filing Date: 2020-06-12
Applicant: Amazon Technologies, Inc.
Inventor: Xianshun Chen, Zahed Patel, Kai Liu, Nikhil Anand Navali, Archiman Dutta
Abstract: Entity record pairs are extracted from a selected cluster of entity records. Attribute value pairs are obtained from the entity record pairs. Labels are assigned to the attribute value pairs based at least in part on entity-level similarity scores of the entity records from which the attribute value pairs were obtained. A machine learning model is trained, using a data set which includes at least some attribute value pairs to which the labels are assigned, to generate attribute similarity scores for pairs of attribute values.
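The labeling step can be sketched as weak supervision from entity-level scores. The sketch assumes records are dictionaries of attribute values and that an entity-level similarity score is supplied for each record pair. A pair of attribute values is labeled by its records' entity score:

- positive when the score is at least a hypothetical `pos_threshold`;
- negative when it is at most `neg_threshold`;
- unlabeled otherwise.

Training the attribute-similarity model on the resulting examples is not shown.

```python
from itertools import combinations


def label_attribute_pairs(cluster, entity_scores, attribute,
                          pos_threshold=0.9, neg_threshold=0.3):
    """Derive labeled attribute value pairs from entity-level similarity scores.

    cluster: record id -> {attribute name: value}
    entity_scores: frozenset({id_a, id_b}) -> entity-level similarity in [0, 1]
    The threshold defaults are hypothetical.
    """
    examples = []
    for id_a, id_b in combinations(sorted(cluster), 2):  # every entity record pair
        value_a = cluster[id_a].get(attribute)
        value_b = cluster[id_b].get(attribute)
        if value_a is None or value_b is None:
            continue  # attribute missing on one side: no attribute value pair
        score = entity_scores[frozenset((id_a, id_b))]
        if score >= pos_threshold:
            examples.append((value_a, value_b, 1))  # records match: values likely equivalent
        elif score <= neg_threshold:
            examples.append((value_a, value_b, 0))  # records differ: values likely distinct
        # intermediate entity scores are too ambiguous to label
    return examples
```

Consider a cluster whose `color` values are `"navy"`, `"dark blue"`, and `"red"`, with entity scores 0.95, 0.2, and 0.5 for the three record pairs. The result is `[("navy", "dark blue", 1), ("navy", "red", 0)]`. The first example gives the model evidence that `"navy"` and `"dark blue"` describe the same color.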
-
Publication No.: US10783167B1
Publication Date: 2020-09-22
Application No.: US15244320
Filing Date: 2016-08-23
Applicant: Amazon Technologies, Inc.
Inventor: Archiman Dutta, Meghana Shivanand Rajamane
IPC: G06F40/117, G06F16/28, H04L29/08, G06F16/2457
Abstract: Described are techniques for modifying or creating classification data used to automatically classify items in an online marketplace or catalog, based on user interaction data. For one or more classification labels that may be applied to an item, user interaction data may be determined. This data may indicate:

- a count of instances in which the label was accessed;
- a length of time during which the label was accessed;
- counts of instances in which parent and child labels were accessed;
- counts of instances in which the label was accessed via a search query.

Based on the user interaction data, an importance score for the label may be determined. Labels having an importance score greater than or equal to a threshold value may be included in classification data and used for subsequent classification of items. Labels having an importance score less than the threshold value may be excluded from the classification data.
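The scoring-and-filtering step can be sketched as a weighted sum of the interaction signals the abstract lists. The `LabelInteractions` record, the linear combination, and every value in `WEIGHTS` are hypothetical. The abstract does not say how the signals are combined or what threshold is used.

```python
from dataclasses import dataclass


@dataclass
class LabelInteractions:
    """Aggregated user-interaction signals for one classification label (hypothetical schema)."""
    label: str
    access_count: int
    access_seconds: float
    parent_access_count: int
    child_access_count: int
    search_access_count: int


# Hypothetical weights: the abstract does not specify how the signals are combined.
WEIGHTS = {
    "access_count": 1.0,
    "access_seconds": 0.01,
    "parent_access_count": 0.2,
    "child_access_count": 0.2,
    "search_access_count": 1.5,
}


def importance(x):
    """Linear importance score: weighted sum of the interaction signals."""
    return sum(weight * getattr(x, name) for name, weight in WEIGHTS.items())


def select_labels(interactions, threshold):
    """Keep labels scoring at or above the threshold; the rest are excluded."""
    return [x.label for x in interactions if importance(x) >= threshold]
```

Under these weights, with a threshold of 50:

- A label with 100 accesses, 600 seconds of viewing, 10 parent accesses, 5 child accesses, and 40 search-driven accesses scores about 169 and is kept.
- A label with 2 accesses and 30 seconds of viewing scores about 2.3 and is excluded.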
-