Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Andrew Borthwick"

1.

Patent application
USING SPECIFIED PERFORMANCE ATTRIBUTES TO CONFIGURE MACHINE LEARNING PIPELINE STAGES FOR AN ETL JOB (Active)

Publication No.: US20220261413A1

Publication date: 2022-08-18

Application No.: US17687492

Filing date: 2022-03-04

Applicant: Amazon Technologies, Inc.

Inventor: Timothy Jones, Andrew Borthwick, Sergei Dobroshinsky, Shehzad Qureshi, Stephen Michael Ash, Pedrito Uriah Maynard-Zhang, Chethan Kommaranahalli Rudramuni, Abhishek Sharma, Juliana Saussy, Adam Lawrence Joseph Heinermann, Alaykumar Navinchandra Desai, Mehul A. Shah, Mehul Y. Shah, Anurag Windlass Gupta, Prajakta Datta Damle

IPC: G06F16/25, G06F9/54, G06N20/00

Abstract: Specified performance attributes may be used to configure machine learning transformations for ETL jobs. Performance attributes for a machine learning pipeline that applies a model as part of a transformation for an ETL job may be used to configure a parameter in a stage of the machine learning pipeline. The configured stage may then be used when training the model. The trained machine learning pipeline may then be applied as part of a transformation operation included in an ETL job performed by the ETL system.
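
The idea of turning a specified performance attribute into a stage parameter can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names, the precision/recall tradeoff attribute, and the linear mapping to a score threshold are all assumptions made for the example.

```python
def configure_threshold(precision_recall_tradeoff: float,
                        low: float = 0.3, high: float = 0.9) -> float:
    """Map a tradeoff in [0, 1] (0 = favor recall, 1 = favor precision)
    to a match-score threshold for a hypothetical scoring stage."""
    if not 0.0 <= precision_recall_tradeoff <= 1.0:
        raise ValueError("tradeoff must be in [0, 1]")
    return low + (high - low) * precision_recall_tradeoff


def apply_stage(scores, threshold):
    """Keep only the candidate pairs whose score meets the threshold."""
    return [pair for pair, score in scores.items() if score >= threshold]


# Hypothetical candidate pairs with model scores.
scores = {("a", "b"): 0.95, ("a", "c"): 0.5, ("b", "c"): 0.2}
recall_leaning = apply_stage(scores, configure_threshold(0.0))
precision_leaning = apply_stage(scores, configure_threshold(1.0))
```

With the recall-leaning setting the stage keeps two of the three pairs; with the precision-leaning setting it keeps only the highest-scoring pair.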

2.

Granted patent
Supervised graph partitioning for record matching (Active)

Publication No.: US11514054B1

Publication date: 2022-11-29

Application No.: US16145104

Filing date: 2018-09-27

Applicant: Amazon Technologies, Inc.

Inventor: Andrew Borthwick, Robert Anthony Barton, Jr., Stephen Michael Ash, Russell Reas

IPC: G06F16/2455, G06N20/00, G06F16/28, G06F16/22, G06F16/901

Abstract: Supervised partitioning is used to perform record matching. A request to identify matches between records is received. A graph representation that indicates similarities between the records is partitioned and an evaluation of the partitioning is performed according to a supervised machine learning technique to generate a confidence value in the partitioning. An indication of equivalent records according to the partitioning and the confidence value of the partitioning may be provided.
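
A toy version of "partition a similarity graph, then score the partitioning" might look like the sketch below. It assumes connected components (via union-find) as the partitioning step and uses mean pairwise similarity as a placeholder for the supervised evaluator; neither choice comes from the patent, and all names are hypothetical.

```python
from itertools import combinations


def partition(records, similarity, threshold):
    """Group records into connected components of the graph whose edges
    are the pairs with similarity >= threshold (union-find)."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if similarity.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for r in records:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())


def confidence(group, similarity):
    """Placeholder for a trained evaluator: mean pairwise similarity
    inside the group (1.0 for singletons)."""
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 1.0
    return sum(similarity.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


records = ["r1", "r2", "r3", "r4"]
similarity = {frozenset(("r1", "r2")): 0.9,
              frozenset(("r2", "r3")): 0.8,
              frozenset(("r1", "r3")): 0.1}
groups = partition(records, similarity, threshold=0.5)
scored = sorted((sorted(g), round(confidence(g, similarity), 2)) for g in groups)
```

Here r1, r2 and r3 end up in one group through the chain r1-r2-r3, and the low r1-r3 similarity pulls that group's placeholder confidence down, which is the kind of signal a learned evaluator could act on.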

3.

Granted patent
Unsupervised representation learning for structured records (Active)

Publication No.: US11216701B1

Publication date: 2022-01-04

Application No.: US15955617

Filing date: 2018-04-17

Applicant: Amazon Technologies, Inc.

Inventor: Yen Ling Adelene Sim, Andrew Borthwick

IPC: G06K9/62, G06K9/72, G06N20/00, G06F16/28, G06N3/08

Abstract: Techniques for generating record embeddings from structured records are described. A record embeddings generating engine processes structured records to build a token vocabulary. Token embeddings are created for each token in the vocabulary. The token embeddings are trained using a loss function that relates the token embeddings to the record-attribute-data structure of the structured records. A record embedding is assembled from the trained token embeddings.
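
The vocabulary-building and assembly steps can be illustrated with a small sketch. It assumes whitespace tokenization and simple averaging of token vectors, and it substitutes fixed toy vectors for trained embeddings; the abstract does not specify any of these details, and the training loss is not shown.

```python
def build_vocabulary(records):
    """Collect the distinct lowercase, whitespace-separated tokens
    across all attribute values."""
    vocab = set()
    for record in records:
        for value in record.values():
            vocab.update(value.lower().split())
    return sorted(vocab)


def record_embedding(record, token_embeddings):
    """Average the embeddings of the record's known tokens
    (an assumed assembly rule)."""
    tokens = [t for v in record.values() for t in v.lower().split()
              if t in token_embeddings]
    dim = len(next(iter(token_embeddings.values())))
    if not tokens:
        return [0.0] * dim
    return [sum(token_embeddings[t][i] for t in tokens) / len(tokens)
            for i in range(dim)]


records = [{"name": "Ann Lee", "city": "Seattle"}]
vocab = build_vocabulary(records)
# Stand-in for trained token embeddings: fixed toy vectors.
token_embeddings = {"ann": [1.0, 0.0], "lee": [0.0, 1.0], "seattle": [2.0, 2.0]}
embedding = record_embedding(records[0], token_embeddings)
```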

4.

Granted patent
Using specified performance attributes to configure machine learning pipeline stages for an ETL job (Active)

Publication No.: US11941016B2

Publication date: 2024-03-26

Application No.: US17687492

Filing date: 2022-03-04

Applicant: Amazon Technologies, Inc.

Inventor: Timothy Jones, Andrew Borthwick, Sergei Dobroshinsky, Shehzad Qureshi, Stephen Michael Ash, Pedrito Uriah Maynard-Zhang, Chethan Kommaranahalli Rudramuni, Abhishek Sharma, Juliana Saussy, Adam Lawrence Joseph Heinermann, Alaykumar Navinchandra Desai, Mehul A. Shah, Mehul Y. Shah, Anurag Windlass Gupta, Prajakta Datta Damle

IPC: G06F16/25, G06F9/54, G06N20/00

CPC classification number: G06F16/254, G06F9/543, G06N20/00

Abstract: Specified performance attributes may be used to configure machine learning transformations for ETL jobs. Performance attributes for a machine learning pipeline that applies a model as part of a transformation for an ETL job may be used to configure a parameter in a stage of the machine learning pipeline. The configured stage may then be used when training the model. The trained machine learning pipeline may then be applied as part of a transformation operation included in an ETL job performed by the ETL system.

5.

Granted patent
Memory-efficient streaming count estimation for multisets (Active)

Publication No.: US11314730B1

Publication date: 2022-04-26

Application No.: US16828188

Filing date: 2020-03-24

Applicant: Amazon Technologies, Inc.

Inventor: Andrew Borthwick, Stephen Michael Ash

IPC: G06F16/23, G06F16/22

Abstract: Techniques for memory-efficient streaming count estimation for multisets are described. A method for memory-efficient streaming count estimation for multisets may include obtaining data from a plurality of data sources, and estimating a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table.
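
A two-table count-min sketch with overflow can be sketched as below. This is a simplified reading of the abstract, not the patented data structure: the class name, the cell cap of 15, the SHA-256 based hashing, and the rule that the overflow cell is the first-table column modulo the second table's width are all assumptions.

```python
import hashlib


class TelescopingCMS:
    """Count-min sketch whose small first-table cells spill into a
    narrower second table once they reach `cap`."""

    def __init__(self, width=64, depth=3, cap=15, overflow_width=16):
        self.width, self.depth, self.cap = width, depth, cap
        self.overflow_width = overflow_width
        self.first = [[0] * width for _ in range(depth)]
        self.second = [[0] * overflow_width for _ in range(depth)]

    def _col(self, row, item):
        # Deterministic per-row hash of the item.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            col = self._col(row, item)
            stored = min(self.cap - self.first[row][col], count)
            self.first[row][col] += stored
            # Whatever does not fit goes to the overflow cell derived
            # from the first-table cell's column.
            self.second[row][col % self.overflow_width] += count - stored

    def estimate(self, item):
        estimates = []
        for row in range(self.depth):
            col = self._col(row, item)
            value = self.first[row][col]
            if value == self.cap:  # saturated: include the overflow cell
                value += self.second[row][col % self.overflow_width]
            estimates.append(value)
        return min(estimates)


cms = TelescopingCMS()
for _ in range(40):
    cms.add("user-42")
cms.add("user-7", 3)
```

As with an ordinary count-min sketch, estimates from this toy version can overcount when items collide but never undercount, because every increment lands in either the first-table cell or its overflow cell.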

6.

Granted patent
Using specified performance attributes to configure machine learning pipeline stages for an ETL job (Active)

Publication No.: US11269911B1

Publication date: 2022-03-08

Application No.: US16199115

Filing date: 2018-11-23

Applicant: Amazon Technologies, Inc.

Inventor: Timothy Jones, Andrew Borthwick, Sergei Dobroshinsky, Shehzad Qureshi, Stephen Michael Ash, Pedrito Uriah Maynard-Zhang, Chethan Kommaranahalli Rudramuni, Abhishek Sharma, Juliana Saussy, Adam Lawrence Joseph Heinermann, Alaykumar Navinchandra Desai, Mehul A. Shah, Mehul Y. Shah, Anurag Windlass Gupta, Prajakta Datta Damle

IPC: G06F16/25, G06F9/54, G06N20/00

Abstract: Specified performance attributes may be used to configure machine learning transformations for ETL jobs. Performance attributes for a machine learning pipeline that applies a model as part of a transformation for an ETL job may be used to configure a parameter in a stage of the machine learning pipeline. The configured stage may then be used when training the model. The trained machine learning pipeline may then be applied as part of a transformation operation included in an ETL job performed by the ETL system.

7.

Granted patent
Scaling record linkage via elimination of highly overlapped blocks (Active)

Publication No.: US11113254B1

Publication date: 2021-09-07

Application No.: US16587902

Filing date: 2019-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Andrew Borthwick, Stephen Michael Ash

IPC: G06F16/215, G06F16/23

Abstract: Techniques for scaling record linkage via elimination of highly overlapped blocks are described. A method for scaling record linkage via elimination of highly overlapped blocks includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, identifying a plurality of sets of matching blocks from the first plurality of blocks, deleting the plurality of sets of matching blocks except for a first block from each set from the plurality of sets of matching blocks, and iteratively performing dynamic blocking based at least on the first block to generate subsequent pluralities of blocks until the subsequent pluralities of blocks are below a threshold size.
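
The "delete all but one block from each set of matching blocks" step could be sketched as follows. The example treats two blocks as matching only when they contain exactly the same record ids, which is a simplifying assumption; the function name and sample blocking keys are hypothetical.

```python
def drop_matching_blocks(blocks):
    """Keep the first block seen for each distinct set of record ids."""
    kept = {}
    seen = set()
    for key, record_ids in blocks.items():
        signature = frozenset(record_ids)
        if signature not in seen:
            seen.add(signature)
            kept[key] = record_ids
    return kept


blocks = {
    "zip=98101": ["r1", "r2", "r3"],
    "city=seattle": ["r3", "r1", "r2"],  # same records, different blocking key
    "name=lee": ["r2", "r4"],
}
deduped = drop_matching_blocks(blocks)
```

Removing the redundant block before the next blocking iteration avoids comparing the same record pairs twice.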

8.

Granted patent
Scalable parallel elimination of approximately subsumed sets (Active)

Publication No.: US11086940B1

Publication date: 2021-08-10

Application No.: US16588296

Filing date: 2019-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Andrew Borthwick, Stephen Michael Ash

IPC: G06F16/9035, G06F16/901, G06K9/62, G06F16/906

Abstract: Techniques for scalable parallel elimination of approximately subsumed sets are described. A method for scalable parallel elimination of approximately subsumed sets includes identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network, determining a plurality of subsumption relationships between blocks from the first plurality of blocks, retaining a first subset of the first plurality of blocks and demoting a second subset of the first plurality of blocks based at least on the plurality of subsumption relationships, and iteratively performing dynamic blocking based at least on the first subset of the plurality of matching blocks and the second subset of the plurality of matching blocks to generate a subsequent plurality of blocks.
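
The retain/demote decision based on subsumption relationships can be illustrated with a small sequential sketch. It assumes a block is "approximately subsumed" when a strictly larger block contains at least a given fraction of its records; that definition, the 0.9 default, and the quadratic pairwise comparison are illustrative assumptions and do not reflect the scalable parallel method the title refers to.

```python
def split_subsumed(blocks, min_overlap=0.9):
    """Return (retained, demoted) block keys. A block is demoted when some
    strictly larger block contains at least `min_overlap` of its records."""
    sets = {key: set(ids) for key, ids in blocks.items()}
    retained, demoted = [], []
    for key, members in sets.items():
        subsumed = any(
            bool(members)
            and len(other) > len(members)
            and len(members & other) / len(members) >= min_overlap
            for other_key, other in sets.items()
            if other_key != key
        )
        (demoted if subsumed else retained).append(key)
    return retained, demoted


blocks = {"a": [1, 2, 3, 4], "b": [1, 2, 3], "c": [3, 9]}
retained, demoted = split_subsumed(blocks)
```

Block "b" is fully contained in "a" and is demoted, while "c" shares only half of its records with "a" and is retained.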

9.

Granted patent
Intersection-based dynamic blocking (Active)

Publication No.: US10599614B1

Publication date: 2020-03-24

Application No.: US15860393

Filing date: 2018-01-02

Applicant: Amazon Technologies, Inc.

Inventor: Andrew Borthwick, Tianyi Lu, Shehzad Qureshi, Timothy Jones

IPC: G06F17/30, G06F16/13, G06N20/00, G06F16/174

Abstract: Block size reduction iterations are performed on a plurality of blocks of records until a block size criterion is met. An iteration comprises identifying, from a first collection of blocks, using one or more pivot operations, a set of combinations of oversized blocks such that at least one record belongs to all blocks of a combination. A new block comprising records that are members of each block of a first combination of the set is included in a second collection of blocks to be examined in a subsequent iteration. On at least one block created in an iteration, analysis operations are performed.
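
The block size reduction loop described in this abstract can be approximated by the sketch below. It assumes pairwise intersections of oversized blocks in place of the pivot operations the abstract mentions, and it drops oversized blocks that cannot be reduced; the function name, the round limit, and the sample keys are hypothetical.

```python
from itertools import combinations


def dynamic_blocking(blocks, max_size, max_rounds=5):
    """Repeatedly intersect pairs of oversized blocks. Blocks with at most
    `max_size` records are accepted; blocks still oversized or unexamined
    after `max_rounds` are dropped."""
    accepted = {}
    current = {key: frozenset(ids) for key, ids in blocks.items()}
    for _ in range(max_rounds):
        oversized = {k: v for k, v in current.items() if len(v) > max_size}
        accepted.update({k: v for k, v in current.items() if len(v) <= max_size})
        current = {}
        for (k1, v1), (k2, v2) in combinations(sorted(oversized.items()), 2):
            shared = v1 & v2
            if shared:  # at least one record belongs to both blocks
                current[f"{k1}&{k2}"] = shared
        if not current:
            break
    return accepted


blocks = {
    "city=seattle": [1, 2, 3, 4, 5],
    "last=lee": [3, 4, 5, 6, 7],
    "zip=98101": [1, 2],
}
result = dynamic_blocking(blocks, max_size=3)
```

In this example the two oversized blocks are replaced by their three-record intersection, which is small enough to be passed on for pairwise analysis.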
