-
Publication No.: US20230359876A1
Publication Date: 2023-11-09
Application No.: US18352768
Filing Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows than would be used without modifying the convolution operation.
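The utilization check and row-replication decision described in this abstract can be sketched as follows. The function names, the half-array default threshold, and the "fill as much of the array as possible" policy are illustrative assumptions, not details taken from the patent:

```python
def plan_row_usage(num_array_rows, num_input_matrices, threshold_rows):
    """Sketch of the under-utilization check from the abstract.

    Each input matrix normally occupies one row of the processing
    element array. When fewer than `threshold_rows` rows would be
    active concurrently, return a split factor: the number of rows
    each input matrix should be spread across instead.
    """
    if num_input_matrices >= threshold_rows:
        return 1  # enough rows are busy; no modification needed
    # Hypothetical policy: replicate each input matrix across as many
    # rows as will fit, so more of the array works in parallel.
    return max(1, num_array_rows // num_input_matrices)
```

For example, with a 128-row array and only 3 input matrices, each matrix could be spread across 42 rows rather than occupying a single row.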
-
Publication No.: US11188302B1
Publication Date: 2021-11-30
Application No.: US16267031
Filing Date: 2019-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Richard John Heaton
Abstract: Top-k is a process by which the largest elements among a set of elements are found. In various implementations, a top-k computation can be executed by a neural network accelerator, where the top-k computation is performed using a process that makes use of the accelerator's memory array. A set of numerical values on which to perform top-k can be stored in the memory array. The accelerator can locate the maximum value from among the set of numerical values and store the maximum value back into the memory array. The accelerator can then remove the maximum value from the set of numerical values so that the next largest value can be found. To remove the maximum value, the accelerator can write a value representing negative infinity to the memory array at each location of the maximum value.
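The iterative max-then-overwrite procedure in this abstract translates directly into a short sketch. The list here stands in for the accelerator's memory array; on real hardware the max-locate and overwrite steps would be hardware operations:

```python
import math

def top_k(values, k):
    """Find the k largest values by repeatedly locating the maximum
    and overwriting every occurrence of it with negative infinity,
    as the abstract describes."""
    mem = list(values)  # stand-in for the accelerator's memory array
    result = []
    for _ in range(k):
        m = max(mem)
        result.append(m)  # store the maximum back (here: into the result)
        # Remove the maximum so the next-largest value can be found:
        # write "negative infinity" at each location holding it.
        for i, v in enumerate(mem):
            if v == m:
                mem[i] = -math.inf
    return result
```

Note that because every occurrence of the maximum is overwritten at once, duplicate maxima are removed together, per the abstract's removal step.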
-
Publication No.: US20210097396A1
Publication Date: 2021-04-01
Application No.: US16588603
Filing Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Thiam Khean Hah , Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: Methods and systems for performing a training operation of a neural network are provided. In one example, a method comprises: performing backward propagation computations for a second layer of a neural network to generate second weight gradients; splitting the second weight gradients into portions; causing a hardware interface to exchange a first portion of the second weight gradients with a second computer system; performing backward propagation computations for a first layer of the neural network to generate first weight gradients while the exchange of the first portion of the second weight gradients is underway, the first layer being a lower layer than the second layer in the neural network; causing the hardware interface to transmit the first weight gradients to the second computer system; and causing the hardware interface to transmit the remaining portions of the second weight gradients to the second computer system.
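The ordering of compute and communication steps in this method can be sketched as an event schedule. The event labels and the equal-size splitting are illustrative assumptions; the point is that the layer-1 backward pass is placed between starting and finishing the layer-2 gradient exchange:

```python
def split_portions(grads, n):
    """Split a flat gradient list into n roughly equal portions."""
    size = -(-len(grads) // n)  # ceiling division
    return [grads[i:i + size] for i in range(0, len(grads), size)]

def schedule_step(second_grads, num_portions):
    """One training step, ordered as in the abstract: start exchanging
    the first portion of layer-2 gradients, overlap it with the layer-1
    backward pass, then send layer-1 gradients and the remaining
    layer-2 portions."""
    portions = split_portions(second_grads, num_portions)
    events = [("exchange_start", "layer2_portion_0")]
    events.append(("compute", "layer1_gradients"))  # overlapped with exchange
    events.append(("send", "layer1_gradients"))
    for i in range(1, len(portions)):
        events.append(("send", f"layer2_portion_{i}"))
    return portions, events
```

Overlapping the exchange with the lower layer's backward pass hides communication latency behind computation.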
-
Publication No.: US12093801B1
Publication Date: 2024-09-17
Application No.: US18142952
Filing Date: 2023-05-03
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Ron Diamant
IPC: G06F16/00 , G06F9/30 , G06F9/48 , G06F16/901 , G06N3/04
CPC classification number: G06N3/04 , G06F9/30003 , G06F9/4881 , G06F16/9024
Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier being associated with a subset of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either obtain, from the database, first instructions associated with the subgraph identifier or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
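The lookup-or-compile flow in this abstract is a compilation cache keyed by a subgraph identifier. In this sketch, the identifier is a hash over a canonical serialization of the subgraph and the database is a plain dictionary; both are assumptions about details the abstract leaves open:

```python
import hashlib
import json

def subgraph_id(subgraph):
    """Compute an identifier for a subgraph by hashing a canonical
    JSON serialization (an assumed identifier scheme)."""
    canonical = json.dumps(subgraph, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def get_instructions(subgraph, database, compile_fn):
    """Return cached ("first") instructions when the subgraph identifier
    is in the database; otherwise generate ("second") instructions and
    store them for reuse."""
    sid = subgraph_id(subgraph)
    if sid in database:
        return database[sid]        # first instructions: reused from the database
    instrs = compile_fn(subgraph)   # second instructions: freshly generated
    database[sid] = instrs
    return instrs
```

Identical subgraphs across models (or across recompilations) thus pay the compilation cost only once.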
-
Publication No.: US11875247B1
Publication Date: 2024-01-16
Application No.: US16905769
Filing Date: 2020-06-18
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Ron Diamant
Abstract: An acceleration engine with multiple accelerators may share a common set of data that is used by each accelerator to perform computations on input data. The set of shared data can be loaded into the acceleration engine from an external memory. Instead of accessing the external memory multiple times to load the set of shared data into each accelerator, the external memory can be accessed once using direct memory access to load the set of shared data into the first accelerator. The set of shared data can then be serially loaded from one accelerator to the next accelerator in the acceleration engine using direct memory access. To achieve data parallelism and reduce computation time, a runtime driver may split the input data into data batches, and each accelerator can perform computations on a different batch of input data with the common set of shared data.
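The single external read, the accelerator-to-accelerator chain, and the batch split described here can be modeled in a few lines. The dictionaries stand in for accelerator state, and the strided batch split is an assumed policy; the abstract only requires that each accelerator receive a different batch:

```python
def load_and_dispatch(shared_weights, num_accelerators, input_data):
    """Sketch of the serial shared-data loading scheme: one DMA read
    from external memory, then chained accelerator-to-accelerator
    copies, then per-accelerator input batches."""
    external_memory_reads = 0
    accels = [{"weights": None, "batch": None} for _ in range(num_accelerators)]
    # Single DMA transfer from external memory into the first accelerator.
    accels[0]["weights"] = list(shared_weights)
    external_memory_reads += 1
    # Serially copy the shared data down the chain via on-chip DMA.
    for prev, nxt in zip(accels, accels[1:]):
        nxt["weights"] = list(prev["weights"])
    # The runtime driver splits the input into one batch per accelerator
    # (strided split is an illustrative choice).
    for i, accel in enumerate(accels):
        accel["batch"] = input_data[i::num_accelerators]
    return accels, external_memory_reads
```

The payoff is that external-memory bandwidth is consumed once regardless of how many accelerators share the weights.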
-
Publication No.: US11741350B2
Publication Date: 2023-08-29
Application No.: US16698461
Filing Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
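The instruction-generation step this abstract describes, where different row groups apply different filter elements, can be sketched as follows. The grouping policy (contiguous row blocks, one filter element per block) is an illustrative assumption:

```python
def assign_filter_elements(num_rows, num_input_maps, filter_elements):
    """Generate one instruction per row group, each applying a
    different filter element to a full copy of the input feature
    maps, as the abstract describes for under-utilized arrays."""
    # How many full copies of the input maps fit in the array at once.
    fit = num_rows // num_input_maps
    groups = min(fit, len(filter_elements))
    instructions = []
    for g in range(groups):
        rows = list(range(g * num_input_maps, (g + 1) * num_input_maps))
        instructions.append({
            "rows": rows,                         # rows executing this instruction
            "filter_element": filter_elements[g]  # (row, col) index within the filter
        })
    return instructions
```

With 3 input feature maps on a 128-row array, three filter elements can be applied concurrently on rows 0-2, 3-5, and 6-8 rather than serially on rows 0-2 alone.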
-
Publication No.: US11119787B1
Publication Date: 2021-09-14
Application No.: US16368263
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Mohammad El-Shabani , Ron Diamant , Samuel Jacob , Ilya Minkin , Richard John Heaton
IPC: G06F9/44 , G06F8/41 , G06F11/30 , G06F9/38 , G06F11/22 , G06F9/455 , G06F11/36 , G06F9/445 , G06F11/34 , G06F9/30
Abstract: Systems and methods for non-intrusive hardware profiling are provided. In some cases integrated circuit devices can be manufactured without native support for performance measurement and/or debugging capabilities, thereby limiting visibility into the integrated circuit device. Understanding the timing of operations can help to determine whether the hardware of the device is operating correctly and, when the device is not operating correctly, provide information that can be used to debug the device. In order to measure execution time of various tasks performed by the integrated circuit device, program instructions may be inserted to generate notifications that provide tracing information, including timestamps, for operations executed by the integrated circuit device.
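The insertion of notification-generating instructions around device operations can be sketched as a source-level transformation plus a trace collector. The tuple encoding of instructions and the use of `time.monotonic()` as the timestamp source are illustrative assumptions:

```python
import time

def instrument(program):
    """Insert notification instructions before and after each
    operation so its execution window can be timed."""
    out = []
    for op in program:
        out.append(("notify", "start", op))
        out.append(("exec", op))
        out.append(("notify", "end", op))
    return out

def run(instrumented):
    """Execute the instrumented program, collecting timestamped
    tracing records from the notification instructions."""
    trace = []
    for kind, *rest in instrumented:
        if kind == "notify":
            phase, op = rest
            trace.append((op, phase, time.monotonic()))
        # "exec" entries would run on the integrated circuit device.
    return trace
```

Because the notifications are ordinary inserted instructions, the device itself needs no native profiling hardware, which is the point of the abstract.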
-
Publication No.: US20210158131A1
Publication Date: 2021-05-27
Application No.: US16698236
Filing Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain , Yizhi Liu , Hongbin Zheng , Jeffrey T. Huynh , Haichen Li , Drazen Borkovic , Jindrich Zejda , Richard John Heaton , Randy Renfu Huang , Zhi Chen , Yida Wang
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators. The neural network operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the ability to map them onto the acceleration engine. To enable such neural networks to be executed on an acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
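The three-tier classification at the heart of this partitioning can be sketched as a simple placement pass. Representing support sets as plain Python sets and operators as strings is an illustrative simplification:

```python
def partition_operators(operators, accel_supported, host_supported):
    """Place each operator on the most capable tier that supports it:
    acceleration engine first, host processor next, and the machine
    learning framework as the fallback."""
    placement = {"accelerator": [], "host": [], "framework": []}
    for op in operators:
        if op in accel_supported:
            placement["accelerator"].append(op)   # compiler can map it to the engine
        elif op in host_supported:
            placement["host"].append(op)          # compiled for the host processor
        else:
            placement["framework"].append(op)     # newest ops fall back to the framework
    return placement
```

The framework tier guarantees every model still runs, even when it uses operators newer than the compiler's support.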
-
Publication No.: US12205013B1
Publication Date: 2025-01-21
Application No.: US17009483
Filing Date: 2020-09-01
Applicant: Amazon Technologies, Inc.
Inventor: Thiam Khean Hah , Randy Renfu Huang , Richard John Heaton , Ron Diamant , Vignesh Vivekraja
Abstract: Accelerated convolution of neural networks can be performed by executing N computing engines (CEs) of a neural network processor in parallel. An input dataset can be divided spatially into N chunks such that a respective last portion of each chunk overlaps with a respective first portion of a subsequent chunk. Portions of each chunk can be processed by a respective CE to generate a respective portion of an output dataset. The overlapping intermediate states computed by each CE from processing the overlapping portion can be stored locally for sharing with a subsequent CE using an on-chip bus.
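The overlapped spatial split described here, where each chunk's tail is shared with the next chunk's head, can be sketched in one dimension. A 1-D list stands in for the spatial dimension of the input dataset, and equal base chunk sizes are an assumed simplification:

```python
def split_spatially(data, n_chunks, overlap):
    """Divide data into n_chunks pieces such that the last `overlap`
    elements of each chunk are also the first elements of the next
    chunk, matching the overlapping split in the abstract."""
    base = len(data) // n_chunks
    chunks = []
    for i in range(n_chunks):
        start = i * base
        # Every chunk except the last extends into its neighbor's region.
        end = len(data) if i == n_chunks - 1 else (i + 1) * base + overlap
        chunks.append(data[start:end])
    return chunks
```

In the patented scheme, the intermediate states computed over the overlapping region are stored locally and forwarded over an on-chip bus, so the subsequent computing engine does not recompute them.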
-
Publication No.: US12093806B1
Publication Date: 2024-09-17
Application No.: US16459501
Filing Date: 2019-07-01
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda , Ron Diamant , Jeffrey T. Huynh , Drazen Borkovic , Randy Renfu Huang , Richard John Heaton
Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of the subgraph may be statically allocated in dedicated caches for the processing units as part of the instructions to execute the neural network across the processing units.
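The static allocation step, which fixes weight addresses in each processing unit's dedicated cache at compile time, can be sketched as a bump allocator per unit. The `(offset, size)` address encoding and the sequential packing order are illustrative assumptions:

```python
def allocate_static(subgraph_weights, cache_size):
    """Statically assign cache offsets to each subgraph's weights.

    subgraph_weights: one list per processing unit, each holding
    (weight_name, size) pairs for the subgraph assigned to that unit.
    Returns {unit: {weight_name: (offset, size)}}.
    """
    allocation = {}
    for unit, weights in enumerate(subgraph_weights):
        offset, table = 0, {}
        for name, size in weights:
            if offset + size > cache_size:
                raise ValueError("weights exceed the dedicated cache")
            table[name] = (offset, size)  # address fixed at compile time
            offset += size
        allocation[unit] = table
    return allocation
```

Because every address is decided during compilation, the generated instructions can reference weights directly, with no runtime cache management.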