-
Publication Number: US12093806B1
Publication Date: 2024-09-17
Application Number: US16459501
Application Date: 2019-07-01
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda , Ron Diamant , Jeffrey T. Huynh , Drazen Borkovic , Randy Renfu Huang , Richard John Heaton
Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of each subgraph may be statically allocated in dedicated caches for the assigned processing units as part of the instructions to execute the neural network across the processing units.
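As a rough Python sketch of the allocation scheme this abstract describes (the Op type, the contiguous partitioning rule, and the offset bookkeeping are all illustrative assumptions, not the patented design):

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    weight_bytes: int  # size of this operation's weight tensor

def partition(ops, num_units):
    """Split the operation list into contiguous subgraphs, one per unit."""
    chunk = -(-len(ops) // num_units)  # ceiling division
    return [ops[i:i + chunk] for i in range(0, len(ops), chunk)]

def allocate_weights(subgraphs):
    """Statically assign each op's weights a fixed offset in the dedicated
    cache of the processing unit that executes its subgraph."""
    plan = {}
    for unit, subgraph in enumerate(subgraphs):
        offset = 0
        for op in subgraph:
            plan[op.name] = (unit, offset)  # (processing unit, cache offset)
            offset += op.weight_bytes
    return plan

ops = [Op("conv1", 4096), Op("conv2", 8192), Op("fc1", 2048), Op("fc2", 1024)]
print(allocate_weights(partition(ops, num_units=2)))
```

Because the offsets are fixed at compile time, no runtime cache management is needed: each unit's instructions can reference its weights at known addresses.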
-
Publication Number: US12026607B1
Publication Date: 2024-07-02
Application Number: US17964291
Application Date: 2022-10-12
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
CPC classification number: G06N3/063 , G06F15/8046 , G06N3/02
Abstract: A neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
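A hedged sketch of the strided addressing this abstract describes, assuming a row-major flattened input feature map and invented helper names rather than the accelerator's actual ISA:

```python
import numpy as np

def gather_inputs(memory, width, weight_row, weight_col, stride, out_h, out_w):
    """For one weight element at (weight_row, weight_col), compute the first
    address from the weight coordinates, then read out_h*out_w input elements,
    skipping (stride - 1) elements between adjacent reads along a row."""
    first_addr = weight_row * width + weight_col  # based on weight coordinates
    rows = []
    for i in range(out_h):
        base = first_addr + i * stride * width    # step whole rows by the stride
        rows.append(memory[base : base + out_w * stride : stride])
    return np.stack(rows)

mem = np.arange(36, dtype=np.float32)  # a flattened 6x6 input feature map
patch = gather_inputs(mem, width=6, weight_row=0, weight_col=1,
                      stride=2, out_h=3, out_w=3)
print(patch)  # the inputs multiplied by weight element (0, 1) in a stride-2 conv
```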
-
Publication Number: US11294599B1
Publication Date: 2022-04-05
Application Number: US16891438
Application Date: 2020-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Ron Diamant , Randy Renfu Huang , Sundeep Amirineni , Jeffrey T. Huynh
Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
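A toy model of the parallel-then-serial register movement, with the bank and register layout assumed purely for demonstration:

```python
def parallel_load(banks, addr):
    """Each register loads from the same address of its own bank, in parallel."""
    return [bank[addr] for bank in banks]

def serial_store(regs, bank, start):
    """The registers are stored one after another into a single bank."""
    for i, value in enumerate(regs):
        bank[start + i] = value

banks = [[b * 10 + i for i in range(4)] for b in range(4)]  # 4 banks x 4 words
regs = parallel_load(banks, addr=2)      # gather word 2 from every bank at once
serial_store(regs, banks[0], start=0)    # pack the gathered words into one bank
print(banks[0])  # [2, 12, 22, 32]: data moved from many banks into one
```

Running the two steps in the opposite order (a serial load from one bank followed by a parallel store) would scatter one bank's data across many banks, the other direction the abstract mentions.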
-
Publication Number: US12198041B2
Publication Date: 2025-01-14
Application Number: US18352768
Application Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
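A minimal sketch of the under-utilization check and the splitting fix, under assumed shapes and helper names:

```python
import numpy as np

def plan_rows(num_inputs, array_rows, threshold):
    """If fewer than `threshold` rows would be used concurrently, split each
    input across more rows so the processing element array is better used."""
    if num_inputs >= threshold:
        return 1                               # no modification needed
    return max(1, array_rows // num_inputs)    # rows per input after the split

def split_input(matrix, parts):
    """Divide one input matrix into `parts` chunks, one per array row."""
    return np.array_split(matrix, parts, axis=0)

x = np.arange(32.0).reshape(8, 4)   # one input matrix: 8 samples x 4 features
parts = plan_rows(num_inputs=1, array_rows=4, threshold=2)
chunks = split_input(x, parts)      # processed in parallel on 4 rows
print(parts, [c.shape for c in chunks])  # 4 [(2, 4), (2, 4), (2, 4), (2, 4)]
```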
-
Publication Number: US11567778B2
Publication Date: 2023-01-31
Application Number: US17243415
Application Date: 2021-04-28
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Drazen Borkovic , Jindrich Zejda , Randy Renfu Huang , Ron Diamant
Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
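One way to picture the reordering pass is a greedy hoist of operations past independent work destined for other engines; the op encoding below is an assumption for illustration, not the compiler's actual representation:

```python
def reorder(ops):
    """Greedy pass: pull an op forward past predecessors that run on a
    different engine and do not produce any of its inputs, so independent
    engines can work concurrently instead of idling."""
    order = []
    for op in ops:
        pos = len(order)
        while pos > 0:
            prev = order[pos - 1]
            same_engine = prev["engine"] == op["engine"]
            dependent = bool(set(prev["outs"]) & set(op["ins"]))
            if same_engine or dependent:
                break                # cannot legally hoist past this op
            pos -= 1
        order.insert(pos, op)
    return order

ops = [
    {"name": "matmul", "engine": "pe_array", "ins": ["x"], "outs": ["y"]},
    {"name": "act",    "engine": "act",      "ins": ["y"], "outs": ["z"]},
    {"name": "dma",    "engine": "dma",      "ins": ["w"], "outs": ["w2"]},
]
print([op["name"] for op in reorder(ops)])  # ['dma', 'matmul', 'act']
```

Here the DMA transfer is hoisted ahead of the matrix multiply it does not depend on, so the copy overlaps the compute instead of waiting behind it.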
-
Publication Number: US11379555B2
Publication Date: 2022-07-05
Application Number: US16457503
Application Date: 2019-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
IPC: G06F17/15 , G06V10/75 , G06F15/80 , G06V30/413 , H04L49/9047
Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select and load a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
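A hedged sketch of the input-subset selection for one weight element of a dilated convolution, assuming a row-major flattened input, unit stride, and invented function names:

```python
import numpy as np

def select_for_dilated(memory, width, wr, wc, rate, out_h, out_w):
    """The subset of input elements multiplied by weight element (wr, wc)
    starts at an offset scaled by the dilation rate of the convolution."""
    first = (wr * rate) * width + (wc * rate)   # rate * weight coordinates
    rows = [memory[first + i * width : first + i * width + out_w]
            for i in range(out_h)]
    return np.stack(rows)

mem = np.arange(49, dtype=np.float32)           # a flattened 7x7 input
sub = select_for_dilated(mem, width=7, wr=1, wc=1, rate=2, out_h=3, out_w=3)
print(sub)  # the inputs weight element (1, 1) touches at dilation rate 2
```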
-
Publication Number: US20210158132A1
Publication Date: 2021-05-27
Application Number: US16698461
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: A computer-implemented method includes receiving a neural network model for implementation using a processing element array, where the neural network model includes a convolution operation on a set of input feature maps and a set of filters. The method also includes determining, based on the neural network model, that the convolution operation utilizes less than a threshold number of rows in the processing element array for applying a set of filter elements to the set of input feature maps, where the set of filter elements includes one filter element in each filter of the set of filters. The method further includes generating, for the convolution operation and based on the neural network model, a first instruction and a second instruction for execution by respective rows in the processing element array, where the first instruction and the second instruction use different filter elements of a filter in the set of filters.
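An illustrative sketch of issuing per-row instructions that apply different filter elements of the same filter; the Instruction type and the assignment rule are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    row: int            # processing element array row executing this instruction
    input_map: int      # which input feature map this row reads
    filter_elem: tuple  # (row, col) of the filter element this row applies

def gen_instructions(num_maps, filter_elems, array_rows):
    """When num_maps < array_rows, fill spare rows by issuing instructions
    that apply *different* filter elements of the same filter in parallel."""
    instrs = []
    per_map = min(len(filter_elems), array_rows // num_maps)
    for m in range(num_maps):
        for k in range(per_map):
            instrs.append(Instruction(row=m * per_map + k,
                                      input_map=m,
                                      filter_elem=filter_elems[k]))
    return instrs

# Two input maps on a 4-row array: rows 0/1 and 2/3 apply filter
# elements (0, 0) and (0, 1) concurrently instead of leaving rows idle.
for ins in gen_instructions(num_maps=2, filter_elems=[(0, 0), (0, 1)], array_rows=4):
    print(ins)
```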
-
Publication Number: US11816559B2
Publication Date: 2023-11-14
Application Number: US17832039
Application Date: 2022-06-03
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant
IPC: G06N3/063 , G06F15/80 , G06F17/15 , H04L49/9047 , G06V30/413
CPC classification number: G06N3/063 , G06F15/8046 , G06F17/153 , G06V30/413 , H04L49/9047
Abstract: In one example, a non-transitory computer readable medium stores instructions that, when executed by one or more hardware processors, cause the one or more hardware processors to: load a first weight data element of an array of weight data elements from a memory into a systolic array; select and load a subset of input data elements from the memory into the systolic array to perform first computations of a dilated convolution operation, the subset being selected based on a rate of the dilated convolution operation and coordinates of the first weight data element within the array of weight data elements; and control the systolic array to perform the first computations based on the first weight data element and the subset to generate first output data elements of an output data array. An example of a compiler that generates the instructions is also provided.
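Since this abstract also mentions a compiler that generates the instructions, here is a hedged sketch of that side, complementing the runtime sketch after the related US11379555B2 entry above; the instruction encoding is invented purely for illustration:

```python
def compile_dilated_conv(in_h, in_w, k, rate):
    """Emit one (weight coordinates, first address, count) record per weight
    element; a runtime like the earlier sketch would consume these."""
    out_h = in_h - rate * (k - 1)   # output size for a k x k kernel at this rate
    out_w = in_w - rate * (k - 1)
    program = []
    for wr in range(k):
        for wc in range(k):
            first = (wr * rate) * in_w + (wc * rate)
            program.append({"weight": (wr, wc),
                            "first_address": first,
                            "count": out_h * out_w})
    return program

for instr in compile_dilated_conv(in_h=7, in_w=7, k=3, rate=2):
    print(instr)
```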
-
Publication Number: US20230359876A1
Publication Date: 2023-11-09
Application Number: US18352768
Application Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
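Complementing the sketch after the identical abstract of the granted US12198041B2 entry above, a toy utilization calculation shows why the splitting pays off; all numbers are illustrative:

```python
def utilization(num_inputs, rows_per_input, array_rows):
    """Fraction of processing element array rows busy at once."""
    return min(num_inputs * rows_per_input, array_rows) / array_rows

# One input matrix on a 128-row array, before and after the split.
before = utilization(num_inputs=1, rows_per_input=1, array_rows=128)
after = utilization(num_inputs=1, rows_per_input=128, array_rows=128)
print(f"before: {before:.1%}, after: {after:.1%}")  # before: 0.8%, after: 100.0%
```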
-
Publication Number: US11308396B2
Publication Date: 2022-04-19
Application Number: US16455329
Application Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda , Jeffrey T. Huynh , Drazen Borkovic , Se jong Oh , Ron Diamant , Randy Renfu Huang
Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and a debugger program determines whether the first device tensor matches a first reference tensor. The shortest length for which the first device tensor does not match the first reference tensor is identified. Tensor output is enabled for a lower-level intermediate representation of the neural network at that shortest length, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
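A sketch of the length-reduction loop, with the compile/execute/reference steps collapsed into stand-in callables since the real toolchain is not shown here:

```python
def shortest_failing_length(num_ops, run_device, run_reference, close):
    """Scan prefixes of the network from shortest to longest and return the
    first length whose device tensor diverges from the reference tensor."""
    for length in range(1, num_ops + 1):
        if not close(run_device(length), run_reference(length)):
            return length   # shortest network reproducing the mismatch
    return None             # device matches the reference at every length

# Hypothetical harness: the target processor goes wrong at op 7.
ref = lambda n: float(n)
dev = lambda n: float(n) + (0.5 if n >= 7 else 0.0)
close = lambda a, b: abs(a - b) < 1e-6
print(shortest_failing_length(10, dev, ref, close))  # 7
```

Once the shortest failing length is known, per-tensor output for its lower-level intermediate representation pinpoints which instruction introduced the divergence.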