-
Publication No.: US10956584B1
Publication Date: 2021-03-23
Application No.: US16141770
Filing Date: 2018-09-25
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Ron Diamant , David James Borland
Abstract: Systems and methods for performing neural network processing are provided. In one example, a system comprises a neural network processor comprising: a data decryption engine that receives encrypted data and decrypts the encrypted data, the encrypted data comprising at least one of: encrypted weights data, encrypted input data, or encrypted instruction data related to a neural network model; and a computing engine that receives the weights data and performs computations of the neural network processing using the input data and the weights data, based on the instruction data.
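The abstract describes a decrypt-then-compute pipeline. Below is a minimal Python sketch of that flow, assuming a toy XOR stream cipher and a single dense layer as stand-ins; the class names and the cipher are illustrative, not the patent's actual mechanism.

```python
# Hypothetical sketch of the decrypt-then-compute pipeline. The XOR
# "cipher" and engine classes are illustrative stand-ins only.
import numpy as np

def decrypt(blob: bytes, key: bytes) -> bytes:
    """Toy stream-cipher decryption (XOR with a repeating key)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob))

class DataDecryptionEngine:
    def __init__(self, key: bytes):
        self.key = key

    def load_weights(self, encrypted: bytes, shape) -> np.ndarray:
        raw = decrypt(encrypted, self.key)
        return np.frombuffer(raw, dtype=np.float32).reshape(shape)

class ComputingEngine:
    def forward(self, inputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
        # A single dense layer with ReLU stands in for "computations of
        # neural network processing".
        return np.maximum(inputs @ weights, 0.0)

key = b"secret"
plain_weights = np.ones((4, 2), dtype=np.float32)
encrypted = decrypt(plain_weights.tobytes(), key)  # XOR is its own inverse

dde = DataDecryptionEngine(key)
ce = ComputingEngine()
out = ce.forward(np.full((1, 4), 0.5, dtype=np.float32),
                 dde.load_weights(encrypted, (4, 2)))
print(out)  # [[2. 2.]]
```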
-
Publication No.: US10929063B1
Publication Date: 2021-02-23
Application No.: US16368538
Filing Date: 2019-03-28
Applicant: Amazon Technologies, Inc.
Inventor: Vignesh Vivekraja , Yu Zhou , Ron Diamant , Randy Renfu Huang , Richard John Heaton
Abstract: Systems and methods for assisted indirect memory addressing are provided. Some computing systems move data between levels of a hierarchical memory system. To accommodate data movement for computing systems that do not natively support indirect addressing between levels of the memory hierarchy, a direct memory access (DMA) engine is used to fetch data. The DMA engine executes a first set of memory instructions that modify a second set of memory instructions to fetch data stored at one level of the memory hierarchy from dynamically computed indirect addresses stored in memory locations at another level of the memory hierarchy.
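The key idea here is self-modifying DMA descriptors: a first instruction set patches the source address of a second using an index value held in memory. A minimal sketch follows, with the descriptor format and address map invented for illustration.

```python
# Sketch of DMA-assisted indirect addressing: a "patch" descriptor
# rewrites the source field of a later "copy" descriptor, so the copy
# fetches from a dynamically computed address. All structures invented.
memory = {0x100: 0x300,        # level-1 location holding a computed address
          0x300: "payload"}    # level-2 data at that indirect address

descriptors = [
    ("patch", {"target": 1, "field": "src", "addr_of_addr": 0x100}),
    ("copy",  {"src": None, "dst": "sbuf"}),
]

def run_dma(descs, mem):
    sbuf = {}
    for op, args in descs:
        if op == "patch":
            # First instruction set: read the indirect address from memory
            # and write it into the second instruction set.
            descs[args["target"]][1][args["field"]] = mem[args["addr_of_addr"]]
        elif op == "copy":
            sbuf[args["dst"]] = mem[args["src"]]
    return sbuf

print(run_dma(descriptors, memory))  # {'sbuf': 'payload'}
```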
-
Publication No.: US12198041B2
Publication Date: 2025-01-14
Application No.: US18352768
Filing Date: 2023-07-14
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh , Ron Diamant , Hongbin Zheng , Yizhi Liu , Animesh Jain , Yida Wang , Vinod Sharma , Richard John Heaton , Randy Renfu Huang , Sundeep Amirineni , Drazen Borkovic
Abstract: Generating instructions for programming a processing element array to implement a convolution operation can include determining that the convolution operation under-utilizes the processing element array. The convolution operation involves using the processing element array to perform a series of matrix multiplications between a set of filters and a set of input matrices. Each filter comprises a weight matrix. Each input matrix is assigned to a respective row in the processing element array. Under-utilization can be determined by detecting that fewer than a threshold number of rows would be used concurrently. In response to determining that the convolution operation under-utilizes the processing element array, instructions can be added for modifying the convolution operation to increase the number of rows used concurrently. The added instructions are executable to cause at least one input matrix to be processed in parallel across more rows compared to processing without modifying the convolution operation.
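As a rough illustration of the under-utilization check, the sketch below assumes a 128-row processing element array and a split factor that spreads an input matrix across idle rows; the threshold and the splitting heuristic are assumptions, not the patent's claimed method.

```python
# Hedged sketch: if a convolution maps fewer input-matrix rows onto the
# PE array than a threshold, split the work so it occupies more rows.
PE_ROWS = 128

def plan_convolution(num_input_rows: int, threshold: int = PE_ROWS // 2):
    rows_used = num_input_rows  # one input matrix per PE-array row
    if rows_used >= threshold:
        return {"split_factor": 1, "rows_used": rows_used}
    # Modify the operation: process each input matrix in parallel across
    # more rows, up to the array height.
    split_factor = max(1, PE_ROWS // rows_used)
    return {"split_factor": split_factor,
            "rows_used": rows_used * split_factor}

print(plan_convolution(3))    # {'split_factor': 42, 'rows_used': 126}
print(plan_convolution(128))  # {'split_factor': 1, 'rows_used': 128}
```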
-
Publication No.: US12079734B1
Publication Date: 2024-09-03
Application No.: US17878824
Filing Date: 2022-08-01
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng , Randy Renfu Huang , Richard John Heaton
Abstract: Techniques for reducing a compilation time for compiling a neural network are disclosed. A description of a neural network is received by a compiler. A plurality of operators are identified based on the description of the neural network. A plurality of subgraphs are formed, each including one or more operators. For each subgraph, a performance factor is calculated based on a compute usage and a memory usage associated with the operators included in the subgraph. The performance factor is compared to a threshold. Based on the comparison, either the subgraph is classified as a compute bound subgraph and a set of memory optimizations are suppressed or the subgraph is classified as a memory bound subgraph and a set of compute optimizations are suppressed.
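A hedged sketch of the classification step follows, assuming the performance factor is arithmetic intensity (FLOPs per byte moved); the patent only states that it combines compute usage and memory usage, so the formula and threshold are illustrative.

```python
# Illustrative subgraph classification. The performance-factor formula
# and the threshold value are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class Subgraph:
    name: str
    flops: float        # compute usage of the operators in the subgraph
    bytes_moved: float  # memory usage of the same operators

def classify(sg: Subgraph, threshold: float = 10.0) -> str:
    perf_factor = sg.flops / sg.bytes_moved
    if perf_factor > threshold:
        # Compute bound: suppress memory optimizations during compilation.
        return "compute_bound"
    # Memory bound: suppress compute optimizations instead.
    return "memory_bound"

print(classify(Subgraph("matmul_block", flops=2e9, bytes_moved=5e7)))      # compute_bound
print(classify(Subgraph("embedding_lookup", flops=1e6, bytes_moved=4e8)))  # memory_bound
```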
-
Publication No.: US12008469B1
Publication Date: 2024-06-11
Application No.: US17009517
Filing Date: 2020-09-01
Applicant: Amazon Technologies, Inc.
Inventor: Thiam Khean Hah , Randy Renfu Huang , Richard John Heaton , Ron Diamant , Vignesh Vivekraja
Abstract: A single neural network model can be used by each computing engine (CE) in a neural network processor to perform convolution operations in parallel for one or more stacks of convolutional layers. An input feature map can be divided into N chunks to be processed by N CEs, respectively. Each CE can process a last portion of a respective chunk to generate respective shared states to be used by a subsequent CE. A first CE uses pre-computed states to generate a first portion of an output feature map, while other CEs use shared states computed by a preceding CE to generate respective portions of the output feature map.
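The sketch below illustrates the chunked pipeline with a 1-D convolution standing in for a convolutional layer: each engine receives the boundary values ("shared states") from its predecessor, and the first engine uses pre-computed zero-padding. Chunk sizes and the state-passing format are assumptions.

```python
# Sketch of the chunked pipeline: the feature map is split into N chunks;
# each computing engine hands its trailing boundary values to the next.
import numpy as np

def conv_chunk(chunk, left_state, kernel):
    # Prepend the shared state so the convolution at the chunk boundary
    # sees values from the preceding engine's chunk.
    padded = np.concatenate([left_state, chunk])
    return (np.convolve(padded, kernel, mode="valid"),
            chunk[-(kernel.size - 1):])

feature_map = np.arange(12, dtype=np.float32)
kernel = np.array([1.0, 1.0, 1.0], dtype=np.float32)
chunks = np.split(feature_map, 4)  # N = 4 computing engines

state = np.zeros(kernel.size - 1, dtype=np.float32)  # pre-computed state for CE 0
outputs = []
for chunk in chunks:               # in hardware these engines run pipelined
    out, state = conv_chunk(chunk, state, kernel)
    outputs.append(out)

print(np.concatenate(outputs))
print(np.convolve(feature_map, kernel, mode="full")[:12])  # matches reference
```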
-
Publication No.: US11687761B2
Publication Date: 2023-06-27
Application No.: US16216485
Filing Date: 2018-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang , Richard John Heaton , Andrea Olgiati , Ron Diamant
IPC: G06N3/045 , G06N3/04 , G06N3/08 , G06F18/214
CPC classification number: G06N3/045 , G06F18/214 , G06N3/04 , G06N3/08
Abstract: Systems and methods for performing improper input data detection are described. In one example, a system comprises: hardware circuits configured to receive input data and to perform computations of a neural network based on the input data to generate computation outputs; and an improper input detection circuit configured to: determine a relationship between the computation outputs of the hardware circuits and reference outputs; determine that the input data are improper based on the relationship; and perform an action based on determining that the input data are improper.
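As a rough sketch of the idea, the code below flags inputs whose computation outputs relate poorly to reference outputs, using a softmax-confidence floor and a logit-drift ceiling; both heuristics and thresholds are assumptions for illustration, not the patent's claimed relationship.

```python
# Assumed improper-input check: compare outputs against reference outputs
# and flag inputs whose relationship looks wrong.
import numpy as np

def detect_improper_input(logits: np.ndarray, reference_mean: np.ndarray,
                          conf_floor: float = 0.6, drift_ceiling: float = 3.0) -> str:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    drift = np.abs(logits - reference_mean).max()
    if probs.max() < conf_floor or drift > drift_ceiling:
        # Perform an action based on the determination (reject, log, alert).
        return "improper input detected"
    return "input accepted"

reference = np.array([2.0, 0.0, -1.0])  # reference outputs for proper inputs
print(detect_improper_input(np.array([2.5, 0.2, -0.8]), reference))  # accepted
print(detect_improper_input(np.array([0.1, 0.0, -0.1]), reference))  # flagged: low confidence
```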
-
Publication No.: US20230153620A1
Publication Date: 2023-05-18
Application No.: US18154576
Filing Date: 2023-01-13
Applicant: Amazon Technologies, Inc.
Inventor: Randy Renfu Huang , Ron Diamant , Richard John Heaton
Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on a same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
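A minimal sketch of the tensor-operation split, using a matrix multiplication divided into row bands so each sub-operation yields a slice of the final output; the sequential loop stands in for dispatch to separate computing engines.

```python
# Divide one tensor operation into sub-operations, each generating a
# portion of the final output, suitable for separate computing engines.
import numpy as np

def split_matmul(a: np.ndarray, b: np.ndarray, num_engines: int):
    # Each sub-operation computes a band of output rows.
    row_blocks = np.array_split(a, num_engines, axis=0)
    return [(block, b) for block in row_blocks]

a = np.random.rand(8, 4).astype(np.float32)
b = np.random.rand(4, 3).astype(np.float32)

sub_ops = split_matmul(a, b, num_engines=4)
partials = [blk @ w for blk, w in sub_ops]  # one per computing engine
result = np.vstack(partials)                # final output of the tensor op

assert np.allclose(result, a @ b)
# As the abstract notes, an inference could already be made from
# partials[0] alone, before the remaining sub-operations finish.
```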
-
Publication No.: US11561833B1
Publication Date: 2023-01-24
Application No.: US16021866
Filing Date: 2018-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Randy Renfu Huang , Drazen Borkovic , Jindrich Zejda
Abstract: Techniques for operating a computing system to perform neural network operations are disclosed. In one example, a method comprises receiving a neural network model, determining a sequence of neural network operations based on data dependency in the neural network model, and determining a set of instructions to map the sequence of neural network operations to the processing resources of the neural network processor. The method further comprises determining, based on a set of memory access operations included in the set of instructions, a first set of memory references associated with a first location of an external memory to store the input data and a second set of memory references associated with a second location of the external memory to store the output data, and generating an instruction file including the set of instructions, the first set of memory references and the second set of memory references.
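The sketch below walks through an assumed version of this flow: order operations by data dependency, map them to instructions, and emit an instruction file carrying input and output memory references. The toy model, engine names, and file layout are invented for illustration.

```python
# Assumed compilation flow: dependency-ordered instruction generation
# plus external-memory references bundled into one instruction file.
import json
from graphlib import TopologicalSorter

# Toy model: op -> list of ops it depends on.
model = {"conv1": [], "relu1": ["conv1"], "fc1": ["relu1"]}

order = list(TopologicalSorter(model).static_order())  # data-dependency order
instructions = [{"op": op,
                 "engine": "pe_array" if op.startswith(("conv", "fc"))
                 else "activation"} for op in order]

instruction_file = {
    "instructions": instructions,
    # First set of memory references: external-memory location of inputs.
    "input_refs":  [{"region": "ddr", "offset": 0x0000, "bytes": 4096}],
    # Second set: external-memory location where outputs are stored.
    "output_refs": [{"region": "ddr", "offset": 0x4000, "bytes": 1024}],
}
print(json.dumps(instruction_file, indent=2))
```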
-
Publication No.: US11531578B1
Publication Date: 2022-12-20
Application No.: US16216887
Filing Date: 2018-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Richard John Heaton , Ilya Minkin
Abstract: Remote access for debugging or profiling a remotely executing neural network graph can be performed by a client using an in-band application programming interface (API). The client can provide indicator flags for debugging or profiling in an inference request sent to a remote server computer executing the neural network graph using the API. The remote server computer can collect metadata for debugging or profiling during the inference operation using the neural network graph and send it back to the client using the same API. Additionally, the metadata can be collected at various granularity levels also specified in the inference request.
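A sketch of what such an in-band API could look like, with indicator flags and a granularity level folded into the inference request itself; all field names are assumptions, not the patent's wire format.

```python
# Assumed in-band API: the same request path carries inference inputs,
# debug/profile indicator flags, and a granularity level; the response
# carries the collected metadata back over the same API.
def infer(request: dict) -> dict:
    response = {"result": [0.1, 0.9]}  # stand-in inference output
    if request.get("profile"):
        # Collect metadata at the requested granularity during execution.
        if request.get("granularity") == "per_operator":
            response["metadata"] = {"conv1_us": 120, "fc1_us": 45}
        else:
            response["metadata"] = {"total_us": 165}
    return response

# Same API for plain inference and for profiling runs:
print(infer({"input": [1, 2, 3]}))
print(infer({"input": [1, 2, 3], "profile": True, "granularity": "per_operator"}))
```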
-
Publication No.: US11461662B1
Publication Date: 2022-10-04
Application No.: US16829887
Filing Date: 2020-03-25
Applicant: Amazon Technologies, Inc.
Inventor: Hongbin Zheng , Randy Renfu Huang , Richard John Heaton
Abstract: Techniques for reducing a compilation time for compiling a neural network are disclosed. A description of a neural network is received by a compiler. A plurality of operators are identified based on the description of the neural network. A plurality of subgraphs are formed, each including one or more operators. For each subgraph, a performance factor is calculated based on a compute usage and a memory usage associated with the operators included in the subgraph. The performance factor is compared to a threshold. Based on the comparison, either the subgraph is classified as a compute bound subgraph and a set of memory optimizations are suppressed or the subgraph is classified as a memory bound subgraph and a set of compute optimizations are suppressed.
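This abstract matches US12079734B1 above, so as a complementary sketch the snippet below shows the suppression step itself: once a subgraph is classified, the compiler runs only the optimization passes that were not suppressed. The pass names are invented for illustration.

```python
# Assumed optimization-pass selection after classification: suppressing
# one set of passes per subgraph is where compile time is saved.
MEMORY_PASSES = ["tile_for_sram", "fuse_dma_loads", "double_buffer"]
COMPUTE_PASSES = ["unroll_matmul", "vectorize", "pipeline_pe_array"]

def passes_to_run(classification: str) -> list:
    if classification == "compute_bound":
        return COMPUTE_PASSES  # memory optimizations suppressed
    return MEMORY_PASSES       # compute optimizations suppressed

print(passes_to_run("compute_bound"))  # skips all memory passes
print(passes_to_run("memory_bound"))   # skips all compute passes
```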