-
Publication number: US11681902B2
Publication date: 2023-06-20
Application number: US16586764
Application date: 2019-09-27
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T Huynh , Vignesh Vivekraja
CPC classification number: G06N3/063 , G06F7/50 , G06F7/523 , G06F7/5443 , G06F7/78 , G06F9/5027 , G06F17/153
Abstract: In one example, a neural network accelerator can execute a set of instructions to: load a first weight data element from a memory into a systolic array, the first weight data element having first coordinates; extract, from the instructions, information indicating a first subset of input data elements to be obtained from the memory, the first subset being based on a stride of a transposed convolution operation and on second coordinates of the first weight data element in a rotated array of weight data elements; based on the information, obtain the first subset of input data elements from the memory; load the first subset of input data elements into the systolic array; and control the systolic array to perform first computations based on the first weight data element and the first subset of input data elements to generate output data elements of an array of output data elements.
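The transposed convolution the abstract refers to can be illustrated with a minimal sketch. This is not the patented instruction flow (which selects input subsets per weight element in a rotated kernel); it is the standard scatter formulation of transposed convolution, where each input element writes a stride-spaced weighted copy of the kernel into the output. All names here are illustrative.

```python
# Minimal sketch of a 2-D transposed convolution (scatter form).
# Each input element inp[i][j] contributes inp[i][j] * weight[ki][kj]
# to output position (i*stride + ki, j*stride + kj).
def transposed_conv2d(inp, weight, stride):
    h, w = len(inp), len(inp[0])
    kh, kw = len(weight), len(weight[0])
    # Output size for a transposed convolution with no padding.
    oh = (h - 1) * stride + kh
    ow = (w - 1) * stride + kw
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(h):
        for j in range(w):
            for ki in range(kh):
                for kj in range(kw):
                    out[i * stride + ki][j * stride + kj] += inp[i][j] * weight[ki][kj]
    return out
```

The gather formulation used in the patent (iterating over weight elements of a rotated kernel and selecting a stride-dependent subset of inputs per element) produces the same output array; it is better suited to a systolic array because each weight element stays resident while inputs stream past it.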
-
Publication number: US11182314B1
Publication date: 2021-11-23
Application number: US16698761
Application date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic , Ilya Minkin , Vignesh Vivekraja , Richard John Heaton , Randy Renfu Huang
Abstract: An integrated circuit device implementing a neural network accelerator may have a peripheral bus interface to interface with a host memory, and neural network models can be loaded from the host memory onto the state buffer of the neural network accelerator for execution by the array of processing elements. The neural network accelerator may also have a memory interface to interface with a local memory. The local memory may store neural network models from the host memory, and the models can be loaded from the local memory into the state buffer with reduced latency as compared to loading from the host memory. In systems with multiple accelerators, the models in the local memory can also be shared amongst different accelerators.
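The two-level loading scheme described above amounts to caching models in local memory so that repeated loads avoid the slower host-memory path. A minimal sketch, with all class and method names invented for illustration (they do not come from the patent):

```python
# Hypothetical sketch of the two-level model-loading scheme: models are
# fetched from host memory over the peripheral bus once, cached in local
# memory, and later loads into the state buffer hit the faster cache.
class ModelLoader:
    def __init__(self, host_memory):
        self.host_memory = host_memory   # model_id -> weights (slow path)
        self.local_memory = {}           # cache, shareable by accelerators

    def load_to_state_buffer(self, model_id):
        if model_id not in self.local_memory:
            # Slow path: fetch the model over the peripheral bus interface.
            self.local_memory[model_id] = self.host_memory[model_id]
        # Fast path: the model is served from local memory.
        return self.local_memory[model_id]
```

In a multi-accelerator system, the `local_memory` dictionary stands in for the shared local memory: once any accelerator has pulled a model from the host, the others can load it without touching the peripheral bus.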
-