Data distribution in an array of neural network cores
摘要:
Parallel processing among arrays of physical neural cores is provided. An array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer. A network is operatively connected to each of the neural cores. The output activation tensor is distributed across the neural cores. An input activation tensor is distributed across the neural cores. A weight tensor is distributed across the neural cores. Each neural core's computation comprises multiplying elements of a portion of the input activation tensor at that core with elements of a portion of the weight tensor at that core, and storing the summed products in a partial sum corresponding to an element of the output activation tensor. Each element of the output activation tensor is computed by accumulating all of the partial sums corresponding to that element via the network. The partial sums for each element of the output activation tensor are computed in a sequence of steps whose order is described by tracing a path through the weight tensor that visits every weight tensor element that contributes to any partial sum.
信息查询
0/0