Multicast network and memory transfer optimizations for neural network hardware acceleration
Abstract:
Neural-network-specific hardware acceleration optimizations are disclosed, including an optimized multicast network and an optimized DRAM transfer unit, each performing in constant or linear time. The multicast network is a set of switch nodes organized into layers and configured to operate as a Beneš network. Configuration data may be accessed by all switch nodes in the network. Each layer is configured to perform a Beneš network transformation of the previous layer within a computer instruction. Since the computer instructions are pipelined, the entire network of switch nodes may be configured in constant or linear time. Similarly, a DRAM transfer unit configured to access memory in strides organizes memory into banks indexed by a prime or relatively prime number of banks. The index value is selected so as not to cause memory address collisions. Upon receiving a memory specification, the DRAM transfer unit may calculate the strides, thereby accessing an entire tile of a tensor in constant or linear time.
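As a rough illustration of the bank-indexing idea described above (not the patented hardware itself), the Python sketch below models banks addressed modulo the bank count. The helper names `bank_index` and `strided_banks`, and the specific stride and bank counts, are hypothetical; the point is that a bank count coprime to the access stride keeps a strided tile read free of bank collisions.

```python
from math import gcd

def bank_index(addr: int, num_banks: int) -> int:
    # Hypothetical mapping: bank is the flat address modulo the bank count.
    # Choosing a prime (or stride-coprime) bank count is what avoids collisions.
    return addr % num_banks

def strided_banks(base: int, stride: int, count: int, num_banks: int) -> list[int]:
    # Bank indices touched when reading `count` elements of a tensor tile at a fixed stride.
    return [bank_index(base + i * stride, num_banks) for i in range(count)]

# Stride of 6 elements, 7 accesses.
print(strided_banks(0, 6, 7, 8))  # gcd(6, 8) = 2 -> [0, 6, 4, 2, 0, 6, 4]: bank conflicts
print(strided_banks(0, 6, 7, 7))  # gcd(6, 7) = 1 -> [0, 6, 5, 4, 3, 2, 1]: conflict-free
assert gcd(6, 7) == 1  # relatively prime stride and bank count -> distinct banks
```

When the stride and bank count share no common factor, the sequence base + k*stride cycles through every residue modulo the bank count before repeating, so all banks can service the tile access in parallel.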