MULTIPLY-ACCUMULATE SHARING CONVOLUTION CHAINING FOR EFFICIENT DEEP LEARNING INFERENCE
Abstract:
Systems, apparatuses and methods may provide for technology that chains a plurality of convolution operations together, wherein the plurality of convolution operations include one or more one-dimensional (1D) convolution operations and one or more two-dimensional (2D) convolution operations, streams the plurality of convolution operations to shared multiply-accumulate (MAC) hardware, wherein to stream the plurality of convolution operations to the shared MAC hardware, the technology swaps weight inputs to the shared MAC hardware with activation inputs to the shared MAC hardware based on convolution type, and stores output data associated with the plurality of convolution operations to a local memory. Each of the 2D convolution operations may include a multi-cycle multiplication operation.
Information query
Patent Agency Ranking
0/0