Efficient hardware architecture for accelerating grouped convolutions

    Publication No.: US11544191B2

    Publication date: 2023-01-03

    Application No.: US16830457

    Filing date: 2020-03-26

    Abstract: Hardware accelerators for accelerated grouped convolution operations. A first buffer of a hardware accelerator may receive a first row of an input feature map (IFM) from a memory. A first group comprising a plurality of tiles may receive a first row of the IFM. A plurality of processing elements of the first group may compute a portion of a first row of an output feature map (OFM) based on the first row of the IFM and a kernel. A second buffer of the accelerator may receive a third row of the IFM from the memory. A second group comprising a plurality of tiles may receive the third row of the IFM. A plurality of processing elements of the second group may compute a portion of a third row of the OFM based on the third row of the IFM and the kernel as part of a grouped convolution operation.
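
As a rough software illustration of the operation this abstract accelerates, the sketch below computes a grouped convolution in plain Python. It is not the patented hardware design: the function name `grouped_conv2d`, the channels-first nested-list layout, stride 1, and the absence of padding are all assumptions made for the example. Each output row is computed from a small window of IFM rows, independently of the other output rows, which is the property that lets different rows be handed to different groups of tiles.

```python
def grouped_conv2d(ifm, kernels, groups):
    """Grouped 2D convolution (stride 1, no padding), illustrative only.

    ifm:     [C_in][H][W] nested lists (layout is an assumption)
    kernels: [C_out][C_in // groups][KH][KW] nested lists
    groups:  number of channel groups; must divide C_in and C_out
    """
    c_in, c_out = len(ifm), len(kernels)
    assert c_in % groups == 0 and c_out % groups == 0
    cin_g, cout_g = c_in // groups, c_out // groups
    h, w = len(ifm[0]), len(ifm[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    oh, ow = h - kh + 1, w - kw + 1

    ofm = [[[0] * ow for _ in range(oh)] for _ in range(c_out)]
    for oc in range(c_out):
        g = oc // cout_g  # channel group this output channel belongs to
        for r in range(oh):  # each OFM row depends only on IFM rows r..r+kh-1
            for c in range(ow):
                acc = 0
                for ic in range(cin_g):  # only the input channels of group g
                    for i in range(kh):
                        for j in range(kw):
                            acc += (ifm[g * cin_g + ic][r + i][c + j]
                                    * kernels[oc][ic][i][j])
                ofm[oc][r][c] = acc
    return ofm


# Two input channels, two groups, one 1x1 kernel per group.
ifm = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
kernels = [[[[2]]], [[[3]]]]
ofm = grouped_conv2d(ifm, kernels, groups=2)
```

With `groups=2`, the first output channel sees only the first input channel and the second output channel sees only the second, so no multiply-accumulate crosses a group boundary.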

    Machine learning accelerator architecture

    Publication No.: US10769526B2

    Publication date: 2020-09-08

    Application No.: US15960851

    Filing date: 2018-04-24

    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises accelerator circuitry including a first set of processing elements to perform first computations including matrix multiplication operations, a second set of processing elements to perform second computations including sum of elements of weights and offset multiply operations and a third set of processing elements to perform third computations including sum of elements of inputs and offset multiply operations, wherein the second and third computations are performed in parallel with the first computations.
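
One setting in which a matrix multiplication is accompanied by "sum of elements times offset" terms is a matrix product whose operands each carry a constant offset, as in zero-point quantization. Reading the abstract this way is an assumption, not something the abstract states. Under that assumption, the sketch below splits the product into three independent pieces: the raw matrix product, a weight-sum term, and an input-sum term. All names (`offset_matmul_decomposed`, `a_off`, `b_off`) are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of nested lists a (m x k) and b (k x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(n)]
            for row in a]


def offset_matmul_direct(a, b, a_off, b_off):
    """Reference: subtract the offsets first, then multiply."""
    a_shift = [[x - a_off for x in row] for row in a]
    b_shift = [[x - b_off for x in row] for row in b]
    return matmul(a_shift, b_shift)


def offset_matmul_decomposed(a, b, a_off, b_off):
    """Same result from three independent computations.

    sum_t (a[i][t] - a_off) * (b[t][j] - b_off)
      = sum_t a[i][t] * b[t][j]      (first: matrix multiplication)
      - a_off * sum_t b[t][j]        (second: weight sums times offset)
      - b_off * sum_t a[i][t]        (third: input sums times offset)
      + k * a_off * b_off            (constant)
    """
    k, n = len(b), len(b[0])
    raw = matmul(a, b)
    # The two correction terms do not depend on `raw`, so they could be
    # computed in parallel with the matrix product.
    weight_term = [a_off * sum(b[t][j] for t in range(k)) for j in range(n)]
    input_term = [b_off * sum(row) for row in a]
    const = k * a_off * b_off
    return [[raw[i][j] - weight_term[j] - input_term[i] + const
             for j in range(n)] for i in range(len(a))]


a = [[1, 2], [3, 4]]   # "inputs" (assumed role)
b = [[5, 6], [7, 8]]   # "weights" (assumed role)
result = offset_matmul_decomposed(a, b, a_off=1, b_off=2)
```

Because neither correction term depends on the raw product, the decomposition leaves three computations with no data dependency between them until the final combination step.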
