APPARATUS WITH ACCELERATED MACHINE LEARNING PROCESSING

    公开(公告)号:US20220036243A1

    公开(公告)日:2022-02-03

    申请号:US17147858

    申请日:2021-01-13

    Abstract: An apparatus includes a global memory and a systolic array. The global memory is configured to store and provide an input feature map (IFM) vector stream from an IFM tensor and a kernel vector stream from a kernel tensor. The systolic array is configured to receive the IFM vector stream and the kernel vector stream from the global memory. The systolic array is on-chip together with the global memory. The systolic array includes a plurality of processing elements (PEs) each having a plurality of vector units, each of the plurality of vector units being configured to perform a dot-product operation on at least one IFM vector of the IFM vector stream and at least one kernel vector of the kernel vector stream per unit clock cycle to generate a plurality of output feature maps (OFMs).

Patent Agency Ranking