Efficient hardware architecture for accelerating grouped convolutions

    Publication Number: US11544191B2

    Publication Date: 2023-01-03

    Application Number: US16830457

    Filing Date: 2020-03-26

    Abstract: Hardware accelerators for accelerated grouped convolution operations. A first buffer of a hardware accelerator may receive a first row of an input feature map (IFM) from a memory. A first group comprising a plurality of tiles may receive a first row of the IFM. A plurality of processing elements of the first group may compute a portion of a first row of an output feature map (OFM) based on the first row of the IFM and a kernel. A second buffer of the accelerator may receive a third row of the IFM from the memory. A second group comprising a plurality of tiles may receive the third row of the IFM. A plurality of processing elements of the second group may compute a portion of a third row of the OFM based on the third row of the IFM and the kernel as part of a grouped convolution operation.
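
    The arithmetic behind this abstract is easiest to see in software. Below is a minimal NumPy sketch of a grouped convolution (stride 1, no padding); the function name and shapes are illustrative assumptions, not taken from the patent, and a comment notes where the accelerator's row buffers and tile groups would take over.

```python
import numpy as np

def grouped_conv2d(ifm, kernels, groups):
    """Reference grouped convolution (stride 1, no padding).

    ifm:     (C_in, H, W) input feature map
    kernels: (C_out, C_in // groups, KH, KW) filters
    Each output channel reads only its own group's slice of input
    channels, which is what keeps per-group row buffers small.
    """
    c_in, h, w = ifm.shape
    c_out, c_per_g, kh, kw = kernels.shape
    assert c_in == c_per_g * groups and c_out % groups == 0
    ofm = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for oc in range(c_out):
        ic0 = (oc // (c_out // groups)) * c_per_g  # oc's input-channel slice
        for y in range(ofm.shape[1]):
            # On the accelerator, different OFM rows y are handed to
            # different tile groups, each fed IFM rows from its own buffer.
            for x in range(ofm.shape[2]):
                patch = ifm[ic0:ic0 + c_per_g, y:y + kh, x:x + kw]
                ofm[oc, y, x] = float(np.sum(patch * kernels[oc]))
    return ofm

ifm = np.random.rand(4, 8, 8)                 # 4 input channels
kernels = np.random.rand(6, 2, 3, 3)          # 6 output channels, 2 groups
ofm = grouped_conv2d(ifm, kernels, groups=2)  # -> shape (6, 6, 6)
```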

    Method, Apparatus And System For Communicating Between Multiple Protocols

    Publication Number: US20170286357A1

    Publication Date: 2017-10-05

    Application Number: US15084555

    Filing Date: 2016-03-30

    CPC classification number: G06F13/4291 G06F13/4286 H04L69/08

    Abstract: In one embodiment, an apparatus comprises: a controller to communicate data having a format according to a first communication protocol, the controller comprising a Mobile Industry Processor Interface (MIPI)-compatible controller; an interface circuit coupled to the controller to receive the data, convert the data and communicate the converted data to a physical unit of a second communication protocol, the converted data having a format according to the second communication protocol; and the physical unit coupled to the interface circuit to receive and serialize the converted data and output the serialized converted data to a destination. Other embodiments are described and claimed.
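
    As a rough software analogy of the controller-to-interface-circuit-to-physical-unit pipeline, the sketch below passes a toy packet through a format conversion and a bit serializer. The frame layout, header bytes, and names are invented for illustration; real MIPI framing and the second protocol's framing are far more involved.

```python
from dataclasses import dataclass

@dataclass
class FrameA:
    """Toy stand-in for a packet in the controller's native protocol
    (e.g. a MIPI-style packet). Field layout is invented."""
    channel: int
    payload: bytes

def convert(frame: FrameA) -> bytes:
    """Interface circuit: re-frame protocol-A data into the second
    protocol's format (here: an invented 2-byte header + payload)."""
    header = bytes([0xA5, frame.channel & 0xFF])
    return header + frame.payload

def serialize(converted: bytes):
    """Physical unit: turn the converted frame into a bit stream for
    the destination (MSB first)."""
    for byte in converted:
        for bit in range(7, -1, -1):
            yield (byte >> bit) & 1

# Controller -> interface circuit -> physical unit, as in the abstract.
bits = list(serialize(convert(FrameA(channel=3, payload=b"\x10\x20"))))
print(bits[:16])  # first 16 serialized bits
```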

    Machine learning accelerator architecture

    Publication Number: US10769526B2

    Publication Date: 2020-09-08

    Application Number: US15960851

    Filing Date: 2018-04-24

    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises accelerator circuitry including a first set of processing elements to perform first computations including matrix multiplication operations, a second set of processing elements to perform second computations including sum-of-elements-of-weights and offset multiply operations, and a third set of processing elements to perform third computations including sum-of-elements-of-inputs and offset multiply operations, wherein the second and third computations are performed in parallel with the first computations.
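
    The pairing of a main matrix multiply with parallel sum-of-weights and sum-of-inputs offset terms reads like the zero-point correction used in asymmetrically quantized matrix multiplication; under that assumption (the abstract does not say so explicitly), the sketch below shows why the three computations are independent and can run in parallel. The shapes and zero points are invented for illustration.

```python
import numpy as np

# Zero-point-corrected (asymmetric) quantized matmul:
#   (X - x0)(W - w0) = X@W - x0*colsum(W) - w0*rowsum(X) + x0*w0*K
# X@W is the main matmul; the two correction terms need only element
# sums and an offset multiply, so they can run alongside it.
rng = np.random.default_rng(0)
K = 8
X = rng.integers(0, 256, size=(4, K))       # quantized inputs
W = rng.integers(0, 256, size=(K, 5))       # quantized weights
x0, w0 = 128, 127                           # input / weight zero points

main   = X @ W                              # first PE set: matmul
w_term = x0 * W.sum(axis=0)                 # second PE set: sum of weights * offset
x_term = w0 * X.sum(axis=1, keepdims=True)  # third PE set: sum of inputs * offset
const  = x0 * w0 * K

result = main - w_term - x_term + const
assert np.array_equal(result, (X - x0) @ (W - w0))  # matches offset-free product
```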

    EFFICIENT MEMORY LAYOUT FOR ENABLING SMART DATA COMPRESSION IN MACHINE LEARNING ENVIRONMENTS

    Publication Number: US20190066257A1

    Publication Date: 2019-02-28

    Application Number: US15682795

    Filing Date: 2017-08-22

    Abstract: A mechanism is described for facilitating efficient memory layout for enabling smart data compression in machine learning environments. A method of embodiments, as described herein, includes facilitating dividing an initial tile representing an image into primary multiple tiles such that each tile of the primary multiple tiles is regarded as an independent image as processed by one or more processors of a computing device. The method may further include computing the primary multiple tiles into secondary multiple tiles compatible in size with a local buffer. The method may further include merging the secondary multiple tiles into a final tile representing the image, and compressing the final tile.
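
    The split/compute/merge flow can be sketched in a few lines of NumPy. The tile sizes and helper names below are invented; the "processing" step is a no-op placeholder standing in for whatever work is done on each buffer-sized secondary tile, and zlib stands in for the smart compression step.

```python
import numpy as np
import zlib

def split_tiles(tile, rows, cols):
    """Divide a tile into a rows x cols grid of sub-tiles."""
    return [np.hsplit(band, cols) for band in np.vsplit(tile, rows)]

def merge_tiles(grid):
    """Inverse of split_tiles: stitch a grid of sub-tiles back together."""
    return np.vstack([np.hstack(band) for band in grid])

# Invented shapes: a 64x64 "image" becomes 4 primary tiles (each treated
# as an independent image); each primary tile is handled as 4 secondary
# tiles sized to fit a hypothetical local buffer, then everything is
# merged into the final tile and compressed.
image = np.arange(64 * 64, dtype=np.uint8).reshape(64, 64)

primary = split_tiles(image, 2, 2)
processed = [[merge_tiles(split_tiles(t, 2, 2))    # per-secondary-tile work
              for t in band] for band in primary]  # would happen in here
final = merge_tiles(processed)

assert np.array_equal(final, image)
compressed = zlib.compress(final.tobytes())        # stand-in for the smart
print(len(compressed), "bytes after compression")  # data compression step
```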

    Efficient memory layout for enabling smart data compression in machine learning environments

    Publication Number: US10600147B2

    Publication Date: 2020-03-24

    Application Number: US15682795

    Filing Date: 2017-08-22

    Abstract: A mechanism is described for facilitating efficient memory layout for enabling smart data compression in machine learning environments. A method of embodiments, as described herein, includes facilitating dividing an initial tile representing an image into primary multiple tiles such that each tile of the primary multiple tiles is regarded as an independent image as processed by one or more processors of a computing device. The method may further include computing the primary multiple tiles into secondary multiple tiles compatible in size with a local buffer. The method may further include merging the secondary multiple tiles into a final tile representing the image, and compressing the final tile.

    COMPRESSION FOR DEEP LEARNING IN CASE OF SPARSE VALUES MAPPED TO NON-ZERO VALUE

    Publication Number: US20190197420A1

    Publication Date: 2019-06-27

    Application Number: US15853457

    Filing Date: 2017-12-22

    CPC classification number: G06N5/046 G06F13/28 G06F17/16 G06N20/00 G06T15/205

    Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.
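
    A toy software model of the encode-on-write / decode-on-read path is sketched below. The codec format here (a significance bitmap over a dominant "sparse" value plus the exception bytes, with the dominant value and length stored as metadata) is an invented illustration of compressing sparse values that map to a non-zero value; the patent's actual encoding is not specified in the abstract.

```python
import numpy as np

def encode(data: np.ndarray):
    """Encode unit: find the dominant byte (e.g. a quantization
    zero-point), store a significance bitmap plus only the
    non-dominant bytes, and return the metadata the decoder needs."""
    flat = data.ravel()
    dominant = np.bincount(flat).argmax()        # the mapped "sparse" value
    mask = flat != dominant
    payload = np.packbits(mask).tobytes() + flat[mask].tobytes()
    metadata = {"dominant": int(dominant), "length": flat.size}
    return payload, metadata

def decode(payload: bytes, metadata: dict) -> np.ndarray:
    """Decode unit: rebuild the buffer from bitmap + exceptions."""
    n = metadata["length"]
    nbits = (n + 7) // 8
    mask = np.unpackbits(np.frombuffer(payload[:nbits], np.uint8))[:n].astype(bool)
    out = np.full(n, metadata["dominant"], dtype=np.uint8)
    out[mask] = np.frombuffer(payload[nbits:], np.uint8)
    return out

# Mimic the DMA flow: compute logic writes activations to a buffer; the
# codec encodes them on the way to memory and writes metadata alongside.
acts = np.where(np.random.rand(1024) < 0.9, 128, 37).astype(np.uint8)
payload, meta = encode(acts)                        # encode unit -> memory
assert np.array_equal(decode(payload, meta), acts)  # decode unit on read-back
print(f"{acts.size} -> {len(payload)} bytes")
```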

    Compression for deep learning in case of sparse values mapped to non-zero value

    Publication Number: US12147914B2

    Publication Date: 2024-11-19

    Application Number: US18466981

    Filing Date: 2023-09-14

    Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and decode circuitry. The DMA controller reads the neural network data from the memory buffer, encodes the neural network data via the encode circuitry, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuitry in response to a request from the compute circuitry.
