Systolic convolutional neural network

    Publication No.: US11188814B2

    Publication date: 2021-11-30

    Application No.: US15945952

    Filing date: 2018-04-05

    Applicant: Arm Limited

    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.
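
The dataflow described in this abstract can be sketched in Python as a rough functional model. This is an illustration under stated assumptions, not the patented circuit: the names `component_vectors` and `systolic_mac` and the sample values are hypothetical, and the cycle-by-cycle skewing of operands between neighbouring cells is left out. Vectors are stored whole (first dimension) and read out one component at a time (second dimension), and each MAC cell keeps one accumulating output value.

```python
def component_vectors(vectors):
    """Model of a transposing buffer: vectors go in whole, and the
    t-th component of every stored vector comes out together."""
    for t in range(len(vectors[0])):
        yield [v[t] for v in vectors]


def systolic_mac(kernels, features):
    """Output-stationary MAC grid: cell (i, j) accumulates the dot
    product of kernel i and feature vector j, one component per step."""
    acc = [[0] * len(features) for _ in kernels]
    steps = zip(component_vectors(kernels), component_vectors(features))
    for kernel_components, feature_components in steps:
        for i, w in enumerate(kernel_components):
            for j, f in enumerate(feature_components):
                acc[i][j] += w * f  # one multiply-accumulate per cell
    return acc


# Hypothetical data: 2 kernels and 4 feature vectors, 3 components each.
kernels = [[1, 0, -1], [2, 1, 0]]
features = [[3, 4, 5], [1, 1, 1], [0, 2, 4], [6, 0, 6]]
outputs = systolic_mac(kernels, features)
```

Each row of `outputs` holds the results for one kernel across all feature vectors, which is the same result as an ordinary matrix product of the kernels with the transposed feature matrix.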

    Technique for managing coherency when an agent is to enter a state in which its cache storage is unused

    Publication No.: US11249908B1

    Publication date: 2022-02-15

    Application No.: US17023771

    Filing date: 2020-09-17

    Applicant: Arm Limited

    Abstract: An apparatus and method are disclosed for managing cache coherency. The apparatus has a plurality of agents with cache storage for caching data, and coherency control circuitry for acting as a point of coherency for the data by implementing a cache coherency protocol. In accordance with the cache coherency protocol the coherency control circuitry responds to certain coherency events by issuing coherency messages to one or more of the agents. A given agent is arranged, prior to entering a given state in which its cache storage is unused, to perform a flush operation in respect of its cache storage that may cause one or more evict messages to be issued to the coherency control circuitry. Further, once all evict messages resulting from performance of the flush operation have been issued, the given agent issues an evict barrier message to the coherency control circuitry. The apparatus ensures that the evict barrier message is only processed by the coherency control circuitry once all evict messages resulting from performance of the flush operation have been processed by the coherency control circuitry. When processing the evict barrier message, the coherency control circuitry issues a barrier response message to the given agent once it is determined that there are no outstanding coherency messages, and the given agent defers entering the given state until at least the barrier response message is received.
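
The ordering rule in this abstract can be illustrated with a small Python model. It is only a sketch under assumptions: the class names, the FIFO inbox used to enforce ordering, the `outstanding` counter, and the state strings are all hypothetical and are not taken from the patent. The agent flushes its cache, sends an evict barrier, and enters the cache-unused state only after the barrier response arrives.

```python
from collections import deque


class CoherencyController:
    def __init__(self):
        self.inbox = deque()   # FIFO: barrier cannot overtake earlier evicts
        self.directory = {}    # address -> agent currently caching it
        self.outstanding = 0   # coherency messages still awaiting a reply
        self.log = []

    def process_all(self):
        while self.inbox:
            kind, agent, addr = self.inbox[0]
            if kind == "barrier" and self.outstanding > 0:
                return         # hold the barrier until nothing is outstanding
            self.inbox.popleft()
            if kind == "evict":
                self.directory.pop(addr, None)
                self.log.append(("evict", addr))
            else:
                self.log.append(("barrier", None))
                agent.barrier_response()


class Agent:
    def __init__(self, controller):
        self.controller = controller
        self.cache = {}
        self.state = "active"

    def prepare_for_unused_cache(self):
        for addr in sorted(self.cache):           # flush: one evict per line
            self.controller.inbox.append(("evict", self, addr))
        self.cache.clear()
        self.controller.inbox.append(("barrier", self, None))
        self.state = "waiting"                    # defer the state change

    def barrier_response(self):
        self.state = "cache_unused"


ctrl = CoherencyController()
cpu0 = Agent(ctrl)
cpu0.cache = {0x40: "dirty", 0x80: "dirty"}
ctrl.directory = {0x40: cpu0, 0x80: cpu0}
ctrl.outstanding = 1                  # pretend one coherency message is in flight
cpu0.prepare_for_unused_cache()
ctrl.process_all()
state_while_pending = cpu0.state      # still "waiting"
ctrl.outstanding = 0                  # the in-flight message has completed
ctrl.process_all()
```

In this model the agent's state changes only in `barrier_response`, so the controller never has a reason to send a coherency message to an agent whose cache is already unused.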

    Apparatus and method for issuing access requests to a memory controller

    Publication No.: US10956045B2

    Publication date: 2021-03-23

    Application No.: US14969414

    Filing date: 2015-12-15

    Applicant: ARM Limited

    Abstract: An apparatus and method are provided for issuing access requests to a memory controller for a memory device whose memory structure consists of a plurality of sub-structures. The apparatus has a request interface for issuing access requests to the memory controller, each access request identifying a memory address. Within the apparatus static abstraction data is stored providing an indication of one or more of the sub-structures of the memory device, and the apparatus also stores an indication of outstanding access requests issued from the request interface. Next access request selection circuitry is then arranged to select from a plurality of candidate access requests a next access request to issue from the request interface. That selection is dependent on sub-structure indication data that is derived from application of an abstraction data function, using the static abstraction data, to the memory addresses of the candidate access requests and the outstanding access requests. Such an approach enables the apparatus to provide a series of access requests to the memory controller with the aim of enabling the memory controller to perform a more optimal access sequence with regard to the memory device.
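
A minimal Python sketch of the selection step described in this abstract follows. The details are assumptions chosen for illustration: the static abstraction data is modelled as a shift and mask that extract a bank number from an address, and the selection policy (prefer a candidate whose bank has no outstanding request) is one plausible choice that the abstract does not specify. The names `sub_structure` and `select_next` are hypothetical.

```python
def sub_structure(addr, shift, mask):
    """Abstraction data function: map an address to a sub-structure id."""
    return (addr >> shift) & mask


def select_next(candidates, outstanding, shift=6, mask=0x3):
    """Pick the first candidate whose sub-structure is not targeted by
    any outstanding request; fall back to the oldest candidate."""
    busy = {sub_structure(a, shift, mask) for a in outstanding}
    for addr in candidates:
        if sub_structure(addr, shift, mask) not in busy:
            return addr
    return candidates[0] if candidates else None


# Hypothetical addresses: outstanding requests occupy banks 0 and 1.
outstanding = [0x000, 0x040]
candidates = [0x100, 0x044, 0x080]   # banks 0, 1 and 2 respectively
chosen = select_next(candidates, outstanding)
```

Here `chosen` is the request to bank 2, since the two older candidates map to banks that already have requests in flight.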

    Graphics processing systems
    Granted patent

    Publication No.: US11127187B2

    Publication date: 2021-09-21

    Application No.: US16697984

    Filing date: 2019-11-27

    Applicant: Arm Limited

    Abstract: When processing graphics primitives in a graphics processing system, the render output is divided into a plurality of regions (40) for rendering, each region (40) comprising a respective area of the render output; and for sets of one or more primitives to be rendered, it is determined for which of the plurality of regions of the render output (40) the primitive(s) should be rendered; and for each region of the render output (40) it is determined the primitive(s) should be rendered for, geometry data for the primitive(s) is stored in memory in a respective data structure (42) along with an indication of state data that is to be used for rendering the primitive(s) for the region, such that the geometry data for the primitive(s) to be rendered is stored in a respective, different data structure (42) for each different region of the render output (40) it is determined the primitive(s) should be rendered for.
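
The per-region storage described in this abstract can be sketched in Python. This is a simplified illustration under assumptions: regions are square tiles, coverage is decided by a conservative bounding-box test, coordinates are integer pixels, and the dictionary layout and the name `bin_primitives` are hypothetical. Each region gets its own list holding the geometry and a state index for every primitive that may touch it.

```python
def bin_primitives(primitives, width, height, tile):
    """Return {(tile_x, tile_y): [(vertices, state_index), ...]}."""
    cols = (width + tile - 1) // tile
    rows = (height + tile - 1) // tile
    bins = {(tx, ty): [] for ty in range(rows) for tx in range(cols)}
    for prim in primitives:
        xs = [x for x, _ in prim["verts"]]
        ys = [y for _, y in prim["verts"]]
        # Clamp the bounding box to the tile grid.
        tx0, tx1 = max(min(xs) // tile, 0), min(max(xs) // tile, cols - 1)
        ty0, ty1 = max(min(ys) // tile, 0), min(max(ys) // tile, rows - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                # Geometry is stored once per region it must be rendered for.
                bins[(tx, ty)].append((prim["verts"], prim["state"]))
    return bins


# Hypothetical 64x64 render output split into four 32x32 regions.
primitives = [
    {"verts": [(2, 2), (10, 2), (2, 10)], "state": 0},
    {"verts": [(20, 40), (50, 40), (30, 60)], "state": 1},
]
bins = bin_primitives(primitives, width=64, height=64, tile=32)
```

The second triangle spans two regions, so its geometry appears in both of their lists, each time paired with the same state index.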

    SYSTOLIC CONVOLUTIONAL NEURAL NETWORK
    Patent application

    Publication No.: US20190311243A1

    Publication date: 2019-10-10

    Application No.: US15945952

    Filing date: 2018-04-05

    Applicant: Arm Limited

    Abstract: A circuit and method are provided for performing convolutional neural network computations for a neural network. The circuit includes a transposing buffer configured to receive actuation feature vectors along a first dimension and to output feature component vectors along a second dimension, a weight buffer configured to store kernel weight vectors along a first dimension and further configured to output kernel component vectors along a second dimension, and a systolic array configured to receive the kernel weight vectors along a first dimension and to receive the feature component vectors along a second dimension. The systolic array includes an array of multiply and accumulate (MAC) processing cells. Each processing cell is associated with an output value. The actuation feature vectors may be shifted into the transposing buffer along the first dimension and output feature component vectors may be shifted out of the transposing buffer along the second dimension, providing efficient dataflow.

    Compression of neural network activation data

    Publication No.: US11948069B2

    Publication date: 2024-04-02

    Application No.: US16518444

    Filing date: 2019-07-22

    Applicant: Arm Limited

    CPC classification number: G06N3/063 H03M7/70

    Abstract: A processor arranged to compress neural network activation data comprises an input module for obtaining neural network activation data. The processor also comprises a block creation module arranged to split the neural network activation data into a plurality of blocks, and a metadata generation module for generating metadata associated with at least one of the plurality of blocks. Based on the metadata generated, a selection module selects a compression scheme for each of the plurality of blocks, and a compression module applies the selected compression scheme to the corresponding block to produce compressed neural network activation data. An output module is also provided for outputting the compressed neural network activation data.
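
The pipeline in this abstract (split into blocks, generate metadata, select a scheme per block, compress) can be sketched in Python. The specifics are assumptions for illustration only: the metadata is a zero count, the three schemes ("zero", "sparse", "raw") and the selection threshold are hypothetical, and none of the function names come from the patent.

```python
def generate_metadata(block):
    """Metadata for one block: its length and how many values are zero."""
    return {"length": len(block), "zeros": block.count(0)}


def select_scheme(meta):
    """Choose a scheme from the metadata alone."""
    if meta["zeros"] == meta["length"]:
        return "zero"
    if meta["zeros"] * 2 >= meta["length"]:
        return "sparse"
    return "raw"


def compress(activations, block_size=4):
    out = []
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        scheme = select_scheme(generate_metadata(block))
        if scheme == "zero":
            out.append(("zero", len(block)))            # store only the length
        elif scheme == "sparse":
            mask = [int(v != 0) for v in block]         # 1 marks a non-zero slot
            out.append(("sparse", mask, [v for v in block if v != 0]))
        else:
            out.append(("raw", list(block)))
    return out


def decompress(compressed):
    out = []
    for entry in compressed:
        if entry[0] == "zero":
            out.extend([0] * entry[1])
        elif entry[0] == "sparse":
            values = iter(entry[2])
            out.extend(next(values) if bit else 0 for bit in entry[1])
        else:
            out.extend(entry[1])
    return out


# Hypothetical activation data, e.g. after a ReLU layer.
activations = [0, 0, 0, 0, 5, 0, 0, 7, 1, 2, 3, 0, 9]
compressed = compress(activations)
restored = decompress(compressed)
```

Selecting the scheme per block lets mostly-zero regions, which are common after ReLU, be stored compactly while dense regions are left uncompressed.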
