-
Publication No.: US11410282B2
Publication Date: 2022-08-09
Application No.: US16836440
Application Date: 2020-03-31
Applicant: Arm Limited
Inventor: Jens Olson, Suraj Sudhir
Abstract: A computer-implemented method of providing a filter (F) in a neural processing unit comprises: receiving input corresponding to target dimensions (XT, YT) of the filter; receiving input corresponding to sub-filter dimensions (X1 . . . n′, Y1 . . . n′) of each of a plurality of sub-filters (SF1 . . . n) implementable in the neural processing unit; and defining the filter (F) as a combination of the plurality of sub-filters (SF1 . . . n), the combination having dimensions that equate to the target dimensions (XT, YT), and wherein the sub-filter dimensions (X1 . . . n′, Y1 . . . n′) of at least two of the sub-filters in the combination are unequal.
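The idea in this abstract can be illustrated with a minimal sketch. It assumes a simple grid tiling: a 5x5 target filter is covered by sub-filters no larger than 3x3, which yields sub-filters of unequal dimensions. The function names `tile_filter` and `apply_window`, the tiling strategy, and the 3x3 hardware limit are all hypothetical and are not taken from the patent.

```python
def tile_filter(target_h, target_w, max_h, max_w):
    """Cover a target_h x target_w filter with sub-filters of at most max_h x max_w.

    Returns (row, col, height, width) tuples. Grid tiling is an assumption
    made for illustration, not the method claimed in the patent.
    """
    tiles = []
    for r in range(0, target_h, max_h):
        for c in range(0, target_w, max_w):
            tiles.append((r, c, min(max_h, target_h - r), min(max_w, target_w - c)))
    return tiles


def apply_window(window, weights, tiles):
    """Sum the partial dot products of every sub-filter over one input window."""
    total = 0
    for r, c, h, w in tiles:
        for i in range(r, r + h):
            for j in range(c, c + w):
                total += window[i][j] * weights[i][j]
    return total


# A 5x5 target filter built from 3x3, 3x2, 2x3 and 2x2 sub-filters.
tiles = tile_filter(5, 5, 3, 3)
weights = [[i * 5 + j + 1 for j in range(5)] for i in range(5)]
window = [[(i + j) % 3 for j in range(5)] for i in range(5)]
combined = apply_window(window, weights, tiles)
direct = sum(window[i][j] * weights[i][j] for i in range(5) for j in range(5))
```

Because the sub-filters tile the target exactly once, the combined response equals the response of the full filter applied directly.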
-
Publication No.: US11561795B2
Publication Date: 2023-01-24
Application No.: US16834833
Application Date: 2020-03-30
Applicant: Arm Limited
Inventor: Jens Olson, John Wakefield Brothers, III, Jared Corey Smolens, Chi-wen Cheng, Daren Croxford, Sharjeel Saeed, Dominic Hugo Symes
Abstract: Herein described is a method of operating an accumulation process in a data processing apparatus. The accumulation process comprises a plurality of accumulations which output a respective plurality of accumulated values, each based on a stored value and a computed value generated by a data processing operation. The method comprises storing a first accumulated value, the first accumulated value being one of said plurality of accumulated values, into a first storage device comprising a plurality of single-bit storage elements; determining that a predetermined trigger has been satisfied with respect to the accumulation process; and in response to the determining, storing at least a portion of a second accumulated value, the second accumulated value being one of said plurality of accumulated values, into a second storage device.
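One way to picture this abstract is a narrow accumulator register that spills into a second store. The sketch below assumes the "predetermined trigger" is an overflow of the narrow register and that the portion stored in the second device is the carried-out upper part; the abstract specifies neither. The class name `SpillingAccumulator` and the 8-bit width are hypothetical, and only non-negative values are handled.

```python
class SpillingAccumulator:
    """Narrow first storage with overflow spilled into a second storage.

    Assumption: the trigger is overflow of the narrow register. The patent
    abstract only says "a predetermined trigger".
    """

    def __init__(self, narrow_bits=8):
        self.narrow_bits = narrow_bits
        self.mask = (1 << narrow_bits) - 1
        self.first = 0   # first storage: narrow register (e.g. flip-flops)
        self.second = 0  # second storage: holds the carried-out upper part

    def accumulate(self, value):
        total = self.first + value
        if total > self.mask:                          # trigger satisfied
            self.second += total >> self.narrow_bits   # spill the upper portion
            total &= self.mask                         # keep the low bits locally
        self.first = total

    def value(self):
        """Reassemble the full accumulated value from both storages."""
        return (self.second << self.narrow_bits) | self.first


acc = SpillingAccumulator(narrow_bits=8)
for v in [200, 100, 50, 255]:
    acc.accumulate(v)
```

In this sketch most accumulations touch only the narrow register, and the second storage is written only when the trigger fires.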
-
Publication No.: US12271608B2
Publication Date: 2025-04-08
Application No.: US18099627
Application Date: 2023-01-20
Applicant: Arm Limited
Abstract: A processor to generate accumulated data comprising, for an operation cycle: performing an operation on a first bit range of a set of first input data to generate a set of operation data, which is accumulated with stored data within a first storage device. A lowest n bits of the accumulated data are accumulated with first further stored data within a first bit range of a second storage device, and are bit-shifted from the first storage device. Further accumulated data is generated, comprising, for an operation cycle: performing the operation on a second bit range of the set of first input data to generate a further set of operation data, which is accumulated with the stored data within the first storage device. A lowest m bits of the further accumulated data are accumulated with second further stored data within a second bit range of the second storage device.
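One possible reading of this abstract is a bit-sliced multiply-accumulate, sketched below. The assumptions are that the operation is a dot product, that inputs are unsigned 8-bit values processed in 4-bit ranges, and that n and m both equal the slice width. The function name `sliced_dot` and its parameters are hypothetical.

```python
def sliced_dot(weights, inputs, input_bits=8, slice_bits=4):
    """Dot product computed one input bit range at a time.

    acc plays the role of the first storage device and result the second.
    Assumes unsigned inputs and non-negative weights.
    """
    mask = (1 << slice_bits) - 1
    acc = 0      # first storage: running accumulator
    result = 0   # second storage: final value, filled in bit range by bit range
    for lo in range(0, input_bits, slice_bits):
        # Operate on the current bit range of every input and accumulate.
        acc += sum(w * ((x >> lo) & mask) for w, x in zip(weights, inputs))
        # The lowest slice_bits of acc can no longer change: move them into
        # the matching bit range of result and shift them out of acc.
        result += (acc & mask) << lo
        acc >>= slice_bits
    # Whatever remains in acc belongs above the last processed bit range.
    result += acc << input_bits
    return result


weights = [3, 5, 7]
inputs = [200, 17, 255]
total = sliced_dot(weights, inputs)
```

Shifting finished low bits out of the accumulator keeps the first storage narrow even though the final result is wide.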
-
Publication No.: US12072808B2
Publication Date: 2024-08-27
Application No.: US18063478
Application Date: 2022-12-08
Applicant: Arm Limited
Inventor: Jens Olson, Jared Corey Smolens
IPC: G06F12/0875
CPC classification number: G06F12/0875, G06F2212/1024, G06F2212/221, G06F2212/452
Abstract: A processor comprises a first storage managed as a circular buffer to store a plurality of data structures. Each data structure comprises: an identifier, a size indicator and first data associated with instructions for execution of a task. The processor is configured for searching for a data structure in the first storage. A data structure subsequent to the tail data structure can be located using a storage address in the first storage of a tail data structure and the size indicator of all data structures preceding the second data structure among the plurality of data structures. When a data structure is found, the task may be executed based at least in part on the first data of the found data structure.
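The search described in this abstract can be sketched as a walk from the tail of a circular buffer, stepping by each entry's size indicator. The layout below (one word for the identifier, one word for the size, then the payload) and the names `CommandRing`, `push`, `find` and `pop_tail` are hypothetical choices made for illustration.

```python
class CommandRing:
    """Circular buffer of variable-size entries: [identifier, size, payload...]."""

    HEADER = 2  # identifier word + size-indicator word

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.tail = 0  # address of the oldest entry
        self.head = 0  # address where the next entry is written
        self.used = 0  # words currently occupied

    def push(self, ident, payload):
        size = self.HEADER + len(payload)
        if size > self.capacity - self.used:
            raise BufferError("ring full")
        for k, word in enumerate([ident, size, *payload]):
            self.buf[(self.head + k) % self.capacity] = word
        self.head = (self.head + size) % self.capacity
        self.used += size

    def find(self, ident):
        """Walk from the tail, adding each size indicator to reach the next entry."""
        addr, walked = self.tail, 0
        while walked < self.used:
            size = self.buf[(addr + 1) % self.capacity]
            if self.buf[addr] == ident:
                start = addr + self.HEADER
                return [self.buf[(start + k) % self.capacity]
                        for k in range(size - self.HEADER)]
            addr = (addr + size) % self.capacity
            walked += size
        return None

    def pop_tail(self):
        """Release the oldest entry."""
        if self.used == 0:
            raise IndexError("ring empty")
        size = self.buf[(self.tail + 1) % self.capacity]
        self.tail = (self.tail + size) % self.capacity
        self.used -= size


ring = CommandRing(capacity=8)
ring.push(1, [10, 11])
ring.push(2, [20])
ring.pop_tail()          # frees entry 1
ring.push(3, [30, 31])   # wraps around the end of the buffer
```

No separate index is kept: the address of any entry follows from the tail address plus the sizes of the entries before it.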
-
Publication No.: US11948069B2
Publication Date: 2024-04-02
Application No.: US16518444
Application Date: 2019-07-22
Applicant: Arm Limited
Inventor: Lingchuan Meng, John Wakefield Brothers, III, Jared Corey Smolens, Jens Olson, Eric Kunze, Ian Rudolf Bratt
Abstract: A processor arranged to compress neural network activation data comprises an input module for obtaining neural network activation data. The processor also comprises a block creation module arranged to split the neural network activation data into a plurality of blocks, and a metadata generation module for generating metadata associated with at least one of the plurality of blocks. Based on the generated metadata, a selection module selects a compression scheme for each of the plurality of blocks, and a compression module applies the selected compression scheme to the corresponding block to produce compressed neural network activation data. An output module is also provided for outputting the compressed neural network activation data.
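The pipeline in this abstract (split into blocks, generate metadata, select a scheme per block, compress) can be sketched as follows. The metadata used here (a zero count), the three schemes, the selection thresholds, and the function names are all assumptions for illustration; the abstract does not name any of them.

```python
def compress_activations(data, block_size=8):
    """Split data into blocks and pick a scheme per block from its zero count."""
    out = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        zeros = block.count(0)  # metadata (assumed): number of zero activations
        if zeros == len(block):
            out.append(("all_zero", len(block)))
        elif zeros * 2 >= len(block):
            # Mostly zeros: keep a presence mask plus the non-zero values.
            mask = [int(v != 0) for v in block]
            out.append(("sparse", mask, [v for v in block if v != 0]))
        else:
            out.append(("raw", list(block)))
    return out


def decompress_activations(blocks):
    """Invert compress_activations."""
    data = []
    for scheme, *payload in blocks:
        if scheme == "all_zero":
            data.extend([0] * payload[0])
        elif scheme == "sparse":
            mask, values = payload
            it = iter(values)
            data.extend(next(it) if bit else 0 for bit in mask)
        else:
            data.extend(payload[0])
    return data


activations = [0] * 8 + [0, 0, 5, 0, 0, 0, 7, 0] + [1, 2, 3, 4, 5, 6, 7, 8] + [0, 9]
compressed = compress_activations(activations)
schemes = [block[0] for block in compressed]
```

Selecting per block lets sparse regions, which are common after ReLU-style activations, be stored compactly without penalising dense regions.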