Matrix multiplication system, apparatus and method

    Publication number: US11194549B2

    Publication date: 2021-12-07

    Application number: US16663887

    Filing date: 2019-10-25

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a system, matrix multiply accelerator (MMA) and method for efficiently multiplying matrices. The MMA includes a vector register to store the row vectors of one input matrix, a vector register to store the column vectors of another input matrix, a vector register to store an output matrix, and an array of vector multiply and accumulate (VMAC) units coupled to the vector registers. Each VMAC unit is coupled to at least two row vector signal lines and at least two column vector signal lines, and is configured to calculate the dot product for one element i,j of the output matrix by multiplying each row vector formed from the ith row of the first matrix with a corresponding column vector formed from the jth column of the second matrix to generate intermediate products, and accumulate the intermediate products into a scalar value.
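
The dot-product scheme in the abstract above can be illustrated in software. The following is a minimal sketch, not the patented hardware: the function names `vmac_dot` and `matmul_vmac` and the `lane_width` parameter are hypothetical, and splitting each row and column into fixed-width vectors is an assumption about how the "row vectors" and "column vectors" are formed.

```python
def vmac_dot(row, col, lane_width=4):
    """Dot product accumulated one lane_width-sized vector pair at a time.

    Hypothetical software model of a single VMAC unit.
    """
    acc = 0
    for start in range(0, len(row), lane_width):
        row_vec = row[start:start + lane_width]
        col_vec = col[start:start + lane_width]
        # Intermediate products for this pair of vectors
        products = [a * b for a, b in zip(row_vec, col_vec)]
        # Accumulate the intermediate products into the scalar result
        acc += sum(products)
    return acc


def matmul_vmac(a, b, lane_width=4):
    """Multiply matrices a (n x m) and b (m x p), one output element per VMAC call."""
    n_inner = len(b)
    assert all(len(r) == n_inner for r in a), "inner dimensions must match"
    n_cols = len(b[0])
    # Gather the columns of b once so each can be paired with every row of a
    columns = [[b[k][j] for k in range(n_inner)] for j in range(n_cols)]
    return [[vmac_dot(row, col, lane_width) for col in columns] for row in a]


result = matmul_vmac([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

In hardware the VMAC units would work in parallel across the array; the nested list comprehension here only models the arithmetic, not the timing.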

    System, method and apparatus for data manipulation

    Publication number: US10970201B2

    Publication date: 2021-04-06

    Application number: US16168902

    Filing date: 2018-10-24

    Applicant: Arm Limited

    Abstract: A system, apparatus and method for utilizing a transpose function to generate a two-dimensional array from three-dimensional input data. The use of the transpose function reduces redundant elements in the resultant two-dimensional array thereby increasing efficiency and decreasing power consumption.

    Matrix Multiplication System and Method

    Publication number: US20210097130A1

    Publication date: 2021-04-01

    Application number: US16585265

    Filing date: 2019-09-27

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
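
A small software sketch may help show how a bitmap can steer a dot product over compressed operands. It makes simplifying assumptions that go beyond the abstract: every zero is dropped during compression (the abstract only says the compressed matrix has fewer zeros), rows of the first matrix and columns of the second are compressed independently, and the names `compress`, `sparse_dot` and `sparse_matmul` are hypothetical.

```python
def compress(vec):
    """Return (bitmap, nonzero values) for one row or column.

    Assumption: all zeros are removed, so the bitmap needs no later adjustment.
    """
    bitmap = [1 if v != 0 else 0 for v in vec]
    values = [v for v in vec if v != 0]
    return bitmap, values


def sparse_dot(bitmap_a, values_a, bitmap_b, values_b):
    """Dot product of two compressed vectors, guided by their bitmaps."""
    acc = 0
    ia = ib = 0  # read positions in the compressed value lists
    for bit_a, bit_b in zip(bitmap_a, bitmap_b):
        if bit_a and bit_b:
            # Both original elements were nonzero, so this term contributes
            acc += values_a[ia] * values_b[ib]
        # Advance each read position only past a stored (nonzero) element
        ia += bit_a
        ib += bit_b
    return acc


def sparse_matmul(a, b):
    """Multiply a (n x m) by b (m x p) using compressed rows and columns."""
    rows = [compress(r) for r in a]
    cols = [compress([b[k][j] for k in range(len(b))]) for j in range(len(b[0]))]
    return [[sparse_dot(*row, *col) for col in cols] for row in rows]


product = sparse_matmul([[1, 0, 2], [0, 0, 3]], [[0, 4], [5, 0], [6, 7]])
```

Only positions where both bitmaps hold a 1 trigger a multiplication, which is the source of the efficiency gain the abstract describes.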

    Hybrid Filter Banks for Artificial Neural Networks

    Publication number: US20210089888A1

    Publication date: 2021-03-25

    Application number: US16836110

    Filing date: 2020-03-31

    Applicant: Arm Limited

    Abstract: The present disclosure advantageously provides a system including a memory, a processor, and circuitry to execute one or more mixed-precision layers of an artificial neural network (ANN), each mixed-precision layer including high-precision weight filters and low-precision weight filters. The circuitry is configured to perform one or more calculations on an input feature map having a plurality of input channels (cin) using the high-precision weight filters to create a high-precision output feature map having a first number of output channels (k), perform one or more calculations on the input feature map using the low-precision weight filters to create a low-precision output feature map having a second number of output channels (cout−k), and concatenate the high-precision output feature map and the low-precision output feature map to create a unified output feature map having a plurality of output channels (cout).
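
The channel bookkeeping in the abstract above (k high-precision channels plus cout−k low-precision channels giving cout in total) can be sketched as follows. This is an illustrative model only: it assumes a pointwise (1x1) layer applied at a single spatial position, uses ternary values as a stand-in for "low precision", and the names `apply_filters` and `mixed_precision_layer` are hypothetical.

```python
def apply_filters(x, filters):
    """Apply each filter (one weight per input channel) to the input vector x."""
    return [sum(w * v for w, v in zip(f, x)) for f in filters]


def mixed_precision_layer(x, high_filters, low_filters):
    """Concatenate high-precision and low-precision output channels."""
    high_out = apply_filters(x, high_filters)  # k output channels
    low_out = apply_filters(x, low_filters)    # cout - k output channels
    return high_out + low_out                  # cout output channels in total


x = [1.0, 2.0]               # cin = 2 input channels
high = [[0.5, 0.25]]         # k = 1 filter with floating-point weights
low = [[1, -1], [0, 1]]      # cout - k = 2 filters with ternary weights (assumed)
out = mixed_precision_layer(x, high, low)
```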

    Correlation determination early termination

    Publication number: US09960904B2

    Publication date: 2018-05-01

    Application number: US15324841

    Filing date: 2015-06-03

    Applicant: ARM LIMITED

    Abstract: Correlation circuitry includes selection circuitry for selecting a sequence of symbol subsets comprising proper subsets of a candidate sequence of symbols and corresponding proper subsets of a target sequence of symbols. Correlation value determining circuitry determines partial correlation values for these proper subsets which are then combined by correlation value combining circuitry to generate a current combined correlation value. Early termination circuitry compares the current combined correlation value with an early termination condition represented by a threshold value to determine whether or not early termination of the correlation determination may be performed. Early termination may be performed when the combined correlation value indicates a sufficient degree of confidence in the partial result determined such that continuing with determination of the full correlation is not justified.
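
The early-termination idea can be sketched in a few lines. This is a software illustration under stated assumptions, not the claimed circuitry: symbols are taken to be +1/-1 values, the subsets are consecutive fixed-size slices, termination happens when the combined value reaches the threshold, and the name `correlate_early` and its parameters are hypothetical.

```python
def correlate_early(candidate, target, subset_size, threshold):
    """Correlate two symbol sequences subset by subset, stopping early if possible.

    Returns (combined correlation, symbols processed, terminated early?).
    """
    combined = 0
    total = len(candidate)
    for start in range(0, total, subset_size):
        end = min(start + subset_size, total)
        # Partial correlation over one proper subset of each sequence
        partial = sum(c * t for c, t in zip(candidate[start:end], target[start:end]))
        combined += partial
        # Assumed termination condition: the combined value has reached the threshold
        if combined >= threshold and end < total:
            return combined, end, True
    return combined, total, False


seq = [1, -1, 1, 1, -1, 1, 1, -1]
match = correlate_early(seq, seq, subset_size=2, threshold=4)
mismatch = correlate_early(seq, [-s for s in seq], subset_size=2, threshold=4)
```

With a matching target the loop stops after half the symbols; with an inverted target the threshold is never reached and the full correlation is computed.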

    Operating parameter circuitry and method
    Type: Granted patent (in force)

    Publication number: US09548749B2

    Publication date: 2017-01-17

    Application number: US14531479

    Filing date: 2014-11-03

    Applicant: ARM Limited

    CPC classification number: H03L7/102 H03L7/0992 H03L2207/06

    Abstract: An operating parameter method and circuitry are provided that generate operating parameter signals that are compensated for noise. Such operating parameter circuitry includes control loop circuitry that operates from a first power supply to provide an operating parameter signal to functional circuitry operating from a second power supply separate from the first power supply. The control loop circuitry comprises generator circuitry to generate the operating parameter signal based on an input signal. Replica generator circuitry operates from the second power supply to generate a further operating parameter signal based on the input signal. Adjustment circuitry performs a comparison on the operating parameter signal and the further operating parameter signal and causes an adjusted input signal to be produced in dependence on a result of the comparison. The adjusted input signal is received by the generator circuitry. Consequently, the generator circuitry is able to produce an operating parameter signal that has been compensated for noise in the circuit.


    Time domain unrolling sparse matrix multiplication system and method

    Publication number: US11928176B2

    Publication date: 2024-03-12

    Application number: US17103676

    Filing date: 2020-11-24

    Applicant: Arm Limited

    CPC classification number: G06F17/16 G06F7/5443 G06F15/80 G06F9/3893

    Abstract: A system and method for multiplying matrices are provided. The system includes a processor coupled to a memory and a matrix multiply accelerator (MMA) coupled to the processor. The MMA is configured to multiply, based on a bitmap, a compressed first matrix and a second matrix to generate an output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the compressed first matrix and a jth column of the second matrix based on the bitmap. Or, the MMA is configured to multiply, based on the bitmap, the second matrix and the compressed first matrix and to generate the output matrix including, for each element i,j of the output matrix, a calculation of a dot product of an ith row of the second matrix and a jth column of the compressed first matrix based on the bitmap.

    Video data processing
    Type: Granted patent

    Publication number: US11823430B2

    Publication date: 2023-11-21

    Application number: US17378014

    Filing date: 2021-07-16

    Applicant: Arm Limited

    Abstract: A method for processing video data, comprising: receiving raw video data, representative of a plurality of frames; detecting, using the raw video data, one or more regions of interest in a detection frame that belongs to the plurality of frames, for example using a region proposal network; performing a cropping process on a portion of the raw video data representative of the detection frame, based on the regions of interest, so as to generate cropped raw video data; performing image processing on the cropped raw video data, including demosaicing, so as to generate processed image data for the detection frame; and analyzing the processed image data, for example using an object detection process, to determine information relating to at least one of said one or more regions of interest.

    Neural Network System and Training Method
    Type: Published application

    Publication number: US20230229921A1

    Publication date: 2023-07-20

    Application number: US17576101

    Filing date: 2022-01-14

    Applicant: Arm Limited

    CPC classification number: G06N3/082 G06N3/10

    Abstract: Neural network systems and methods are provided. One method for processing a neural network includes, for at least one neural network layer that includes a plurality of weights, applying an offset function to each of a plurality of weight values in the plurality of weights to generate an offset weight value, and quantizing the offset weight values to form quantized offset weight values. The plurality of weights are pruned. One method for executing a neural network includes reading, from a memory, at least one neural network layer that includes quantized offset weight values and an offset value α, and performing a neural network layer operation on an input feature map, based on the quantized offset weight values and the offset value α, to generate an output feature map. The quantized offset weight values are signed integer numbers.

    Multi-Dimensional Data Path Architecture

    Publication number: US20220382690A1

    Publication date: 2022-12-01

    Application number: US17334960

    Filing date: 2021-05-31

    Applicant: Arm Limited

    Abstract: Various implementations described herein are directed to a device having a multi-layered logic structure with a first logic layer and a second logic layer arranged vertically in a stacked configuration. The device may have a memory array that provides data, and also, the device may have an inter-layer data bus that vertically couples the memory array to the multi-layered logic structure. The inter-layer data bus may provide multiple data paths to the first logic layer and the second logic layer for reuse of the data provided by the memory array.
