PROCESSING OF ASYMMETRICALLY QUANTIZED INPUT AND KERNEL COEFFICIENTS IN NEURAL NETWORK PROCESSOR

    Publication Number: US20240329929A1

    Publication Date: 2024-10-03

    Application Number: US18127528

    Application Date: 2023-03-28

    Applicant: Apple Inc.

    CPC classification number: G06F7/523 G06F7/50

    Abstract: Embodiments relate to performing multiply-accumulate operations on asymmetrically quantized input data and kernel data in a neural processor. Instead of adjusting the input data at a multiply-accumulator to account for the asymmetric quantization of the input data, an adjusted bias for the multiply-accumulate operation is computed beforehand and stored in the multiply-accumulator. Kernel coefficients derived from the kernel data, on the other hand, are adjusted at the multiply-accumulator to account for the asymmetric quantization. In this way, the computational complexity associated with asymmetric quantization may be reduced while increasing the efficiency of convolution operations at the neural processor.
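The bias-folding idea described in the abstract can be illustrated with the standard algebra of asymmetric quantization: expanding the zero-point-corrected product shows that the input zero-point term depends only on the kernel and can be precomputed into an adjusted bias. A minimal sketch, with variable names, shapes, and zero-point values chosen as assumptions for illustration (not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

x_q = rng.integers(0, 256, size=16)     # asymmetrically quantized input
w_q = rng.integers(-128, 128, size=16)  # quantized kernel coefficients
z_x, z_w = 128, 3                       # input and kernel zero points
bias = 7                                # original layer bias

# Reference MAC: subtract both zero points per element inside the loop.
ref = bias + np.sum((x_q - z_x) * (w_q - z_w))

# Folded form: precompute an adjusted bias offline so the MAC never touches
# the input zero point; only the kernel adjustment (w_q - z_w) remains inline.
adj_bias = bias - z_x * np.sum(w_q - z_w)
folded = adj_bias + np.sum(x_q * (w_q - z_w))

assert ref == folded
```

Since `adj_bias` depends only on the kernel and the zero points, it can be computed once per kernel and stored, removing a per-element subtraction from the inner loop.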

    NEURAL ENGINE WITH ACCELERATED MULTIPLIER-ACCUMULATOR FOR CONVOLUTION OF INTEGERS

    Publication Number: US20240329933A1

    Publication Date: 2024-10-03

    Application Number: US18127650

    Application Date: 2023-03-28

    Applicant: Apple Inc.

    CPC classification number: G06F7/5443 G06N3/063

    Abstract: Embodiments of the present disclosure relate to a multiply-accumulator circuit that includes a main multiplier circuit operable in a floating-point mode or an integer mode and a supplemental multiplier circuit that operates in the integer mode. The main multiplier circuit generates a multiplied output that undergoes subsequent operations, including a shifting operation in the floating-point mode, whereas the supplemental multiplier circuit generates another multiplied output that does not undergo any shifting operations. Hence, in the integer mode, two parallel multiply-add operations may be performed by the two multiplier circuits, thereby accelerating the multiply-add operations. Because no additional shifters are associated with the supplemental multiplier circuit, the multiply-accumulator circuit does not have a significantly increased footprint.
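The throughput claim can be sketched behaviorally: integer products need no exponent alignment (shifting), so a shifter-less supplemental multiplier can run alongside the main one, and each "cycle" retires two multiply-adds. All names below are assumptions for illustration; this models only the arithmetic, not the circuit:

```python
def mac_integer_mode(acc, a0, b0, a1, b1):
    """Two parallel multiply-adds per cycle in integer mode."""
    main = a0 * b0          # main multiplier (would feed a shifter in FP mode)
    supplemental = a1 * b1  # supplemental multiplier, no shifter stage
    return acc + main + supplemental

acc = 0
pairs = [(2, 3), (4, 5), (6, 7), (8, 9)]
# Issue two products per cycle: four products complete in two cycles.
for (a0, b0), (a1, b1) in zip(pairs[0::2], pairs[1::2]):
    acc = mac_integer_mode(acc, a0, b0, a1, b1)

assert acc == sum(a * b for a, b in pairs)  # 6 + 20 + 42 + 72 = 140
```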

    CHAINED NEURAL ENGINE WRITE-BACK ARCHITECTURE

    Publication Number: US20220036163A1

    Publication Date: 2022-02-03

    Application Number: US16942263

    Application Date: 2020-07-29

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a neural processor circuit that includes a first number of neural engine circuits, a second number of channels and a data processor circuit. The first number of neural engine circuits are pipelined into the second number of chains smaller than the first number. Each of the chains is configured to generate output data of a first size. Each of the channels is coupled to each of the chains and configured to transmit the output data from each of the neural engine circuits in the chains sequentially. The data processor circuit is coupled to the channels to receive the output data. The data processor circuit aggregates the output data of each of the chains into aggregated data of a second size larger than the first size and writes the aggregated data of the second size into a buffer memory of the data processor circuit.
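The aggregation step in this abstract can be sketched as follows: several engines are pipelined into fewer chains, each engine's small output is transmitted sequentially over the chain's channel, and the data processor concatenates them into one larger write to buffer memory. The counts and sizes below are assumptions chosen for illustration:

```python
NUM_ENGINES = 8
NUM_CHAINS = 2                               # fewer chains than engines
ENGINES_PER_CHAIN = NUM_ENGINES // NUM_CHAINS
OUT_SIZE = 4                                 # first (smaller) size per engine

def engine_output(engine_id):
    # Stand-in for a neural engine's computed output slice.
    return [engine_id] * OUT_SIZE

buffer_memory = []
for chain in range(NUM_CHAINS):
    aggregated = []                          # second (larger) size
    for slot in range(ENGINES_PER_CHAIN):
        engine_id = chain * ENGINES_PER_CHAIN + slot
        aggregated.extend(engine_output(engine_id))  # sequential transfer
    buffer_memory.append(aggregated)         # one wide write-back per chain

assert len(buffer_memory) == NUM_CHAINS
assert all(len(a) == ENGINES_PER_CHAIN * OUT_SIZE for a in buffer_memory)
```

Fewer, wider writes amortize per-transaction overhead at the buffer memory, which is the apparent motivation for aggregating before write-back.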

    Circuit for performing pooling operation in neural processor

    Publication Number: US11144615B1

    Publication Date: 2021-10-12

    Application Number: US16848378

    Application Date: 2020-04-14

    Applicant: Apple Inc.

    Abstract: Embodiments relate to a denominator circuit that determines the number of valid elements of a data surface covered by a kernel depending on various locations of the kernel relative to the data surface. The denominator circuit includes a first circuit and a second circuit that have the same structure. The first circuit receives numbers representing different horizontal locations of a reference point in the kernel and generates a first matrix with first output elements corresponding to the different horizontal locations. The second circuit receives numbers representing different vertical locations of a reference point in the kernel and generates a second matrix with second output elements corresponding to the different vertical locations. A matrix multiplication of the first matrix and the second matrix is performed to obtain an array of valid elements covered by the kernel.
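The separable structure described in this abstract can be sketched numerically: the count of valid (in-bounds) elements under a padded pooling kernel factors into per-axis counts, so an outer product of the horizontal and vertical count vectors yields the denominator at every kernel location. The kernel size, surface size, and padding below are assumptions for illustration:

```python
import numpy as np

def axis_counts(length, kernel, pad):
    """Valid kernel taps along one axis for each output position."""
    counts = []
    for pos in range(length):
        lo = pos - pad                       # first tap index (may be < 0)
        hi = lo + kernel                     # one past the last tap index
        counts.append(min(hi, length) - max(lo, 0))
    return np.array(counts)

# 3x3 average-pooling kernel over a 5x5 surface with padding of 1.
h = axis_counts(5, 3, 1)        # horizontal counts: [2, 3, 3, 3, 2]
v = axis_counts(5, 3, 1)        # vertical counts:   [2, 3, 3, 3, 2]
denominators = np.outer(v, h)   # valid elements at each kernel location

assert denominators[0, 0] == 4  # corner: only a 2x2 patch is in bounds
assert denominators[2, 2] == 9  # center: the full 3x3 kernel is valid
```

The outer product here plays the role of the matrix multiplication of the first and second matrices in the abstract: each entry is a horizontal count times a vertical count.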
