METHODS AND ELECTRONIC DEVICE FOR HIGH PERFORMANCE MODULO MULTIPLICATION

    Publication number: US20240361984A1

    Publication date: 2024-10-31

    Application number: US18325399

    Filing date: 2023-05-30

    Inventor: Sandesh Kanchodu

    CPC classification number: G06F7/5318 G06F7/50 G06F7/5312 G06F7/5443

    Abstract: Embodiments herein disclose high performance modulo multiplication methods performed by circuitry of an electronic device. The method includes obtaining and summing partial products to obtain a partial multiplication result using a primary Wallace tree. The partial multiplication result is fed back in a next cycle for subsequent limb multiplication associated with the primary Wallace tree. The obtaining and summing of partial products and feeding back operations are repeated until all limbs associated with the primary Wallace tree are completed. A residual computation of a partial multiplication result associated with a final limb of the primary Wallace tree is then performed, to obtain a multiplication result using a secondary Wallace tree, where the final limb stores the partial multiplication result of a last iteration.
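The limb-by-limb flow in this abstract can be illustrated with a short software analogy. This is only a sketch under stated assumptions, not the patented circuit: the function names, the 16-bit limb width, and the most-significant-limb-first order are hypothetical, ordinary integer addition stands in for the primary Wallace tree, and a plain `%` stands in for the residual computation of the secondary Wallace tree.

```python
def limbs_msb_first(x, limb_bits, n_limbs):
    """Split a non-negative integer into fixed-width limbs, most significant first."""
    mask = (1 << limb_bits) - 1
    return [(x >> (limb_bits * i)) & mask for i in reversed(range(n_limbs))]


def mod_mul_by_limbs(a, b, modulus, limb_bits=16):
    """Compute (a * b) % modulus one limb of b per 'cycle' (hypothetical model)."""
    n_limbs = max(1, -(-b.bit_length() // limb_bits))  # ceiling division
    acc = 0  # partial multiplication result fed back each cycle
    for limb in limbs_msb_first(b, limb_bits, n_limbs):
        # Sum the new partial product with the shifted feedback value.
        acc = (acc << limb_bits) + a * limb
    # Residual computation on the result of the last iteration.
    return acc % modulus
```

For example, `mod_mul_by_limbs(123456789, 987654321, 1000003)` returns the same value as `(123456789 * 987654321) % 1000003`, because the loop is a Horner-style evaluation of `a * b`.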

    HETEROGENEOUS MULTI-FUNCTIONAL RECONFIGURABLE PROCESSING-IN-MEMORY ARCHITECTURE

    Publication number: US20240329930A1

    Publication date: 2024-10-03

    Application number: US18425533

    Filing date: 2024-01-29

    CPC classification number: G06F7/523 G06F7/50

    Abstract: A processing-in-memory (PIM) system includes a plurality of PIM clusters interconnected by a router in one or more dynamic random-access memory (DRAM) banks. The PIM clusters include one or more multiply and accumulate (MAC) processing elements including a plurality of MAC lookup table cores operatively configured to perform arithmetic logic, and one or more special function (SF) processing elements, wherein the one or more SF processing elements include a plurality of SF lookup table cores operatively configured to perform one or more machine learning activation functions. The MAC lookup tables include a first arithmetic logic unit (ALU) lookup table core type operatively configured to perform addition or multiplication operations, and a second ALU lookup table core type operatively configured to simultaneously perform both addition and multiplication operations. The MAC lookup table cores and SF lookup table cores are configured to perform convolutional neural network acceleration.
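The idea of doing arithmetic and activation functions through lookup tables can be sketched in a few lines. The table sizes, the 4-bit decomposition, the sigmoid quantization, and every name below are assumptions made for illustration; the abstract does not specify how the lookup table cores are organized.

```python
import math

# Hypothetical 16x16 table holding every product of two 4-bit values.
MUL_LUT = [[x * y for y in range(16)] for x in range(16)]

# Hypothetical special-function table: sigmoid over 256 quantized inputs,
# where index q represents the real value (q - 128) / 16.
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(q - 128) / 16.0)) for q in range(256)]


def lut_mul8(a, b):
    """Multiply two unsigned 8-bit values using only 4-bit table lookups."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    return ((MUL_LUT[a_hi][b_hi] << 8)
            + ((MUL_LUT[a_hi][b_lo] + MUL_LUT[a_lo][b_hi]) << 4)
            + MUL_LUT[a_lo][b_lo])


def lut_mac(weights, inputs):
    """Multiply-accumulate over two equal-length lists of unsigned 8-bit values."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += lut_mul8(w, x)
    return acc


def lut_sigmoid(q):
    """Activation by table lookup on a quantized input index in 0..255."""
    return SIGMOID_LUT[q]
```

Here `lut_mac([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18, and `lut_sigmoid(128)` looks up the sigmoid of zero.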

    Multi-operational modes of neural engine circuit

    Publication number: US12106206B2

    Publication date: 2024-10-01

    Application number: US17148432

    Filing date: 2021-01-13

    Applicant: Apple Inc.

    CPC classification number: G06N3/063 G06F7/24 G06F7/50 G06F7/523 G06F7/5443

    Abstract: Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a convolution operation on input data in a first mode and a parallel sorting operation on input data in a second mode. The neural engine circuit includes a plurality of operation circuits and an accumulator circuit coupled to the plurality of operation circuits. The plurality of operation circuits receives input data. In the first mode, the plurality of operation circuits performs multiply-add operations of a convolution on the input data using a kernel. In the second mode, the plurality of operation circuits performs a portion of a parallel sorting operation on the input data. In the first mode, the accumulator circuit receives and stores first results of the multiply-add operations. In the second mode, the accumulator circuit receives and stores second results of the parallel sorting operation.
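A rough software model of the two modes might look as follows. It is a sketch only: the abstract does not name the sorting algorithm, so odd-even transposition sort is swapped in as one well-known parallel sort built from independent compare-exchange steps, and the mode names and function names are hypothetical.

```python
def conv1d(data, kernel):
    """First mode: multiply-add operations of a 1-D convolution (valid region only)."""
    n_out = len(data) - len(kernel) + 1
    return [sum(data[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n_out)]


def compare_exchange_pass(data, phase):
    """Second mode, one portion: compare-exchange disjoint neighbor pairs.

    The pairs in a pass do not overlap, so hardware could process them in parallel.
    """
    out = list(data)
    for i in range(phase % 2, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def neural_engine(data, mode, kernel=None):
    """Dispatch on a hypothetical mode flag; results are collected as an accumulator would."""
    if mode == "convolution":
        return conv1d(data, kernel)
    if mode == "sort":
        result = list(data)
        # Odd-even transposition sort needs at most len(data) passes.
        for phase in range(len(result)):
            result = compare_exchange_pass(result, phase)
        return result
    raise ValueError(f"unknown mode: {mode}")
```

The same entry point serves both workloads: `neural_engine([1, 2, 3, 4], "convolution", [1, 0, -1])` runs the multiply-add path, while `neural_engine([5, 1, 4, 2, 3], "sort")` runs repeated compare-exchange passes.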

    Adaptive settling time control for binary-weighted charge redistribution circuits

    Publication number: US12099569B2

    Publication date: 2024-09-24

    Application number: US18337955

    Filing date: 2023-06-20

    CPC classification number: G06F17/16 G06F7/50 G06F7/523 H03M1/662

    Abstract: A method and circuit for performing vector operations may include, for each sequentially performed operation, operating a switch that corresponds to a current bit-order. Operating the switch may cause a value corresponding to an output of the operation to be stored on a capacitor corresponding to the current bit-order. A time interval during which the switch is operated may be non-uniform with respect to time intervals for other switches, and the time interval may be based at least in part on a settling time of the capacitor. The method may also include performing a bit-order weighted summation of values stored on the plurality of capacitors to generate a result of the vector operation.
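The bit-order weighted summation can be mimicked digitally. In this sketch, which is an assumption-laden analogy rather than the claimed circuit, a Python list stands in for the capacitors, each loop iteration stands in for one sequentially performed operation, and the non-uniform switch timing based on settling time is not modeled at all. The function name and the 8-bit default are hypothetical.

```python
def bit_serial_dot(weights, inputs, n_bits=8):
    """Dot product computed one input bit-order at a time.

    Inputs are assumed to be non-negative integers below 2**n_bits.
    """
    stored = []  # one stored value per bit-order, like one capacitor each
    for k in range(n_bits):
        bit_plane = [(x >> k) & 1 for x in inputs]
        # Result of the operation for the current bit-order.
        stored.append(sum(w * b for w, b in zip(weights, bit_plane)))
    # Bit-order weighted summation: weight the k-th stored value by 2**k.
    return sum(value * (1 << k) for k, value in enumerate(stored))
```

For instance, `bit_serial_dot([3, 1, 2], [5, 7, 9])` gives the same result as the direct dot product 3*5 + 1*7 + 2*9.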

    MIXED-PRECISION MULTIPLICATION CIRCUIT
    Document type: Invention publication (published application)

    Publication number: US20240248683A1

    Publication date: 2024-07-25

    Application number: US18101038

    Filing date: 2023-01-24

    CPC classification number: G06F7/523 G06F7/50

    Abstract: A mixed-precision multiplication circuit that computes according to a second operand and a first operand is provided. The first operand includes an exponent and a mantissa, and the mixed-precision multiplication circuit includes a subset selector and a mantissa multiplier. The subset selector is configured to store the second operand and receive the exponent. The subset selector selects a subset from a plurality of subsets according to the exponent, with the plurality of subsets representing the second operand. The mantissa multiplier is coupled to the subset selector for receiving a multiplicand associated with the selected subset, and configured to receive the mantissa. The mantissa multiplier generates a product by performing a multiplication according to the multiplicand and the mantissa, and the mixed-precision multiplication circuit outputs a result according to the product.

    Optical multiply and accumulate unit

    Publication number: US12039433B2

    Publication date: 2024-07-16

    Application number: US17178563

    Filing date: 2021-02-18

    Abstract: Processing elements for neural network accelerators, and methods of operating the processing elements. Each of a plurality of synapse lanes outputs an electrical signal indicative of a value of a synapse. Each electrical signal is received by a respective optical AND unit including an optical microring resonator that selectively couples an optical signal indicative of the value of an input neuron based at least in part on the received electrical signal. The output of each optical AND unit is provided to either an electrical multiply and accumulate unit, or a respective interferometer of a plurality of interferometers. The interferometers are arranged in series so that optical signals are sequentially summed and shifted by each interferometer. The last interferometer outputs a shifted and accumulated sum of the outputs received from the optical AND units. In either case, the accumulated sum may then be used to generate an output neuron.
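The sum-and-shift chain described here resembles a classic shift-and-add multiplier, which the following sketch illustrates. It rests on an assumption the abstract does not confirm: that each synapse lane carries one bit of a synapse value, most significant bit first. The conditional stands in for an optical AND unit, each loop iteration stands in for one interferometer stage, and the function name is hypothetical.

```python
def shift_accumulate(neuron, synapse_bits_msb_first):
    """Gate the neuron value by each synapse bit, then shift and sum stage by stage."""
    acc = 0
    for bit in synapse_bits_msb_first:
        gated = neuron if bit else 0  # AND unit: pass the neuron signal only when the bit is 1
        acc = (acc << 1) + gated      # one stage: shift the running sum, then add
    return acc
```

With a neuron value of 13 and synapse bits `[1, 0, 1, 1]` (binary 1011, decimal 11), the chain yields 13 * 11 = 143.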
