Apparatuses and methods to accelerate matrix multiplication

    Publication number: US12254061B2

    Publication date: 2025-03-18

    Application number: US17256195

    Filing date: 2018-09-27

    Abstract: Methods and apparatuses relating to performing vector multiplication are described. Hardware accelerators to perform vector multiplication are also described. In one embodiment, a combined fixed-point and floating-point vector multiplication circuit includes at least one switch to change the circuit between a first mode and a second mode, where in the first mode, each multiplier of a set of multipliers is to multiply mantissas from a same element position of a first floating-point vector and a second floating-point vector to produce a corresponding product, shift the corresponding products with a set of shift registers based on a maximum exponent of exponents for the corresponding products determined by a maximum exponent determiner to produce shifted products, perform a numeric conversion operation on the shifted products with a set of numeric conversion circuits based on sign bits from the same element position of the first floating-point vector and the second floating-point vector to produce signed representations of the shifted products, add the signed representations of the shifted products with a set of adders to produce a single product, and normalize the single product with a normalization circuit based on the maximum exponent into a single floating-point resultant, and in the second mode, each multiplier of the set of multipliers is to multiply values from a same element position of a first integer vector and a second integer vector to produce a corresponding product, and add each corresponding product with the set of adders to produce a single integer resultant.
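    The two modes described in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the function names (`vector_multiply`, `float_dot`, `int_dot`), the `mode` strings, and the 24-bit mantissa width are assumptions made for illustration only. It follows the same steps in floating-point mode (multiply mantissas, align to the maximum exponent, apply signs, add, normalize) and falls back to a plain multiply-and-add in integer mode.

```python
import math


def int_dot(a, b):
    """Second mode: multiply same-position integers and add the products."""
    return sum(x * y for x, y in zip(a, b))


def float_dot(a, b, mant_bits=24):
    """First mode: mantissa multiply, align to max exponent, sign, add, normalize.

    mant_bits is an assumed mantissa width, not a value from the source.
    """
    products = []
    for x, y in zip(a, b):
        mx, ex = math.frexp(abs(x))      # abs(x) == mx * 2**ex, 0.5 <= mx < 1
        my, ey = math.frexp(abs(y))
        ix = int(mx * (1 << mant_bits))  # integer mantissa of x
        iy = int(my * (1 << mant_bits))  # integer mantissa of y
        negative = (x < 0) != (y < 0)    # product sign from the two sign bits
        products.append((ix * iy, ex + ey, negative))
    if not products:
        return 0.0

    # Maximum exponent across all products (the "maximum exponent determiner").
    max_exp = max(e for _, e, _ in products)

    total = 0
    for mantissa_product, exp, negative in products:
        shifted = mantissa_product >> (max_exp - exp)  # align to max exponent
        total += -shifted if negative else shifted     # signed representation

    # Normalize: each mantissa was scaled by 2**mant_bits, so undo 2 * mant_bits.
    return math.ldexp(total, max_exp - 2 * mant_bits)


def vector_multiply(a, b, mode):
    """The 'switch': select floating-point or integer mode (hypothetical API)."""
    if mode == "float":
        return float_dot(a, b)
    if mode == "int":
        return int_dot(a, b)
    raise ValueError(f"unknown mode: {mode!r}")
```

    For example, `vector_multiply([1, 2, 3], [4, 5, 6], "int")` adds the three integer products, while `vector_multiply([1.5, -2.0, 0.25], [2.0, 0.5, 4.0], "float")` aligns the three mantissa products to the largest exponent before adding. Right-shifting during alignment discards low-order bits, so the floating-point result can differ slightly from an exactly rounded dot product.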

    ARTIFICIAL NEURAL NETWORK TRAINING USING FLEXIBLE FLOATING POINT TENSORS

    Publication number: US20190042944A1

    Publication date: 2019-02-07

    Application number: US16004243

    Filing date: 2018-06-08

    Abstract: Thus, the present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.
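    The shared-exponent idea can be sketched in a few lines. This is an illustrative model under stated assumptions, not the format defined in the disclosure: the helper names `encode_shared` and `decode_shared` are hypothetical, the shared exponent is chosen from the largest magnitude in the tensor, and the variable bit-length fields and the shared exponent switch are not modeled. Each element is stored as a standard IEEE 754 16-bit value, and one integer exponent scales all of them.

```python
import math
import struct


def encode_shared(values):
    """Return (shared_exp, payload) where payload holds FP16-encoded elements.

    Each stored element equals value / 2**shared_exp, rounded to FP16.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        shared_exp = 0
    else:
        _, e = math.frexp(max_abs)   # max_abs == m * 2**e, 0.5 <= m < 1
        shared_exp = e - 15          # scaled maximum stays below 2**15
    payload = [struct.pack("<e", math.ldexp(v, -shared_exp)) for v in values]
    return shared_exp, payload


def decode_shared(shared_exp, payload):
    """Combine each FP16 element with the shared exponent."""
    return [math.ldexp(struct.unpack("<e", p)[0], shared_exp) for p in payload]
```

    With this scheme, values such as `1e10` that lie outside the FP16 range can still be stored in 16 bits per element, because the shared exponent carries the common scale. Elements much smaller than the largest one lose precision once they fall into the FP16 subnormal range after scaling.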

    APPARATUS AND METHOD FOR COHERENT, ACCELERATED CONVERSION BETWEEN DATA REPRESENTATIONS

    Publication number: US20190042094A1

    Publication date: 2019-02-07

    Application number: US16024812

    Filing date: 2018-06-30

    Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
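    The fetch, convert, and store steps in this abstract can be sketched as a block-wise conversion loop. This is a minimal illustration under assumptions that do not come from the source: the function name `convert_blocks` is hypothetical, the first representation is taken to be 32-bit floats and the second 16-bit floats, blocks are laid out contiguously, and they are processed in ascending index order. Block `i` of the destination is written at an offset derived from `i`, so the destination keeps the same block arrangement as the source.

```python
import struct


def convert_blocks(src, block_elems):
    """Convert a buffer of FP32 blocks into FP16 blocks, preserving block order.

    src holds little-endian 32-bit floats grouped into blocks of block_elems.
    """
    src_block_bytes = block_elems * 4   # FP32: 4 bytes per element
    dst_block_bytes = block_elems * 2   # FP16: 2 bytes per element
    if len(src) % src_block_bytes != 0:
        raise ValueError("source length is not a whole number of blocks")

    n_blocks = len(src) // src_block_bytes
    dst = bytearray(n_blocks * dst_block_bytes)

    for i in range(n_blocks):
        # Fetch one source block.
        elems = struct.unpack_from(f"<{block_elems}f", src, i * src_block_bytes)
        # Convert and store it in the region designated for block i.
        struct.pack_into(f"<{block_elems}e", dst, i * dst_block_bytes, *elems)

    return bytes(dst)
```

    Because the destination offset depends only on the block index, blocks could in principle be converted in any order, or in parallel, without disturbing the layout.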

    Artificial neural network training using flexible floating point tensors

    Publication number: US12205035B2

    Publication date: 2025-01-21

    Application number: US16004243

    Filing date: 2018-06-08

    Abstract: Thus, the present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.

    ARTIFICIAL NEURAL NETWORK TRAINING USING FLEXIBLE FLOATING POINT TENSORS

    Publication number: US20240028905A1

    Publication date: 2024-01-25

    Application number: US18478554

    Filing date: 2023-09-29

    CPC classification number: G06N3/084 G06N3/063 G06N3/045 G06F9/3013

    Abstract: Thus, the present disclosure is directed to systems and methods for training neural networks using a tensor that includes a plurality of FP16 values and a plurality of bits that define an exponent shared by some or all of the FP16 values included in the tensor. The FP16 values may include IEEE 754 format 16-bit floating point values and the tensor may include a plurality of bits defining the shared exponent. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa and a variable bit-length exponent that may be dynamically set by processor circuitry. The tensor may include a shared exponent and FP16 values that include a variable bit-length mantissa; a variable bit-length exponent that may be dynamically set by processor circuitry; and a shared exponent switch set by the processor circuitry to selectively combine the FP16 value exponent with the shared exponent.

    Apparatus and method for coherent, accelerated conversion between data representations

    Publication number: US10761757B2

    Publication date: 2020-09-01

    Application number: US16024812

    Filing date: 2018-06-30

    Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
