HARDWARE ACCELERATED MACHINE LEARNING
    1.
    Invention publication

    Publication No.: EP4220380A1

    Publication date: 2023-08-02

    Application No.: EP23163158.1

    Filing date: 2017-01-06

    Abstract: Some embodiments of the present disclosure relate to an apparatus with a CPU, a bus to couple the CPU to a DRAM, and a machine-learning hardware accelerator coupled to the CPU. The machine-learning accelerator comprises, among other things, a plurality of operation units to perform a plurality of parallel MAC operations in accordance with a vector MAC instruction, the instruction including an operation value indicating a MAC operation, an indication of a first plurality of the real numbers of the first multidimensional array and a second plurality of the real numbers of the second multidimensional array, and permutation information; and circuitry to permute the first plurality of the real numbers of the first multidimensional array in accordance with the permutation information to generate a permuted first plurality of real numbers. Each operation unit comprises a multiplier to multiply a first real number of the permuted first plurality of real numbers and a corresponding second real number of the second plurality of the real numbers associated with the second multidimensional array to generate a product, and an accumulator to add the product to an accumulation value to generate a result value, the first real number and the second real number each having a first bit width and the accumulation value having a second bit width at least twice the first bit width.
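The permute-then-MAC behavior described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: operands are modeled as signed fixed-point integers, the names `wrap_signed` and `vector_mac` are hypothetical, the permutation information is modeled as a list of source indices, and the 8-bit operand and 16-bit accumulator widths are chosen only to satisfy the "at least twice" relationship.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the two's-complement range of the given bit width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vector_mac(first, second, perm, acc, operand_bits=8, acc_bits=16):
    """Model one vector MAC: permute `first`, then multiply-accumulate per lane.

    Assumption: `perm[i]` names the element of `first` routed to lane i.
    """
    # The accumulator must be at least twice as wide as the operands.
    assert acc_bits >= 2 * operand_bits
    permuted = [first[p] for p in perm]
    results = []
    # Each loop iteration stands in for one operation unit working in parallel.
    for a, b, c in zip(permuted, second, acc):
        product = a * b  # multiplier
        results.append(wrap_signed(c + product, acc_bits))  # accumulator
    return results


lanes = vector_mac([1, 2, 3, 4], [10, 20, 30, 40], [3, 2, 1, 0], [1, 1, 1, 1])
widest = vector_mac([127], [127], [0], [0])
```

The last call hints at why the accumulator is wider than the operands: the product of two 8-bit values such as 127 and 127 does not fit in 8 bits, but it does fit in 16.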

    HARDWARE ACCELERATED MACHINE LEARNING
    2.
    Invention publication

    Publication No.: EP3974959A1

    Publication date: 2022-03-30

    Application No.: EP21208402.4

    Filing date: 2017-01-06

    Abstract: Some embodiments of the present disclosure relate to an apparatus with an instruction decoder, a local memory, circuitry, and a plurality of operational units. The instruction decoder is to decode a matrix multiplication instruction. The local memory comprises a plurality of static random-access memory (SRAM) banks to store at least a portion of a first input tensor and a second input tensor, each of the first and second input tensors comprising a multidimensional array of real numbers. The circuitry is to permute a first plurality of the real numbers associated with the first input tensor in accordance with a permutation pattern included with the matrix multiplication instruction to generate a permuted first plurality of real numbers. The plurality of operational units is to perform a plurality of parallel multiply-accumulate (MAC) operations in accordance with the matrix multiplication instruction, each operational unit comprising a multiplier to multiply a first real number of the permuted first plurality of real numbers and a second real number of a second plurality of real numbers associated with the second input tensor to generate a product, and an accumulator to add the product to an accumulation value to generate a result value, the first real number and the second real number each having a first bit width and the accumulation value having a second bit width at least twice the first bit width.
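One way to see how a matrix multiplication could be built from permuted parallel MAC steps is the software sketch below. It is an illustration under assumptions, not the method of the application: the function name `matmul_via_permuted_mac` is hypothetical, the matrices are assumed square, the permutation pattern is assumed to be a rotation that changes on each step, and Python's unbounded integers stand in for the wide accumulators.

```python
def matmul_via_permuted_mac(a, b):
    """Multiply square matrices using rotated rows and per-lane accumulation.

    Assumption: lane j accumulates output column j, and on each step the
    current row of `a` is permuted by a rotation before the MAC.
    """
    n = len(a)
    result = []
    for row in a:
        acc = [0] * n  # one accumulator per operational unit
        for step in range(n):
            # Rotation pattern: lane j reads index (j + step) mod n.
            pattern = [(lane + step) % n for lane in range(n)]
            permuted = [row[k] for k in pattern]
            # Matching elements of the second operand for each lane.
            second = [b[k][lane] for lane, k in enumerate(pattern)]
            # All lanes multiply and accumulate in the same step.
            acc = [c + x * y for c, x, y in zip(acc, permuted, second)]
        result.append(acc)
    return result


product = matmul_via_permuted_mac([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Over n steps every lane sees each index exactly once, so each accumulator ends up holding one complete dot product.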
