Discovery of hardware characteristics of deep learning accelerators for optimization via compiler

    Publication Number: US12118460B2

    Publication Date: 2024-10-15

    Application Number: US17092033

    Application Date: 2020-11-06

    CPC classification number: G06N3/08 G06F8/41 G06N3/04

    Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory. A computing device running a compiler can interact and/or probe an integrated circuit device to identify hardware characteristics of the integrated circuit device in performing matrix computations. The compiler can generate and optimize a result of compilation from a description of an artificial neural network based at least in part on the hardware characteristics of the integrated circuit device. The result of compilation can include first data representative of parameters of the artificial neural network and second data representative of instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.
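The workflow this abstract describes — probing the integrated circuit device for hardware characteristics, then producing a compilation result split into parameter data and instruction data — can be sketched roughly as follows. All class names, fields, and functions here are hypothetical illustrations for the general idea, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class HardwareCharacteristics:
    # Hypothetical characteristics a compiler might discover by probing.
    matrix_unit_width: int   # operand width of the matrix compute unit
    ram_bytes: int           # size of the attached random access memory

@dataclass
class CompilationResult:
    # "First data": parameters of the artificial neural network.
    parameters: list
    # "Second data": instructions executable by the integrated circuit device.
    instructions: list

def probe_device(device: dict) -> HardwareCharacteristics:
    # A real compiler would run test matrix computations on the device and
    # observe their behavior; this sketch simply reads reported attributes.
    return HardwareCharacteristics(
        matrix_unit_width=device["matrix_unit_width"],
        ram_bytes=device["ram_bytes"],
    )

def compile_network(description: dict, hw: HardwareCharacteristics) -> CompilationResult:
    # Tile each layer's matrix operands to the probed matrix-unit width so
    # the emitted instructions match what the hardware executes natively.
    instructions = []
    for layer in description["layers"]:
        tiles = -(-layer["width"] // hw.matrix_unit_width)  # ceiling division
        instructions.append(("matmul", layer["name"], tiles))
    return CompilationResult(
        parameters=[layer["weights"] for layer in description["layers"]],
        instructions=instructions,
    )
```

The point of the split output is that the same instruction stream can be reused while parameter data is swapped, and the tiling decision is made once per discovered device rather than hard-coded into the compiler.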

    Deep neural networks compiler for a trace-based accelerator

    Publication Number: US11861337B2

    Publication Date: 2024-01-02

    Application Number: US17003476

    Application Date: 2020-08-26

    CPC classification number: G06F8/458 G06F9/30087 G06F9/5027 G06N3/02

    Abstract: A method of compiling neural network code to executable instructions for execution by a computational acceleration system having a memory circuit and one or more acceleration circuits having a maps data buffer and a kernel data buffer is disclosed, such as for execution by an inference engine circuit architecture which includes a matrix-matrix (MM) accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative compiling method includes generating a list of neural network layer model objects; fusing available functions and layers in the list; selecting a cooperative mode, an independent mode, or a combined cooperative and independent mode for execution; selecting a data movement mode and an ordering of computations which reduces usage of the memory circuit; generating an ordered sequence of load objects, compute objects, and store objects; and converting the ordered sequence of load objects, compute objects, and store objects into the executable instructions.
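The compiling method enumerated above — building layer objects, fusing adjacent functions, then emitting an ordered sequence of load, compute, and store objects that is finally converted to instructions — can be sketched as below. The object shapes and fusion rule are illustrative assumptions, not the patent's actual data structures:

```python
from dataclasses import dataclass

@dataclass
class Layer:
    # Hypothetical neural network layer model object.
    name: str
    op: str  # e.g. "conv", "relu", "fc"

def fuse(layers):
    # Fuse an activation into the preceding layer where possible,
    # shrinking the list the scheduler must order.
    fused, i = [], 0
    while i < len(layers):
        if i + 1 < len(layers) and layers[i + 1].op == "relu":
            fused.append(Layer(layers[i].name + "+relu", layers[i].op))
            i += 2
        else:
            fused.append(layers[i])
            i += 1
    return fused

def schedule(layers):
    # Emit an ordered load -> compute -> store sequence per fused layer;
    # a real scheduler would also reorder to reduce memory-circuit traffic.
    seq = []
    for layer in layers:
        seq += [("load", layer.name), ("compute", layer.name), ("store", layer.name)]
    return seq

def to_instructions(sequence):
    # Convert the abstract objects into executable-instruction placeholders.
    return [f"{kind.upper()} {name}" for kind, name in sequence]
```

Fusing before scheduling means fewer intermediate results ever travel through the maps data buffer, which is the stated motivation for reducing memory-circuit usage.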

    DEEP LEARNING ACCELERATORS WITH CONFIGURABLE HARDWARE OPTIONS OPTIMIZABLE VIA COMPILER

    Publication Number: US20220147809A1

    Publication Date: 2022-05-12

    Application Number: US17092023

    Application Date: 2020-11-06

    Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory. A compiler can convert a description of an artificial neural network into a compiler output through optimization and/or selection of hardware options of the integrated circuit device. The compiler output can include parameters of the artificial neural network, instructions executable by processing units of the Deep Learning Accelerator to generate an output of the artificial neural network responsive to an input to the artificial neural network, and hardware options to be stored in registers connected to control hardware configurations of the processing units.
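The third element of the compiler output described here — hardware options destined for control registers of the processing units — might be encoded along these lines. The register map and option names are invented for illustration; the patent does not specify them:

```python
# Hypothetical control-register map: the compiler selects hardware options
# and emits (address, value) pairs for the Deep Learning Accelerator to
# load into its configuration registers before executing instructions.
REGISTER_MAP = {
    "matrix_unit_mode": 0x00,   # e.g. 0 = 8-bit operands, 1 = 16-bit
    "pipeline_depth":   0x04,
    "prefetch_enable":  0x08,
}

def encode_hardware_options(options: dict) -> list:
    # Produce deterministic (register_address, value) pairs so the same
    # compiler output always configures the processing units identically.
    return [(REGISTER_MAP[name], value) for name, value in sorted(options.items())]
```

Keeping the options as data alongside the parameters and instructions lets one compiled artifact carry its own hardware configuration, which is the optimization lever the abstract describes.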

    DISCOVERY OF HARDWARE CHARACTERISTICS OF DEEP LEARNING ACCELERATORS FOR OPTIMIZATION VIA COMPILER

    Publication Number: US20250036950A1

    Publication Date: 2025-01-30

    Application Number: US18912182

    Application Date: 2024-10-10

    Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory. A computing device running a compiler can interact and/or probe an integrated circuit device to identify hardware characteristics of the integrated circuit device in performing matrix computations. The compiler can generate and optimize a result of compilation from a description of an artificial neural network based at least in part on the hardware characteristics of the integrated circuit device. The result of compilation can include first data representative of parameters of the artificial neural network and second data representative of instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.

    DISCOVERY OF HARDWARE CHARACTERISTICS OF DEEP LEARNING ACCELERATORS FOR OPTIMIZATION VIA COMPILER

    Publication Number: US20220147810A1

    Publication Date: 2022-05-12

    Application Number: US17092033

    Application Date: 2020-11-06

    Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. For example, an integrated circuit device may be configured to execute instructions with matrix operands and configured with random access memory. A computing device running a compiler can interact and/or probe an integrated circuit device to identify hardware characteristics of the integrated circuit device in performing matrix computations. The compiler can generate and optimize a result of compilation from a description of an artificial neural network based at least in part on the hardware characteristics of the integrated circuit device. The result of compilation can include first data representative of parameters of the artificial neural network and second data representative of instructions executable by the integrated circuit device to generate an output of the artificial neural network based on the first data and an input to the artificial neural network.

    Deep Neural Networks Compiler for a Trace-Based Accelerator

    Publication Number: US20220066760A1

    Publication Date: 2022-03-03

    Application Number: US17003476

    Application Date: 2020-08-26

    Abstract: A method of compiling neural network code to executable instructions for execution by a computational acceleration system having a memory circuit and one or more acceleration circuits having a maps data buffer and a kernel data buffer is disclosed, such as for execution by an inference engine circuit architecture which includes a matrix-matrix (MM) accelerator circuit having multiple operating modes to provide a complete matrix multiplication. A representative compiling method includes generating a list of neural network layer model objects; fusing available functions and layers in the list; selecting a cooperative mode, an independent mode, or a combined cooperative and independent mode for execution; selecting a data movement mode and an ordering of computations which reduces usage of the memory circuit; generating an ordered sequence of load objects, compute objects, and store objects; and converting the ordered sequence of load objects, compute objects, and store objects into the executable instructions.
