METHODS AND APPARATUS FOR PERFORMING A MACHINE LEARNING OPERATION USING STORAGE ELEMENT POINTERS

    Publication number: US20220108135A1

    Publication date: 2022-04-07

    Application number: US17554970

    Filing date: 2021-12-17

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for performing a machine learning operation using storage element pointers. An example computer readable medium comprises instructions that, when executed, cause at least one processor to: in response to a determination that a machine learning operation is to be performed on an input tensor, create first and second storage element pointers based on the type of machine learning operation to be performed; remap input tensor data of the input tensor based on the first storage element pointer without movement of the input tensor data in memory; cause execution of the machine learning operation with the remapped input tensor data to create intermediate tensor data; remap the intermediate tensor data based on the second storage element pointer without movement of the intermediate tensor data in memory; and provide the remapped intermediate tensor data as an output tensor.
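    The remapping idea in this abstract can be illustrated with a small sketch. It is not the patented implementation: the `RemappedView` class, the `transpose_pointers` helper, and the choice of a transpose as the remapping are all assumptions made for illustration. The point it shows is that a pointer table can present the same stored elements in a different order without copying or moving them.

```python
from typing import Callable, List, Sequence


class RemappedView:
    """Read-only view that reorders a flat buffer through a pointer table.

    Hypothetical stand-in for a "storage element pointer": the underlying
    storage is shared, never copied or rearranged.
    """

    def __init__(self, storage: Sequence[float], pointers: Sequence[int]):
        self.storage = storage
        self.pointers = list(pointers)

    def __len__(self) -> int:
        return len(self.pointers)

    def __getitem__(self, i: int) -> float:
        return self.storage[self.pointers[i]]

    def to_list(self) -> List[float]:
        return [self[i] for i in range(len(self))]


def transpose_pointers(rows: int, cols: int) -> List[int]:
    """Pointer table that presents a row-major rows x cols tensor transposed."""
    return [r * cols + c for c in range(cols) for r in range(rows)]


def run_operation(
    storage: Sequence[float],
    in_ptrs: Sequence[int],
    out_ptrs: Sequence[int],
    op: Callable[[float], float],
) -> RemappedView:
    # Remap the input through the first pointer table (no data movement).
    remapped_in = RemappedView(storage, in_ptrs)
    # Execute the element-wise operation to produce intermediate data.
    intermediate = [op(remapped_in[i]) for i in range(len(remapped_in))]
    # Remap the intermediate data through the second pointer table.
    return RemappedView(intermediate, out_ptrs)


data = [1, 2, 3, 4, 5, 6]  # a 2 x 3 tensor in row-major order
out = run_operation(
    data,
    transpose_pointers(2, 3),  # view the input as 3 x 2
    transpose_pointers(3, 2),  # view the intermediate as 2 x 3 again
    lambda x: x * 10,
)
print(out.to_list())
```

    In this sketch the two pointer tables are inverses of each other, so the output comes back in the original layout while `data` itself is never rearranged.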

    SYSTEMS, APPARATUS, AND METHODS TO DEBUG ACCELERATOR HARDWARE

    Publication number: US20220012164A1

    Publication date: 2022-01-13

    Application number: US17483521

    Filing date: 2021-09-23

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator, such as a neural network accelerator, for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model and to compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data, such as the data input, the data output, and the breakpoint, for debugging the hardware accelerator.
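    As a rough software analogy for the breakpoint behaviour described above, the following sketch stops a layer-by-layer run when a named layer is reached and reports the data input, data output, and breakpoint. The function name `run_with_breakpoint`, the layer names, and the dictionary layout are hypothetical; the abstract describes debug circuitry in hardware, not this code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Layer = Tuple[str, Callable[[int], int]]


def run_with_breakpoint(
    layers: List[Layer], data_input: int, breakpoint_layer: Optional[str] = None
) -> Dict[str, object]:
    """Execute layers in order; stop and dump state when the breakpoint hits."""
    value = data_input
    for name, fn in layers:
        layer_input = value
        value = fn(layer_input)
        if name == breakpoint_layer:
            # Breakpoint triggered: halt and expose what a debugger would need.
            return {
                "breakpoint": name,
                "data_input": layer_input,
                "data_output": value,
                "completed": False,
            }
    # No breakpoint hit: report the end-to-end input and output.
    return {
        "breakpoint": None,
        "data_input": data_input,
        "data_output": value,
        "completed": True,
    }


model = [
    ("scale", lambda x: x * 2),
    ("shift", lambda x: x + 1),
    ("square", lambda x: x * x),
]
stopped = run_with_breakpoint(model, 3, breakpoint_layer="shift")
finished = run_with_breakpoint(model, 3)
print(stopped)
print(finished)
```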

    METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO INCREASE DATA REUSE FOR MULTIPLY AND ACCUMULATE (MAC) OPERATIONS

    Publication number: US20220012058A1

    Publication date: 2022-01-13

    Application number: US17484780

    Filing date: 2021-09-24

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
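    A simplified way to picture the reuse described above is a nested loop in which one context stays resident in the first buffer while a pointer walks through the second buffer. The sketch below is an assumption-laden illustration, not the claimed control logic: the names `mac` and `run_mac_with_reuse` are hypothetical, and each context is modelled as a plain list of numbers.

```python
from typing import List, Sequence, Tuple


def mac(a: Sequence[int], b: Sequence[int]) -> int:
    """Multiply and accumulate two equal-length contexts."""
    return sum(x * y for x, y in zip(a, b))


def run_mac_with_reuse(
    first_contexts: List[List[int]], second_contexts: List[List[int]]
) -> Tuple[List[int], int]:
    """Return the MAC results and how many times the first buffer was loaded."""
    results = []
    first_buffer_loads = 0
    for first in first_contexts:
        # Load one first-type context and keep it in the first buffer...
        first_buffer_loads += 1
        # ...while a pointer steps through the second buffer position by position.
        for ptr in range(len(second_contexts)):
            results.append(mac(first, second_contexts[ptr]))
    return results, first_buffer_loads


activations = [[1, 2], [3, 4]]
weights = [[1, 0], [0, 1], [1, 1]]
results, loads = run_mac_with_reuse(activations, weights)
print(results, loads)
```

    Here six MAC operations are served by only two loads of the first buffer, which is the kind of reuse the abstract is aiming for.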

    SPARSITY-BASED REDUCTION OF GATE SWITCHING IN DEEP NEURAL NETWORK ACCELERATORS

    Publication number: US20230325665A1

    Publication date: 2023-10-12

    Application number: US18325298

    Filing date: 2023-05-30

    CPC classification number: G06N3/08 G06N3/0464

    Abstract: Gate switching in deep learning operations can be reduced based on sparsity in the input data. A first element of an activation operand and a first element of a weight operand may be stored in input storage units associated with a multiplier in a processing element. The multiplier computes a product of the two elements, which may be stored in an output storage unit of the multiplier. After detecting that a second element of the activation operand or a second element of the weight operand is zero valued, gate switching is reduced by avoiding at least one gate switching needed for the multiply-accumulation operation. For instance, the input storage units may not be updated. A zero-valued data element may be stored in the output storage unit of the multiplier and used as a product of the second element of the activation operand and the second element of the weight operand.
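    The zero-skipping behaviour can be modelled in software by counting how often a multiplier's input registers are rewritten. This is only a behavioural sketch under stated assumptions: `GatedMultiplier`, `sparse_mac`, and the `input_updates` counter are hypothetical names, and register writes are used here as a stand-in for gate switching activity.

```python
from typing import Sequence, Tuple


class GatedMultiplier:
    """Toy multiplier whose input registers are left alone on zero operands."""

    def __init__(self) -> None:
        self.in_act = 0
        self.in_wgt = 0
        self.out = 0
        self.input_updates = 0  # proxy for switching activity on the inputs

    def step(self, act: int, wgt: int) -> int:
        if act == 0 or wgt == 0:
            # Sparse case: do not update the input registers; store a zero
            # in the output register and use it as the product.
            self.out = 0
        else:
            self.in_act, self.in_wgt = act, wgt
            self.input_updates += 1
            self.out = self.in_act * self.in_wgt
        return self.out


def sparse_mac(acts: Sequence[int], wgts: Sequence[int]) -> Tuple[int, int]:
    """Accumulate products and report how many input-register updates occurred."""
    mul = GatedMultiplier()
    acc = 0
    for a, w in zip(acts, wgts):
        acc += mul.step(a, w)
    return acc, mul.input_updates


total, updates = sparse_mac([2, 0, 3, 4], [5, 7, 0, 1])
print(total, updates)
```

    The accumulated result matches a dense multiply-accumulate, but only the two element pairs without a zero cause the input registers to change.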

    SYSTEMS, APPARATUS, AND METHODS TO DEBUG ACCELERATOR HARDWARE

    Publication number: US20240118992A1

    Publication date: 2024-04-11

    Application number: US18487490

    Filing date: 2023-10-16

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to debug a hardware accelerator, such as a neural network accelerator, for executing Artificial Intelligence computational workloads. An example apparatus includes a core with a core input and a core output to execute executable code based on a machine-learning model to generate a data output based on a data input, and debug circuitry coupled to the core. The debug circuitry is configured to detect a breakpoint associated with the machine-learning model and to compile executable code based on at least one of the machine-learning model or the breakpoint. In response to the triggering of the breakpoint, the debug circuitry is to stop the execution of the executable code and output data, such as the data input, the data output, and the breakpoint, for debugging the hardware accelerator.
