ESTIMATION OF POWER PROFILES FOR NEURAL NETWORK MODELS RUNNING ON AI ACCELERATORS

    Publication Number: US20230004430A1

    Publication Date: 2023-01-05

    Application Number: US17856968

    Application Date: 2022-07-02

    Abstract: Technology for estimating neural network (NN) power profiles includes obtaining a plurality of workloads for a compiled NN model, the plurality of workloads determined for a hardware execution device, determining a hardware efficiency factor for the compiled NN model, and generating, based on the hardware efficiency factor, a power profile for the compiled NN model on one or more of a per-layer basis or a per-workload basis. The hardware efficiency factor can be determined based on a hardware efficiency measurement and a hardware utilization measurement, and can be determined on a per-workload basis. A configuration file can be provided for generating the power profile, and an output visualization of the power profile can be generated. Further, feedback information can be generated to perform one or more of selecting a hardware device, optimizing a breakdown of workloads, optimizing a scheduling of tasks, or confirming a hardware device design.
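
    The sketch below (hypothetical names and an assumed combination rule, not the patented method) illustrates how a per-workload power estimate could be derived from efficiency and utilization measurements and then aggregated per layer:

        # Illustrative model only; the efficiency*utilization rule and peak-power scaling are assumptions.
        from dataclasses import dataclass

        @dataclass
        class Workload:
            name: str          # e.g. one tile of work produced by the compiler
            layer: str         # NN layer the workload belongs to
            duration_s: float  # estimated execution time on the target device
            efficiency: float  # measured hardware efficiency, 0..1
            utilization: float # measured hardware utilization, 0..1

        def power_profile(workloads, device_peak_power_w):
            """Return (per-workload power estimates, per-layer energy estimates)."""
            per_workload = []
            per_layer_energy_j = {}
            for w in workloads:
                factor = w.efficiency * w.utilization     # per-workload efficiency factor
                power_w = device_peak_power_w * factor    # scaled from the device's peak power
                per_workload.append((w.name, power_w))
                per_layer_energy_j[w.layer] = per_layer_energy_j.get(w.layer, 0.0) + power_w * w.duration_s
            return per_workload, per_layer_energy_j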

    Apparatuses, methods, and systems for instructions to multiply values of zero

    Publication Number: US11847450B2

    Publication Date: 2023-12-19

    Application Number: US16714684

    Application Date: 2019-12-13

    CPC classification number: G06F9/3001 G06F9/30145

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of zero are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first number, a second field that identifies a second number, and a third field that indicates a number format for the first number and the second number; and an execution circuit to execute the decoded single instruction to: cause a first comparison of the first number to a zero value in the number format of the first number, cause a second comparison of the second number to a zero value in the number format of the second number, provide as a resultant of the single instruction a value of zero when the second comparison indicates the second number equals the zero value in the number format of the second number, provide as the resultant of the single instruction the value of zero when the first comparison indicates the first number equals the zero value in the number format of the first number, and provide as the resultant of the single instruction a product of a multiplication of the first number and the second number when the first comparison indicates the first number does not equal the zero value in the number format of the first number and the second comparison indicates the second number does not equal the zero value in the number format of the second number.
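
    As a software reference for the behavior described (an illustrative model, not the hardware implementation), the instruction's result can be expressed as:

        def mul_zero(a, b, zero=0.0):
            """Zero-bypass multiply: skip the multiplication when either operand is zero."""
            if b == zero:       # second comparison
                return zero
            if a == zero:       # first comparison
                return zero
            return a * b        # both operands nonzero: perform the multiplication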

    Apparatuses, methods, and systems for instructions to multiply floating-point values of about one

    Publication Number: US11650819B2

    Publication Date: 2023-05-16

    Application Number: US16714656

    Application Date: 2019-12-13

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about one threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about one threshold, cause a second comparison of an exponent of the second floating-point number to the about one threshold, provide as a resultant of the single instruction a value of the first floating-point number when both the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold and the second comparison indicates the exponent of the second floating-point number does not exceed the about one threshold, provide as the resultant of the single instruction the second floating-point number when the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about one threshold or the second comparison indicates the exponent of the second floating-point number exceeds the about one threshold.
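
    A software sketch of the same selection logic (reading the exponent via frexp() and the threshold test are illustrative assumptions, not the hardware design):

        import math

        def mul_about_one(a, b, about_one_threshold):
            """About-one multiply: pass an operand through instead of multiplying when
            an operand's exponent indicates a value close to one."""
            exp_a = abs(math.frexp(a)[1])   # magnitude of a's binary exponent
            exp_b = abs(math.frexp(b)[1])   # magnitude of b's binary exponent
            if exp_a <= about_one_threshold and exp_b <= about_one_threshold:
                return a                    # both operands are about one
            if exp_a <= about_one_threshold:
                return b                    # a is about one, so a*b is approximately b
            return a * b                    # otherwise perform the full multiplication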

    METHODS AND APPARATUS FOR DYNAMIC BATCHING OF DATA FOR NEURAL NETWORK WORKLOADS

    Publication Number: US20250131256A1

    Publication Date: 2025-04-24

    Application Number: US18888287

    Application Date: 2024-09-18

    Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.
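
    A compact sketch of the batching rule (the ceil() sizing heuristic is an assumption added for illustration; the abstract only specifies the comparison):

        import math

        def dynamic_batch_size(layer_ops, layer_weight_bytes, engine_ops_per_byte):
            """If the layer performs fewer operations per byte of weights than the
            computation engine can sustain per byte of memory, batch enough inputs to
            amortize the weight fetches; otherwise a batch of one is sufficient."""
            layer_ratio = layer_ops / layer_weight_bytes   # operations per byte of weights
            if layer_ratio < engine_ops_per_byte:          # memory-bound layer
                return max(1, math.ceil(engine_ops_per_byte / layer_ratio))
            return 1                                       # compute-bound layer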

    Apparatuses, methods, and systems for instructions to multiply values of one

    Publication Number: US12153920B2

    Publication Date: 2024-11-26

    Application Number: US16714680

    Application Date: 2019-12-13

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first number, a second field that identifies a second number, and a third field that indicates a number format for the first number and the second number; and an execution circuit to execute the decoded single instruction to: cause a first comparison of the first number to a one value in the number format of the first number, cause a second comparison of the second number to a one value in the number format of the second number, provide as a resultant of the single instruction the first number when the second comparison indicates the second number equals the one value in the number format of the second number, provide as the resultant of the single instruction the second number when the first comparison indicates the first number equals the one value in the number format of the first number, and provide as the resultant of the single instruction a product of a multiplication of the first number and the second number when the first comparison indicates the first number does not equal the one value in the number format of the first number and the second comparison indicates the second number does not equal the one value in the number format of the second number.
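
    The corresponding software model (illustrative only) of the one-bypass behavior:

        def mul_one(a, b, one=1.0):
            """One-bypass multiply: return the other operand instead of multiplying
            when one operand equals the one value in the instruction's number format."""
            if b == one:
                return a        # second operand is one: result is the first operand
            if a == one:
                return b        # first operand is one: result is the second operand
            return a * b        # neither operand is one: perform the multiplication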

    GENERIC LINEAR UNIT HARDWARE ACCELERATOR

    Publication Number: US20210200539A1

    Publication Date: 2021-07-01

    Application Number: US16729336

    Application Date: 2019-12-28

    Abstract: Embodiments of apparatuses, methods, and systems for a generic linear unit hardware accelerator are disclosed. In an embodiment, an apparatus includes a comparator, an exponential subunit, a multiplier subunit, and an adder subunit. The apparatus is to receive an input tensor, a threshold, an exponential enable, a scaling factor, and a bias factor and is to perform a transformation function on the input tensor to generate an output tensor.
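
    One plausible composition of these subunits (an assumption made for illustration; the abstract does not fix the exact transformation) that can reproduce common activation functions:

        import math

        def generic_linear_unit(x, threshold, exp_enable, scale, bias):
            """Comparator selects pass-through above the threshold; below it the value is
            optionally run through the exponential subunit, then scaled and biased."""
            if x > threshold:                        # comparator subunit
                return x
            t = math.exp(x) if exp_enable else x     # exponential subunit (gated by enable)
            return scale * t + bias                  # multiplier and adder subunits

        # With threshold=0, exp_enable=False, scale=0, bias=0 this reduces to ReLU;
        # scale=0.01 gives a leaky-ReLU-like curve; exp_enable=True, scale=1, bias=-1
        # approximates ELU for negative inputs.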

    Apparatuses, methods, and systems for instructions to multiply floating-point values of about zero

    Publication Number: US11875154B2

    Publication Date: 2024-01-16

    Application Number: US16714667

    Application Date: 2019-12-13

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about zero are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about zero threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about zero threshold, cause a second comparison of an exponent of the second floating-point number to the about zero threshold, provide as a resultant of the single instruction a value of zero when the first comparison indicates the exponent of the first floating-point number does not exceed the about zero threshold, provide as the resultant of the single instruction the value of zero when the second comparison indicates the exponent of the second floating-point number does not exceed the about zero threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about zero threshold and the second comparison indicates the exponent of the second floating-point number exceeds the about zero threshold.
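
    A software sketch of the about-zero selection (reading the exponent via frexp() and the interpretation of "exceeds" are illustrative assumptions):

        import math

        def mul_about_zero(a, b, about_zero_threshold):
            """About-zero multiply: force the result to zero, skipping the multiplier,
            when either operand's exponent indicates a magnitude near zero."""
            def exceeds(x):
                return math.frexp(x)[1] > about_zero_threshold   # exponent above the threshold
            if not exceeds(a):
                return 0.0      # first operand is about zero
            if not exceeds(b):
                return 0.0      # second operand is about zero
            return a * b        # both operands large enough: perform the multiplication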

    POWER MANAGEMENT FOR EXECUTION OF MACHINE LEARNING WORKLOADS

    Publication Number: US20230273832A1

    Publication Date: 2023-08-31

    Application Number: US18133616

    Application Date: 2023-04-12

    CPC classification number: G06F9/505 G06F1/3228 G06F1/3296

    Abstract: A system for autonomous and proactive power management for energy-efficient execution of machine learning workloads may include an apparatus such as a system-on-chip (SoC) comprising an accelerator configurable to load and execute a neural network and circuitry to receive a profile of the neural network. The profile may be received from a compiler and include information regarding a plurality of layers of the neural network. Responsive to the profile and the information regarding the plurality of layers, the circuitry may adjust, using a local power management unit (PMU) included in the apparatus, a power level supplied to the accelerator while the accelerator executes the neural network. The power level adjustment may be based on whether a particular layer is a compute-intensive layer or a memory-intensive layer.
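
    A driver-side sketch of the per-layer policy (the power level names and the profile schema are assumptions made for illustration):

        def plan_power_levels(layer_profile, high_level="P0", low_level="P2"):
            """Map each layer in the compiler-produced profile to a PMU power level:
            compute-intensive layers run at the higher level, memory-intensive layers
            at a lower level so the accelerator is not held at peak power while it
            waits on memory."""
            return {
                layer["name"]: high_level if layer["kind"] == "compute" else low_level
                for layer in layer_profile
            }

        # Example profile entry (assumed schema): {"name": "conv2d_1", "kind": "compute"}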
