ESTIMATION OF POWER PROFILES FOR NEURAL NETWORK MODELS RUNNING ON AI ACCELERATORS

    Publication Number: US20230004430A1

    Publication Date: 2023-01-05

    Application Number: US17856968

    Application Date: 2022-07-02

    Abstract: Technology for estimating neural network (NN) power profiles includes obtaining a plurality of workloads for a compiled NN model, the plurality of workloads determined for a hardware execution device, determining a hardware efficiency factor for the compiled NN model, and generating, based on the hardware efficiency factor, a power profile for the compiled NN model on one or more of a per-layer basis or a per-workload basis. The hardware efficiency factor can be determined based on a hardware efficiency measurement and a hardware utilization measurement, and can be determined on a per-workload basis. A configuration file can be provided for generating the power profile, and an output visualization of the power profile can be generated. Further, feedback information can be generated to perform one or more of selecting a hardware device, optimizing a breakdown of workloads, optimizing a scheduling of tasks, or confirming a hardware device design.
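
    The sketch below (hypothetical names and an assumed combination rule, not the patented method) illustrates how a per-workload power estimate could be derived from efficiency and utilization measurements and then aggregated per layer:

        # Illustrative model only; the efficiency*utilization rule and peak-power scaling are assumptions.
        from dataclasses import dataclass

        @dataclass
        class Workload:
            name: str          # e.g. one tile of work produced by the compiler
            layer: str         # NN layer the workload belongs to
            duration_s: float  # estimated execution time on the target device
            efficiency: float  # measured hardware efficiency, 0..1
            utilization: float # measured hardware utilization, 0..1

        def power_profile(workloads, device_peak_power_w):
            """Return (per-workload power estimates, per-layer energy estimates)."""
            per_workload = []
            per_layer_energy_j = {}
            for w in workloads:
                factor = w.efficiency * w.utilization     # per-workload efficiency factor
                power_w = device_peak_power_w * factor    # scaled from the device's peak power
                per_workload.append((w.name, power_w))
                per_layer_energy_j[w.layer] = per_layer_energy_j.get(w.layer, 0.0) + power_w * w.duration_s
            return per_workload, per_layer_energy_j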

    Apparatuses, methods, and systems for instructions to multiply values of zero

    Publication Number: US11847450B2

    Publication Date: 2023-12-19

    Application Number: US16714684

    Application Date: 2019-12-13

    CPC classification number: G06F9/3001 G06F9/30145

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of zero are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first number, a second field that identifies a second number, and a third field that indicates a number format for the first number and the second number; and an execution circuit to execute the decoded single instruction to: cause a first comparison of the first number to a zero value in the number format of the first number, cause a second comparison of the second number to a zero value in the number format of the second number, provide as a resultant of the single instruction a value of zero when the second comparison indicates the second number equals the zero value in the number format of the second number, provide as the resultant of the single instruction the value of zero when the first comparison indicates the first number equals the zero value in the number format of the first number, and provide as the resultant of the single instruction a product of a multiplication of the first number and the second number when the first comparison indicates the first number does not equal the zero value in the number format of the first number and the second comparison indicates the second number does not equal the zero value in the number format of the second number.
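
    As a software reference for the behavior described (an illustrative model, not the hardware implementation), the instruction's result can be expressed as:

        def mul_zero(a, b, zero=0.0):
            """Zero-bypass multiply: skip the multiplication when either operand is zero."""
            if b == zero:       # second comparison
                return zero
            if a == zero:       # first comparison
                return zero
            return a * b        # both operands nonzero: perform the multiplication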

    Apparatuses, methods, and systems for instructions to multiply floating-point values of about one

    Publication Number: US11650819B2

    Publication Date: 2023-05-16

    Application Number: US16714656

    Application Date: 2019-12-13

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about one threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about one threshold, cause a second comparison of an exponent of the second floating-point number to the about one threshold, provide as a resultant of the single instruction a value of the first floating-point number when both the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold and the second comparison indicates the exponent of the second floating-point number does not exceed the about one threshold, provide as the resultant of the single instruction the second floating-point number when the first comparison indicates the exponent of the first floating-point number does not exceed the about one threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about one threshold or the second comparison indicates the exponent of the second floating-point number exceeds the about one threshold.
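
    A software sketch of the same selection logic (reading the exponent via frexp() and the threshold test are illustrative assumptions, not the hardware design):

        import math

        def mul_about_one(a, b, about_one_threshold):
            """About-one multiply: pass an operand through instead of multiplying when
            an operand's exponent indicates a value close to one."""
            exp_a = abs(math.frexp(a)[1])   # magnitude of a's binary exponent
            exp_b = abs(math.frexp(b)[1])   # magnitude of b's binary exponent
            if exp_a <= about_one_threshold and exp_b <= about_one_threshold:
                return a                    # both operands are about one
            if exp_a <= about_one_threshold:
                return b                    # a is about one, so a*b is approximately b
            return a * b                    # otherwise perform the full multiplication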

    METHODS AND APPARATUS FOR DYNAMIC BATCHING OF DATA FOR NEURAL NETWORK WORKLOADS

    Publication Number: US20250131256A1

    Publication Date: 2025-04-24

    Application Number: US18888287

    Application Date: 2024-09-18

    Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.
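
    A compact sketch of the batching rule (the ceil() sizing heuristic is an assumption added for illustration; the abstract only specifies the comparison):

        import math

        def dynamic_batch_size(layer_ops, layer_weight_bytes, engine_ops_per_byte):
            """If the layer performs fewer operations per byte of weights than the
            computation engine can sustain per byte of memory, batch enough inputs to
            amortize the weight fetches; otherwise a batch of one is sufficient."""
            layer_ratio = layer_ops / layer_weight_bytes   # operations per byte of weights
            if layer_ratio < engine_ops_per_byte:          # memory-bound layer
                return max(1, math.ceil(engine_ops_per_byte / layer_ratio))
            return 1                                       # compute-bound layer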

    Apparatuses, methods, and systems for instructions to multiply values of one

    Publication Number: US12153920B2

    Publication Date: 2024-11-26

    Application Number: US16714680

    Application Date: 2019-12-13

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply values of one are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first number, a second field that identifies a second number, and a third field that indicates a number format for the first number and the second number; and an execution circuit to execute the decoded single instruction to: cause a first comparison of the first number to a one value in the number format of the first number, cause a second comparison of the second number to a one value in the number format of the second number, provide as a resultant of the single instruction the first number when the second comparison indicates the second number equals the one value in the number format of the second number, provide as the resultant of the single instruction the second number when the first comparison indicates the first number equals the one value in the number format of the first number, and provide as the resultant of the single instruction a product of a multiplication of the first number and the second number when the first comparison indicates the first number does not equal the one value in the number format of the first number and the second comparison indicates the second number does not equal the one value in the number format of the second number.
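
    The corresponding software model (illustrative only) of the one-bypass behavior:

        def mul_one(a, b, one=1.0):
            """One-bypass multiply: return the other operand instead of multiplying
            when one operand equals the one value in the instruction's number format."""
            if b == one:
                return a        # second operand is one: result is the first operand
            if a == one:
                return b        # first operand is one: result is the second operand
            return a * b        # neither operand is one: perform the multiplication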

    GENERIC LINEAR UNIT HARDWARE ACCELERATOR

    Publication Number: US20210200539A1

    Publication Date: 2021-07-01

    Application Number: US16729336

    Application Date: 2019-12-28

    Abstract: Embodiments of apparatuses, methods, and systems for a generic linear unit hardware accelerator are disclosed. In an embodiment, an apparatus includes a comparator, an exponential subunit, a multiplier subunit, and an adder subunit. The apparatus is to receive an input tensor, a threshold, an exponential enable, a scaling factor, and a bias factor and is to perform a transformation function on the input tensor to generate an output tensor.
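
    One plausible composition of these subunits (an assumption made for illustration; the abstract does not fix the exact transformation) that can reproduce common activation functions:

        import math

        def generic_linear_unit(x, threshold, exp_enable, scale, bias):
            """Comparator selects pass-through above the threshold; below it the value is
            optionally run through the exponential subunit, then scaled and biased."""
            if x > threshold:                        # comparator subunit
                return x
            t = math.exp(x) if exp_enable else x     # exponential subunit (gated by enable)
            return scale * t + bias                  # multiplier and adder subunits

        # With threshold=0, exp_enable=False, scale=0, bias=0 this reduces to ReLU;
        # scale=0.01 gives a leaky-ReLU-like curve; exp_enable=True, scale=1, bias=-1
        # approximates ELU for negative inputs.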

    Apparatuses, methods, and systems for instructions to multiply floating-point values of about zero

    Publication Number: US11875154B2

    Publication Date: 2024-01-16

    Application Number: US16714667

    Application Date: 2019-12-13

    Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about zero are described. In one embodiment, a hardware processor includes a decoder to decode a single instruction into a decoded single instruction, the single instruction having a first field that identifies a first floating-point number, a second field that identifies a second floating-point number, and a third field that indicates an about zero threshold; and an execution circuit to execute the decoded single instruction to: cause a first comparison of an exponent of the first floating-point number to the about zero threshold, cause a second comparison of an exponent of the second floating-point number to the about zero threshold, provide as a resultant of the single instruction a value of zero when the first comparison indicates the exponent of the first floating-point number does not exceed the about zero threshold, provide as the resultant of the single instruction the value of zero when the second comparison indicates the exponent of the second floating-point number does not exceed the about zero threshold, and provide as the resultant of the single instruction a product of a multiplication of the first floating-point number and the second floating-point number when the first comparison indicates the exponent of the first floating-point number exceeds the about zero threshold and the second comparison indicates the exponent of the second floating-point number exceeds the about zero threshold.
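
    A software sketch of the about-zero selection (reading the exponent via frexp() and the interpretation of "exceeds" are illustrative assumptions):

        import math

        def mul_about_zero(a, b, about_zero_threshold):
            """About-zero multiply: force the result to zero, skipping the multiplier,
            when either operand's exponent indicates a magnitude near zero."""
            def exceeds(x):
                return math.frexp(x)[1] > about_zero_threshold   # exponent above the threshold
            if not exceeds(a):
                return 0.0      # first operand is about zero
            if not exceeds(b):
                return 0.0      # second operand is about zero
            return a * b        # both operands large enough: perform the multiplication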

    POWER MANAGEMENT FOR EXECUTION OF MACHINE LEARNING WORKLOADS

    Publication Number: US20230273832A1

    Publication Date: 2023-08-31

    Application Number: US18133616

    Application Date: 2023-04-12

    CPC classification number: G06F9/505 G06F1/3228 G06F1/3296

    Abstract: A system for autonomous and proactive power management for energy-efficient execution of machine learning workloads may include an apparatus such as a system-on-chip (SoC) comprising an accelerator configurable to load and execute a neural network and circuitry to receive a profile of the neural network. The profile may be received from a compiler and include information regarding a plurality of layers of the neural network. Responsive to the profile and the information regarding the plurality of layers, the circuitry may adjust, using a local power management unit (PMU) included in the apparatus, a power level supplied to the accelerator while the accelerator executes the neural network. The power level adjustment may be based on whether a particular layer is a compute-intensive layer or a memory-intensive layer.
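
    A driver-side sketch of the per-layer policy (the power level names and the profile schema are assumptions made for illustration):

        def plan_power_levels(layer_profile, high_level="P0", low_level="P2"):
            """Map each layer in the compiler-produced profile to a PMU power level:
            compute-intensive layers run at the higher level, memory-intensive layers
            at a lower level so the accelerator is not held at peak power while it
            waits on memory."""
            return {
                layer["name"]: high_level if layer["kind"] == "compute" else low_level
                for layer in layer_profile
            }

        # Example profile entry (assumed schema): {"name": "conv2d_1", "kind": "compute"}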
