-
Publication Number: US20250060940A1
Publication Date: 2025-02-20
Application Number: US18931973
Application Date: 2024-10-30
Applicant: Intel Corporation
Inventor: Arnab Raha , Michael Wu , Deepak Abraham Mathaikutty , Daksha Sharma , Martin Langhammer
Abstract: A data processing unit may include a memory, processing elements (PEs), and a control unit. The memory may store weight blocks within a weight tensor of a neural network operation. Each weight block has an input channel (IC) dimension and an output channel (OC) dimension and includes subblocks. A subblock includes one or more weights having a first data precision and one or more other weights having a second data precision. The second data precision is lower than the first data precision. The control unit may distribute different ones of the subblocks to different ones of the PEs. A PE may receive a subblock and perform a first multiply-accumulate (MAC) operation on a weight having the first data precision and a second MAC operation on a weight having the second data precision. The first MAC operation may consume more computation cycles or more multipliers than the second MAC operation.
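As a rough illustration of the mixed-precision idea above, the following Python sketch models a PE consuming one subblock, with high-precision weights costing more compute cycles than low-precision ones. The function name, the per-precision cycle costs, and the precision labels are hypothetical, not taken from the patent.

```python
# Hypothetical PE model: names, cycle costs, and precision labels are
# assumptions for illustration, not the patent's implementation.
def pe_process_subblock(activations, weights, precisions,
                        hi_cycles=2, lo_cycles=1):
    """Accumulate MAC results over a subblock and count compute cycles.

    precisions[i] is 'hi' or 'lo' for weights[i]; the assumption is that
    a high-precision weight costs more cycles (or multipliers) per MAC.
    """
    acc, cycles = 0, 0
    for a, w, p in zip(activations, weights, precisions):
        acc += a * w                                  # the MAC itself
        cycles += hi_cycles if p == 'hi' else lo_cycles
    return acc, cycles

# A 4-weight subblock with one high-precision weight and three low-precision.
acc, cycles = pe_process_subblock([1, 2, 3, 4], [5, -3, 2, 1],
                                  ['hi', 'lo', 'lo', 'lo'])
print(acc, cycles)  # 9 5
```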
-
Publication Number: US20240403616A1
Publication Date: 2024-12-05
Application Number: US18500229
Application Date: 2023-11-02
Applicant: Intel Corporation
Inventor: Umer Iftikhar Cheema , Kevin Brady , Robert Simofi , Colm O Faolain , Deepak Abraham Mathaikutty , Arnab Raha , Dinakar Kondru , Gary Baugh , Darren Crews , Fergal Connor
IPC: G06N3/048
Abstract: An activation function in a neural network may be approximated by one or more linear functions. A linear function may correspond to a segment of the input range of the activation function, e.g., a linear segment. A programmable look-up table may store slopes and intercepts of linear functions. A post processing engine (PPE) array executing the activation function may determine that an input data element of the activation function falls into the linear segment and compute an output of the linear function using the input data element. The output of the linear function may be used as the approximated output of the activation function. Alternatively, the PPE array may determine that the input data element is in a saturation segment and use a fixed value associated with the saturation segment as the approximated output of the activation function.
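The look-up-table scheme lends itself to a short sketch. The Python below approximates a sigmoid with four linear segments (one slope and intercept per segment, chosen here as chords of the sigmoid) plus fixed-value saturation segments at both ends; the breakpoints, table contents, and saturation values are illustrative assumptions, not the PPE array's programmed table.

```python
import bisect

# Illustrative piecewise-linear table for a sigmoid over [-4, 4]; these
# values are assumptions, not the patent's programmed table contents.
BREAKPOINTS = [-4.0, -2.0, 0.0, 2.0, 4.0]         # segment edges
SLOPES      = [0.0505, 0.1905, 0.1905, 0.0505]    # one per linear segment
INTERCEPTS  = [0.2200, 0.5000, 0.5000, 0.7800]    # one per linear segment
SAT_LO, SAT_HI = 0.0, 1.0                         # fixed saturation outputs

def approx_activation(x):
    if x < BREAKPOINTS[0]:
        return SAT_LO                 # saturation segment: fixed value
    if x >= BREAKPOINTS[-1]:
        return SAT_HI                 # saturation segment: fixed value
    i = bisect.bisect_right(BREAKPOINTS, x) - 1
    return SLOPES[i] * x + INTERCEPTS[i]  # linear segment: slope * x + intercept

print(approx_activation(1.0))   # 0.6905, vs. sigmoid(1.0) ~= 0.731
```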
-
Publication Number: US12141683B2
Publication Date: 2024-11-12
Application Number: US17246341
Application Date: 2021-04-30
Applicant: Intel Corporation
Inventor: Arnab Raha , Debabrata Mohapatra , Gautham Chinya , Guruguhanathan Venkataramanan , Sang Kyun Kim , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
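A minimal sketch of the dynamic-scaling idea, assuming one MAC lane per element pair: a lane only needs to be active where both the activation and the weight are nonzero, so a combined sparsity bitmap directly gives the active-MAC count. The bitmap representation and the gating abstraction below are illustrative.

```python
# Toy sketch: gate MAC lanes off when the combined activation/weight
# sparsity means a lane would multiply by zero anyway.
def active_mac_count(act_bitmap, wgt_bitmap):
    """A MAC lane only needs to fire where both operands are nonzero."""
    combined = [a & w for a, w in zip(act_bitmap, wgt_bitmap)]
    return sum(combined)

# 1 = nonzero element, 0 = zero element
acts    = [1, 0, 1, 1, 0, 1, 0, 0]
weights = [1, 1, 0, 1, 0, 1, 1, 0]
print(active_mac_count(acts, weights))  # 3 of 8 lanes active; rest can idle
```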
-
Publication Number: US20240013040A1
Publication Date: 2024-01-11
Application Number: US18474464
Application Date: 2023-09-26
Applicant: Intel Corporation
Inventor: Arnab Raha , Deepak Abraham Mathaikutty , Umer Iftikhar Cheema , Dinakar Kondru
IPC: G06N3/063 , G06N3/048 , G06N3/0464
CPC classification number: G06N3/063 , G06N3/048 , G06N3/0464
Abstract: A drain module may drain activations in an output tensor of a convolution from a processing element (PE) array that performs the convolution. The drain module may extract activations generated in a collection of PE columns. The activations generated in the PE columns in the collection may be concatenated, e.g., activations generated in the first PE column of the collection may be followed by activations generated in the second PE column of the collection, and so on. The activations in the output tensor may be rearranged into activation vectors. Each activation vector may include activations in different output channels of the deep learning operation. The activations in each activation vector may have the same (X, Y) coordinate in the output tensor. The drain module may determine a memory address for an activation based on the activation's (X, Y, Z) coordinate in the output tensor and write the activation to the memory address.
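The address computation described above can be sketched as a linear, channel-minor (Z-fastest) layout, so that every activation vector sharing one (X, Y) coordinate lands at contiguous addresses. The function name, tensor dimensions, and byte width below are assumptions for illustration.

```python
# Hedged sketch of a drain-side address calculation: channel-minor layout
# keeps an activation vector for one (X, Y) contiguous in memory.
def activation_address(x, y, z, width, channels, base=0, elem_bytes=1):
    """Linear address for the activation at (x, y, z) in the output tensor."""
    return base + ((y * width + x) * channels + z) * elem_bytes

# A 4x4x16 output tensor: the 16 channel values at (x=2, y=1) are contiguous.
addrs = [activation_address(2, 1, z, width=4, channels=16) for z in range(16)]
print(addrs[0], addrs[-1])  # 96 111
```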
-
Publication Number: US20230229507A1
Publication Date: 2023-07-20
Application Number: US18180415
Application Date: 2023-03-08
Applicant: Intel Corporation
CPC classification number: G06F9/5027 , G06N3/04 , H04L41/16
Abstract: Computations in processing elements (PEs) for executing a deep neural network are scheduled via a computation scheduler based on sparsity in the input data of the computations to reduce voltage droops. Each PE may perform a computation on an input operand and a weight operand. The computation scheduler may predict the workload of the PE for the computation based on a combined sparsity bitmap, which may be generated from a sparsity bitmap of the input operand and a sparsity bitmap of the weight operand. The computation scheduler can schedule the starts of the computations in the PEs based on the predicted workloads of the PEs. The computation scheduler may instruct the PE having the highest workload to start its computation first and instruct the other PEs to start their computations later. In some embodiments, the computations in the PEs may end in the same clock cycle.
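A small Python sketch of this scheduling policy, under the assumption that each surviving MAC costs one cycle: the predicted workload is the popcount of the combined (ANDed) sparsity bitmap, and start times are staggered so that every PE finishes in the same cycle, with the heaviest PE starting first.

```python
# Sketch under stated assumptions: workload = popcount of the combined
# bitmap (one cycle per surviving MAC); staggered starts align finishes.
def schedule_starts(act_bitmaps, wgt_bitmaps):
    workloads = [sum(a & w for a, w in zip(ab, wb))
                 for ab, wb in zip(act_bitmaps, wgt_bitmaps)]
    finish = max(workloads)                 # common end cycle for all PEs
    # Heaviest PE starts at cycle 0; lighter PEs start later.
    return [finish - wl for wl in workloads], workloads

starts, loads = schedule_starts(
    [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]],
    [[1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 1]],
)
print(loads, starts)  # [3, 1, 1] -> starts [0, 2, 2]; all end at cycle 3
```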
-
Publication Number: US20220188638A1
Publication Date: 2022-06-16
Application Number: US17684764
Application Date: 2022-03-02
Applicant: Intel Corporation
Inventor: Deepak Abraham Mathaikutty , Arnab Raha , Raymond Jit-Hung Sung , Debabrata Mohapatra
Abstract: An apparatus for convolution operations is provided. The apparatus includes a PE array, a datastore, writing modules, reading modules, and a controlling module. The PE array performs multiply-accumulate (MAC) operations. The datastore includes databanks, each of which stores data to be used by a column of the PE array. The writing modules transfer data from a memory to the datastore. The reading modules transfer data from the datastore to the PE array. Each reading module may transfer data to a particular column of the PE array. The controlling module can determine the rounds of a convolution operation. Each round includes MAC operations based on a weight. The controlling module controls the writing modules and reading modules so that the same data in a databank can be reused across multiple rounds. For different rounds, the controlling module can provide a reading module access to different databanks.
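The round-based reuse can be modeled compactly. In the sketch below, data is written into the databanks once, and the controlling module grants each reading module (one per PE column) a different databank each round by rotating the mapping; the class names and the rotation policy are illustrative assumptions rather than the patent's control logic.

```python
# Simplified datastore model; names and the rotation policy are assumptions.
class Datastore:
    def __init__(self, num_banks):
        self.banks = [None] * num_banks

    def write(self, bank, data):   # writing-module path (memory -> datastore)
        self.banks[bank] = data

    def read(self, bank):          # reading-module path (datastore -> PE column)
        return self.banks[bank]

num_banks, num_rounds = 4, 4
store = Datastore(num_banks)
for b in range(num_banks):
    store.write(b, f"slice_{b}")   # each databank loaded from memory once

for rnd in range(num_rounds):
    # This round, reader (PE column) c is granted access to a rotated bank,
    # so all four data slices are reused without reloading from memory.
    mapping = [(c + rnd) % num_banks for c in range(num_banks)]
    print(rnd, [store.read(b) for b in mapping])
```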
-
Publication Number: US20250036928A1
Publication Date: 2025-01-30
Application Number: US18907748
Application Date: 2024-10-07
Applicant: Intel Corporation
Inventor: Arnab Raha , Debabrata Mohapatra , Gautham Chinya , Guruguhanathan Venkataramanan , Sang Kyun Kim , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
-
Publication Number: US20250028565A1
Publication Date: 2025-01-23
Application Number: US18906648
Application Date: 2024-10-04
Applicant: Intel Corporation
Inventor: Debabrata Mohapatra , Arnab Raha , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically adjusted based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable-depth adder tree during runtime, the compiler can choose a compute-optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of an HW accelerator. Other embodiments may be described and/or claimed.
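As a toy model of the reconfigurable adder tree, the sketch below reduces groups of 2**depth partial sums per step, with `depth` standing in for the value a configuration register would supply per layer. The function and the per-layer policy are assumptions for illustration.

```python
# Toy model of a variable-depth adder tree: a depth-d tree reduces 2**d
# partial sums per accumulation step. Register interface is assumed away.
def adder_tree_reduce(psums, depth):
    """Reduce groups of 2**depth partial sums, as a depth-`depth` tree would."""
    group = 1 << depth
    assert len(psums) % group == 0
    return [sum(psums[i:i + group]) for i in range(0, len(psums), group)]

psums = list(range(16))
print(adder_tree_reduce(psums, depth=2))  # 4-wide groups: [6, 22, 38, 54]
print(adder_tree_reduce(psums, depth=4))  # one 16-wide group: [120]
```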
-
Publication Number: US20240231839A1
Publication Date: 2024-07-11
Application Number: US18416303
Application Date: 2024-01-18
Applicant: Intel Corporation
Inventor: Arnab Raha , Deepak Mathaikutty , Debabrata Mohapatra , Sang Kyun Kim , Gautham Chinya , Cormac Brick
CPC classification number: G06F9/445 , G06F9/3001 , G06F9/5027 , G06N20/00 , H03K19/177 , H03K19/20
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of the compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
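A rough sketch of the load-and-reuse flow, with hypothetical names and sizes: the data provider loads the requested section plus a lookahead slab of compressed parameters, and before issuing another memory load the re-user checks whether the next section is already resident locally.

```python
# Hypothetical flow: all names, offsets, and sizes are illustrative.
def load_with_lookahead(compressed, offset, section_len, extra_len):
    """Return the requested section plus `extra_len` bytes of lookahead."""
    return compressed[offset: offset + section_len + extra_len]

buffer = load_with_lookahead(bytes(range(64)), offset=0,
                             section_len=16, extra_len=16)
section_1 = buffer[:16]                   # execute first ML operation
if len(buffer) >= 32:                     # second section already on hand?
    section_2 = buffer[16:32]             # reuse without another memory load
    print("reused local data:", section_2[0], section_2[-1])
```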
-
Publication Number: US20230021396A1
Publication Date: 2023-01-26
Application Number: US17953637
Application Date: 2022-09-27
Applicant: Intel Corporation
Inventor: Nihat Tunali , Arnab Raha , Bogdan Pasca , Martin Langhammer , Michael Wu , Deepak Mathaikutty
Abstract: A method for implementing an artificial neural network in a computing system comprises performing a compute operation using an input activation and a weight to generate an output activation, and modifying the output activation using a noise value to increase activation sparsity.
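One plausible reading of this method, sketched below with illustrative choices: subtract a small random noise value from each output activation and clamp at zero, so near-zero activations collapse to exact zeros and sparsity increases. The noise distribution and the clamping rule are assumptions, not the patent's specific technique.

```python
import random

# Hedged sketch of noise-assisted sparsification; magnitudes are assumptions.
def noisy_sparsify(outputs, noise_scale=0.05, seed=0):
    rng = random.Random(seed)
    result = []
    for y in outputs:
        noise = rng.uniform(0.0, noise_scale)
        result.append(max(0.0, y - noise))   # near-zero values collapse to 0
    return result

acts = [0.40, 0.03, 0.00, 0.01, 0.75]
sparse = noisy_sparsify(acts)
print(sum(1 for a in sparse if a == 0.0), "zeros of", len(sparse))
```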
-