APPROXIMATING ACTIVATION FUNCTION IN NEURAL NETWORK WITH LOOK-UP TABLE HAVING HYBRID ARCHITECTURE

    Publication Number: US20240160695A1

    Publication Date: 2024-05-16

    Application Number: US18392618

    Application Date: 2023-12-21

    CPC classification number: G06F17/17 G06F1/0356

    Abstract: A non-linear activation function may be approximated by linear functions. The input range of the activation function may be divided into input segments. One or more input segments may be selected based on statistical analysis of input data elements in the input range. A parameter of a first linear function that approximates the activation function for at least part of a selected input segment may be stored in a first portion of a first look-up table (LUT). The first portion of the first LUT is dedicated to a first group of post processing engines (PPEs). A parameter of a second linear function that approximates the activation function for at least part of an unselected input segment may be stored in a shared pool of LUT entries, which includes a second portion of the first LUT and a portion of a second LUT and is shared by multiple groups of PPEs.
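
    A minimal Python sketch of the piecewise-linear idea in this abstract: the input range is split into segments, a slope and intercept are fit per segment, and evaluation becomes a table lookup plus one multiply-add. The uniform segmentation and function names are assumptions; the dedicated/shared split of LUT entries across PPE groups is not modeled.

```python
import numpy as np

def build_pwl_lut(fn, lo, hi, num_segments):
    """Fit one linear piece per uniform input segment of fn over [lo, hi].

    Returns segment edges plus per-segment slopes and intercepts -- a
    software stand-in for the (slope, intercept) parameters the abstract
    stores in LUT entries. Hypothetical helper, not the patented layout.
    """
    edges = np.linspace(lo, hi, num_segments + 1)
    slopes, intercepts = [], []
    for x0, x1 in zip(edges[:-1], edges[1:]):
        y0, y1 = fn(x0), fn(x1)
        a = (y1 - y0) / (x1 - x0)   # slope of the linear piece
        b = y0 - a * x0             # intercept of the linear piece
        slopes.append(a)
        intercepts.append(b)
    return edges, np.array(slopes), np.array(intercepts)

def pwl_eval(x, edges, slopes, intercepts):
    """Approximate fn(x) by looking up the segment's linear parameters."""
    i = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(slopes) - 1)
    return slopes[i] * x + intercepts[i]

edges, a, b = build_pwl_lut(np.tanh, -4.0, 4.0, 16)
print(pwl_eval(1.3, edges, a, b), np.tanh(1.3))  # approximation vs. exact
```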

    METHODS, SYSTEMS, ARTICLES OF MANUFACTURE, AND APPARATUS TO DECODE ZERO-VALUE-COMPRESSION DATA VECTORS

    Publication Number: US20240022259A1

    Publication Date: 2024-01-18

    Application Number: US18465495

    Application Date: 2023-09-12

    CPC classification number: H03M7/3082 G06F16/2237 G06N3/063 G06N3/08

    Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.
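
    A software analogue of the decode path: given the compressed nonzero values and a sparsity bitmap, the dense vector is rebuilt by consuming one stored value per set bit. This is a hypothetical illustration of zero-value compression, not the claimed select/write-enable circuitry.

```python
def zvc_decode(nonzero_values, sparsity_bitmap):
    """Expand zero-value-compressed data back to a dense vector.

    sparsity_bitmap holds one bit per dense element: 1 means the next
    value in nonzero_values is consumed, 0 means an elided zero.
    """
    dense = []
    it = iter(nonzero_values)
    for bit in sparsity_bitmap:
        dense.append(next(it) if bit else 0)
    return dense

# Set bits mark positions where a compressed value is consumed.
print(zvc_decode([5, 7, 2], [1, 0, 0, 1, 0, 1, 0, 0]))
# -> [5, 0, 0, 7, 0, 2, 0, 0]
```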

    PRUNING ACTIVATIONS AND WEIGHTS OF NEURAL NETWORKS WITH PROGRAMMABLE THRESHOLDS

    Publication Number: US20230394312A1

    Publication Date: 2023-12-07

    Application Number: US18453715

    Application Date: 2023-08-22

    CPC classification number: G06N3/082 G06N3/0464

    Abstract: Activations (e.g., output activations) or weights of intermediate layers of deep neural networks (DNNs) can be pruned to increase sparsity and reduce the amount of computation in those layers or subsequent layers. A pruning threshold may be determined, e.g., through an iterative process, and activations or weights whose absolute values fall below the pruning threshold may be changed to zero. A first pruning threshold may be used to prune an output tensor or kernel of a layer. The loss in the accuracy of the DNN due to the pruning may be determined. A second pruning threshold may be determined based on the first pruning threshold and the accuracy loss. The DNN may be modified by adding a pruning operation to the layer. The pruning operation can prune output tensors or kernels of the layer based on the second pruning threshold.
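
    A rough sketch of such a pruning loop, assuming a user-supplied accuracy callback: elements below the threshold are zeroed, and the threshold is adjusted based on the observed accuracy loss. Function names, the multiplicative update rule, and the loss budget are all illustrative assumptions.

```python
import numpy as np

def prune_below_threshold(tensor, threshold):
    """Zero out elements whose magnitude is below the pruning threshold."""
    return np.where(np.abs(tensor) < threshold, 0.0, tensor)

def tune_threshold(tensor, eval_accuracy, max_loss, t0=1e-3, iters=8):
    """Heuristically derive a new threshold from the observed accuracy loss.

    eval_accuracy(tensor) -> accuracy is a hypothetical callback standing
    in for running the DNN on a validation set.
    """
    baseline = eval_accuracy(tensor)
    t, best = t0, 0.0
    for _ in range(iters):
        loss = baseline - eval_accuracy(prune_below_threshold(tensor, t))
        if loss <= max_loss:
            best = t        # largest threshold seen within the loss budget
            t *= 2.0        # try pruning more aggressively
        else:
            t *= 0.5        # too much accuracy loss: back off
    return best

# Toy demo: "accuracy" is the fraction of tensor energy kept after pruning.
w = np.random.randn(1000)
acc = lambda x: float(np.sum(x**2) / np.sum(w**2))
print(tune_threshold(w, acc, max_loss=0.05))
```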

    SPARSITY-BASED REDUCTION OF GATE SWITCHING IN DEEP NEURAL NETWORK ACCELERATORS

    Publication Number: US20230325665A1

    Publication Date: 2023-10-12

    Application Number: US18325298

    Application Date: 2023-05-30

    CPC classification number: G06N3/08 G06N3/0464

    Abstract: Gate switching in deep learning operations can be reduced based on sparsity in the input data. A first element of an activation operand and a first element of a weight operand may be stored in input storage units associated with a multiplier in a processing element. The multiplier computes the product of the two elements, which may be stored in an output storage unit of the multiplier. After detecting that a second element of the activation operand or a second element of the weight operand is zero valued, gate switching is reduced by avoiding at least one gate switch that would otherwise be needed for the multiply-accumulate (MAC) operation. For instance, the input storage units may not be updated. A zero-valued data element may be stored in the output storage unit of the multiplier and used as the product of the second element of the activation operand and the second element of the weight operand.
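
    The behavior can be mimicked with a toy multiplier model that leaves its input latches untouched when either operand is zero and forces a zero product, counting latch updates as a coarse proxy for gate switching. The class and counter names are assumptions; the real savings happen in silicon, not software.

```python
class GatedMultiplier:
    """Toy model of a PE multiplier that avoids toggling on zero operands."""

    def __init__(self):
        self.in_a = 0       # input storage unit for the activation element
        self.in_b = 0       # input storage unit for the weight element
        self.out = 0        # output storage unit
        self.switches = 0   # coarse proxy for gate-switching events

    def mul(self, a, b):
        if a == 0 or b == 0:
            # Zero detected: leave the input latches as-is, force a zero
            # product into the output storage unit.
            self.out = 0
            return self.out
        self.in_a, self.in_b = a, b   # input storage units updated
        self.switches += 1            # count one switching event
        self.out = self.in_a * self.in_b
        return self.out

pe = GatedMultiplier()
acc = sum(pe.mul(a, w) for a, w in zip([3, 0, 2], [4, 5, 0]))
print(acc, pe.switches)  # 12 1 -- only the nonzero pair toggled the multiplier
```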

    HYBRID MULTIPLY-ACCUMULATION OPERATION WITH COMPRESSED WEIGHTS

    Publication Number: US20230229917A1

    Publication Date: 2023-07-20

    Application Number: US18184101

    Application Date: 2023-03-15

    CPC classification number: G06N3/08 G06F7/5443

    Abstract: A compute block can perform hybrid multiply-accumulate (MAC) operations. The compute block may include a weight compression module and a processing element (PE) array. The weight compression module may select a first group of one or more weights and a second group of one or more weights from a weight tensor of a DNN (deep neural network) layer. A weight in the first group is quantized to a power-of-two value. A weight in the second group is quantized to an integer. The integer and the exponent of the power-of-two value may be stored in a memory in lieu of the original values of the weights. A PE in the PE array includes a shifter configured to shift an activation of the layer by the exponent of the power-of-two value and a multiplier configured to multiply the integer with another activation of the layer.
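
    A hedged sketch of the hybrid MAC: weights flagged as power-of-two are stored as exponents and applied with a shift, while the rest are applied with an integer multiply. The mask-based encoding shown here is an assumption for illustration, not the patented storage format.

```python
def hybrid_mac(activations, weights, pow2_mask):
    """Accumulate activation*weight, using shifts for power-of-two weights.

    pow2_mask[i] is True when weights[i] was quantized to 2**e and only
    the exponent e is stored; otherwise weights[i] holds a plain integer.
    """
    acc = 0
    for x, w, is_pow2 in zip(activations, weights, pow2_mask):
        if is_pow2:
            acc += x << w   # w is the stored exponent: x * 2**w via shifter
        else:
            acc += x * w    # w is the stored integer weight: plain multiply
    return acc

# Weights: exponent 3 (i.e. 2**3 = 8) and plain integer 5.
print(hybrid_mac([2, 4], [3, 5], [True, False]))  # 2*8 + 4*5 = 36
```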

    ACCELERATING DATA LOAD AND COMPUTATION IN FRONTEND CONVOLUTIONAL LAYER

    Publication Number: US20230073661A1

    Publication Date: 2023-03-09

    Application Number: US18055315

    Application Date: 2022-11-14

    Abstract: A DNN (deep neural network) accelerator may accelerate deep learning, such as convolutions in frontend layers, through a scheduler for loading data to be processed. The DNN accelerator may store, in a memory, an input tensor of a convolutional layer in a DNN. The convolutional layer may be the first layer, or a layer arranged before one or more other convolutional layers in the DNN, such that data processed by the layer can be efficiently reused across data load rounds. The input tensor includes one or more channels. A channel includes activations arranged in rows and columns. The DNN accelerator may read at least a portion of the input tensor from the memory into a datastore. The datastore includes a number of databanks. The DNN accelerator may provide a vector of one or more activations to a processing element for operations such as multiplications on the vector.
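
    One plausible software reading of the banked datastore, assuming a row-interleaved layout: rows of a channel are spread round-robin across databanks, and a short activation vector is served to a PE from one stored row. The bank count and layout are assumptions, not the patented design.

```python
import numpy as np

def load_into_databanks(channel, num_banks):
    """Distribute rows of a channel round-robin across databanks."""
    banks = [[] for _ in range(num_banks)]
    for r, row in enumerate(channel):
        banks[r % num_banks].append((r, row))
    return banks

def read_vector(banks, row, col, length):
    """Return `length` consecutive activations from one stored row."""
    bank = banks[row % len(banks)]          # which databank holds this row
    stored = dict(bank)[row]                # fetch the row from that bank
    return stored[col:col + length]

channel = np.arange(36).reshape(6, 6)       # one 6x6 channel of activations
banks = load_into_databanks(channel, num_banks=2)
print(read_vector(banks, row=3, col=1, length=4))  # [19 20 21 22]
```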

    METHODS AND APPARATUS TO PERFORM LOW OVERHEAD SPARSITY ACCELERATION LOGIC FOR MULTI-PRECISION DATAFLOW IN DEEP NEURAL NETWORK ACCELERATORS

    Publication Number: US20220292366A1

    Publication Date: 2022-09-15

    Application Number: US17709337

    Application Date: 2022-03-30

    Abstract: Methods, apparatus, systems, and articles of manufacture to perform low overhead sparsity acceleration logic for multi-precision dataflow in deep neural network accelerators are disclosed. An example apparatus includes a first buffer to store data corresponding to a first precision; a second buffer to store data corresponding to a second precision; and hardware control circuitry to: process a first multibit bitmap to determine an activation precision of an activation value, the first multibit bitmap including values corresponding to different precisions; process a second multibit bitmap to determine a weight precision of a weight value, the second multibit bitmap including values corresponding to different precisions; and store the activation value and the weight value in the second buffer when at least one of the activation precision or the weight precision corresponds to the second precision.
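
    A small sketch of the routing decision, assuming single-code precision flags: each activation/weight pair is steered to the second (higher-precision) buffer when either operand's bitmap entry indicates the second precision. The codes and buffer shapes are assumptions for illustration.

```python
def route_to_buffers(activations, act_bitmap, weights, wgt_bitmap,
                     high_prec_code=1):
    """Split activation/weight pairs into first- and second-precision buffers.

    Each multibit bitmap entry encodes the precision of the matching value;
    a pair lands in the second buffer when either operand needs it.
    """
    first_buf, second_buf = [], []
    for a, ab, w, wb in zip(activations, act_bitmap, weights, wgt_bitmap):
        if ab == high_prec_code or wb == high_prec_code:
            second_buf.append((a, w))   # second buffer: second precision
        else:
            first_buf.append((a, w))    # first buffer: first precision
    return first_buf, second_buf

low, high = route_to_buffers([1, 200, 3], [0, 1, 0], [7, 2, 9], [0, 0, 1])
print(low, high)  # [(1, 7)] [(200, 2), (3, 9)]
```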

    RUNTIME CONFIGURABLE REGISTER FILES FOR ARTIFICIAL INTELLIGENCE WORKLOADS

    Publication Number: US20220075659A1

    Publication Date: 2022-03-10

    Application Number: US17530156

    Application Date: 2021-11-18

    Abstract: There is disclosed a system and method of performing an artificial intelligence (AI) inference, including: programming an AI accelerator circuit to solve an AI problem with a plurality of layer-specific register file (RF) size allocations, wherein the AI accelerator circuit comprises processing elements (PEs) with respective associated RFs, wherein each RF is divided into K sub-banks of size B bytes, wherein B and K are integers, and wherein the RFs include circuitry to individually allocate a sub-bank to one of input feature (IF), output feature (OF), or filter weight (FL), and wherein programming the plurality of layer-specific RF size allocations comprises accounting for sparse data within the layer; and causing the AI accelerator circuit to execute the AI problem, including applying the layer-specific RF size allocations at run-time.
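
    A back-of-the-envelope model of the per-layer allocation, assuming proportional shares: the K sub-banks of B bytes are divided among IF, OF, and FL according to relative capacity needs (which, per the abstract, should already account for sparsity). The rounding policy is an assumption.

```python
def allocate_rf(k_subbanks, b_bytes, if_share, of_share, fl_share):
    """Split a PE register file's K sub-banks among IF, OF, and FL.

    Shares are relative weights, e.g. bytes needed per layer after
    discounting for sparse data.
    """
    total = if_share + of_share + fl_share
    n_if = round(k_subbanks * if_share / total)
    n_of = round(k_subbanks * of_share / total)
    n_fl = k_subbanks - n_if - n_of   # remainder keeps the sub-bank sum exact
    return {"IF": n_if * b_bytes, "OF": n_of * b_bytes, "FL": n_fl * b_bytes}

# 16 sub-banks of 32 bytes each, for a layer needing mostly IF capacity.
print(allocate_rf(16, 32, if_share=6, of_share=2, fl_share=2))
# {'IF': 320, 'OF': 96, 'FL': 96}
```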
