Patent search ap:("INTEL CORPORATION") AND inv:"Arnab Raha" Page 3

21.

Invention publication
ACCURACY-BASED APPROXIMATION OF ACTIVATION FUNCTIONS WITH PROGRAMMABLE LOOK-UP TABLE HAVING AREA BUDGET (under examination, published)

Publication No.: US20240111830A1

Publication date: 2024-04-04

Application No.: US18534035

Application date: 2023-12-08

Applicant: Intel Corporation

Inventor: Umer Iftikhar Cheema , Robert Simofi , Deepak Abraham Mathaikutty , Arnab Raha , Dinakar Kondru

IPC: G06F17/17 , G06F1/03

CPC classification number: G06F17/17 , G06F1/0307

Abstract: A non-linear activation function in a neural network may be approximated by one or more linear functions. The input range may be divided into input segments, each of which corresponds to a different exponent in the input range of the activation function and includes input data elements having the exponent. Target accuracies may be assigned to the identified exponents based on a statistics analysis of the input data elements. The target accuracy of an input segment will be used to determine one or more linear functions that approximate the activation function for the input segment. An error of an approximation of the activation function by a linear function for the input segment may be within the target accuracy. The parameters of the linear functions may be stored in a look-up table (LUT). During the execution of the DNN, the LUT may be used to execute the activation function.
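The idea in this abstract can be illustrated with a small sketch. It is not taken from the patent: the function names (`build_lut`, `lut_eval`, `max_error`), the choice of `tanh`, the chord-based fitting, the doubling strategy, and the target values are all assumptions made for illustration. Each power-of-two input segment is split into equal pieces until a sampled error check meets that segment's target accuracy, and the resulting slopes and intercepts form the table.

```python
import math

def max_error(fn, a, b, slope, intercept, samples=16):
    """Largest sampled error of the line slope*x + intercept against fn on [a, b]."""
    xs = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(fn(x) - (slope * x + intercept)) for x in xs)

def build_lut(fn, targets):
    """Build {exponent: [(lo, hi, slope, intercept), ...]} for positive inputs.

    `targets` maps an exponent e (segment [2**e, 2**(e+1))) to a target accuracy.
    The piece count per segment is doubled until the sampled error fits the target
    (an assumed strategy, chosen only to keep the sketch short).
    """
    lut = {}
    for e, target in targets.items():
        lo, hi = 2.0 ** e, 2.0 ** (e + 1)
        n = 1
        while True:
            step = (hi - lo) / n
            rows = []
            for i in range(n):
                a, b = lo + i * step, lo + (i + 1) * step
                slope = (fn(b) - fn(a)) / (b - a)          # chord through the end points
                rows.append((a, b, slope, fn(a) - slope * a))
            if all(max_error(fn, *row) <= target for row in rows):
                break
            n *= 2
        lut[e] = rows
    return lut

def lut_eval(lut, x):
    """Evaluate the approximation: pick the segment by exponent, then the piece."""
    _, e = math.frexp(x)            # x = m * 2**e with 0.5 <= m < 1
    for a, b, slope, intercept in lut[e - 1]:
        if a <= x < b:
            return slope * x + intercept
    raise ValueError("x is outside the tabulated range")

# Hypothetical targets: tighter accuracy where inputs are assumed to be more frequent.
TARGETS = {-2: 1e-3, -1: 1e-3, 0: 1e-2, 1: 1e-2}
LUT = build_lut(math.tanh, TARGETS)
```

Segments with a tighter target end up with more table rows, which is where an area budget would come into play; the sketch does not model that budget.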

22.

Invention grant
Schedule-aware tensor distribution module (in force)

Publication No.: US11907827B2

Publication date: 2024-02-20

Application No.: US16456707

Application date: 2019-06-28

Applicant: Intel Corporation

Inventor: Gautham Chinya , Huichu Liu , Arnab Raha , Debabrata Mohapatra , Cormac Brick , Lance Hacking

IPC: G06N3/063 , G06N5/04 , G06F9/448 , G06F9/38 , G06F9/50

CPC classification number: G06N3/063 , G06F9/3814 , G06F9/3877 , G06F9/4498 , G06F9/5027 , G06N5/04

Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.

23.

Invention grant
Methods, systems, articles of manufacture, and apparatus to decode zero-value-compression data vectors (in force)

Publication No.: US11804851B2

Publication date: 2023-10-31

Application No.: US16832804

Application date: 2020-03-27

Applicant: Intel Corporation

Inventor: Gautham Chinya , Debabrata Mohapatra , Arnab Raha , Huichu Liu , Cormac Brick

IPC: H03M7/30 , G06F16/22 , G06N3/063 , G06N3/08 , G06N3/04

CPC classification number: H03M7/3082 , G06F16/2237 , G06N3/063 , G06N3/04 , G06N3/08

Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.

24.

Invention application
FLOATING POINT MULTIPLY-ACCUMULATE UNIT FOR DEEP LEARNING (in force)

Publication No.: US20220188075A1

Publication date: 2022-06-16

Application No.: US17688131

Application date: 2022-03-07

Applicant: Intel Corporation

Inventor: Arnab Raha , Mark A. Anders , Raymond Jit-Hung Sung , Debabrata Mohapatra , Deepak Abraham Mathaikutty , Ram K. Krishnamurthy , Himanshu Kaul

IPC: G06F7/544 , G06F7/483 , G06N3/08

Abstract: An FPMAC operation has two operands: an input operand and a weight operand. The operands may have a format of FP16, BF16, or INT8. Each operand is split into two portions. The two portions are stored in separate storage units. The operands are then transferred to register files of a PE, with each register file storing bits of an operand sequentially. The PE performs the FPMAC operation based on the operands. The PE may include an FPMAC unit configured to compute an individual partial sum of the PE. The PE may also include an FP adder to accumulate the individual partial sum with other data, such as an output from another PE or an output from another PE array. The FP adder may be fused with the FPMAC unit in a single circuit that can do speculative alignment and has separate critical paths for alignment and normalization.

25.

Invention application
AREA AND ENERGY EFFICIENT MULTI-PRECISION MULTIPLY-ACCUMULATE UNIT-BASED PROCESSOR (in force)

Publication No.: US20210397414A1

Publication date: 2021-12-23

Application No.: US17358868

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Arnab Raha , Mark A. Anders , Martin Power , Martin Langhammer , Himanshu Kaul , Debabrata Mohapatra , Gautham Chinya , Cormac Brick , Ram Krishnamurthy

IPC: G06F7/544 , G06F7/527 , G06F5/01

Abstract: Systems, apparatuses and methods may provide for multi-precision multiply-accumulate (MAC) technology that includes a plurality of arithmetic blocks, wherein the plurality of arithmetic blocks each contain multiple multipliers, and wherein the logic is to combine multipliers one or more of within each arithmetic block or across multiple arithmetic blocks. In one example, one or more intermediate multipliers are of a size that is less than precisions supported by arithmetic blocks containing the one or more intermediate multipliers.
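As a rough illustration of combining small multipliers into a wider one, the sketch below builds an unsigned 8-bit by 8-bit product from four 4-bit by 4-bit products. The names `mul4` and `mul8_from_mul4` and the chosen bit widths are assumptions for illustration; the abstract does not specify the multiplier sizes or how they are wired together.

```python
def mul4(a, b):
    """Stand-in for one small hardware multiplier: unsigned 4-bit by 4-bit."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def mul8_from_mul4(a, b):
    """Unsigned 8-bit by 8-bit product assembled from four 4-bit multipliers.

    With a = a_hi*16 + a_lo and b = b_hi*16 + b_lo:
    a*b = a_hi*b_hi*256 + (a_hi*b_lo + a_lo*b_hi)*16 + a_lo*b_lo
    """
    assert 0 <= a < 256 and 0 <= b < 256
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    high = mul4(a_hi, b_hi) << 8
    cross = (mul4(a_hi, b_lo) + mul4(a_lo, b_hi)) << 4
    low = mul4(a_lo, b_lo)
    return high + cross + low
```

In a low-precision mode the same four `mul4` units could instead produce four independent 4-bit products per cycle, which is the kind of reuse a multi-precision design aims for.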

26.

Invention application
METHODS AND APPARATUS TO LOAD DATA WITHIN A MACHINE LEARNING ACCELERATOR (in force)

Publication No.: US20210326144A1

Publication date: 2021-10-21

Application No.: US17359392

Application date: 2021-06-25

Applicant: Intel Corporation

Inventor: Arnab Raha , Deepak Mathaikutty , Debabrata Mohapatra , Sang Kyun Kim , Gautham Chinya , Cormac Brick

IPC: G06F9/445 , G06F9/50 , G06F9/30 , G06N20/00 , H03K19/177 , H03K19/20

Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. A compressed local data re-user circuitry determines if a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.

27.

Invention application
METHODS, SYSTEMS, ARTICLES OF MANUFACTURE, AND APPARATUS TO DECODE ZERO-VALUE-COMPRESSION DATA VECTORS (under examination, published)

Publication No.: US20200228137A1

Publication date: 2020-07-16

Application No.: US16832804

Application date: 2020-03-27

Applicant: Intel Corporation

Inventor: Gautham Chinya , Debabrata Mohapatra , Arnab Raha , Huichu Liu , Cormac Brick

IPC: H03M7/30 , G06F16/22 , G06N3/063

Abstract: Methods, systems, articles of manufacture, and apparatus are disclosed to decode zero-value-compression data vectors. An example apparatus includes: a buffer monitor to monitor a buffer for a header including a value indicative of compressed data; a data controller to, when the buffer includes compressed data, determine a first value of a sparse select signal based on (1) a select signal and (2) a first position in a sparsity bitmap, the first value of the sparse select signal corresponding to a processing element that is to process a portion of the compressed data; and a write controller to, when the buffer includes compressed data, determine a second value of a write enable signal based on (1) the select signal and (2) a second position in the sparsity bitmap, the second value of the write enable signal corresponding to the processing element that is to process the portion of the compressed data.

28.

Invention application
LIGHTWEIGHT TRUSTED EXECUTION FOR INTERNET-OF-THINGS DEVICES (under examination, published)

Publication No.: US20170372088A1

Publication date: 2017-12-28

Application No.: US15190396

Application date: 2016-06-23

Applicant: INTEL CORPORATION

Inventor: Li Zhao , Manoj R. Sastry , Arnab Raha

IPC: G06F21/62

Abstract: Lightweight trusted execution technologies for internet-of-things devices are described. In response to a memory request at a page unit from an application executing in a current domain, the page unit is to map a current virtual address (VA) to a current physical address (PA). The policy enforcement logic (PEL) reads, from a secure domain cache (SDC), a domain value (DID) and a VA value that correspond to the current PA. The PEL grants access when the current domain and the DID correspond to the unprotected region or the current domain and the DID correspond to the secure domain region, the current domain is equal to the DID, and the current VA is equal to the VA value. The PEL grants data access and denies code access when the current domain corresponds to the secure domain region and the DID corresponds to the unprotected region.
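One reading of the access rules in this abstract can be written as a small decision function. This is a sketch under assumptions: the encoding of the unprotected region as domain `0`, the name `pel_check`, and its parameters are hypothetical, and the case of an unprotected domain touching a secure page is not covered by the abstract, so the sketch denies it.

```python
UNPROTECTED = 0  # assumed encoding: domain 0 is the unprotected region, others are secure

def pel_check(current_domain, current_va, did, sdc_va, is_code):
    """Return True if access is granted, following one reading of the abstract.

    did and sdc_va are the domain value and VA value that the secure domain
    cache (SDC) holds for the current physical address.
    """
    cur_secure = current_domain != UNPROTECTED
    page_secure = did != UNPROTECTED

    if not cur_secure and not page_secure:
        return True                      # both in the unprotected region
    if cur_secure and page_secure:
        # same secure domain and the VA must match the recorded mapping
        return current_domain == did and current_va == sdc_va
    if cur_secure and not page_secure:
        return not is_code               # data access granted, code access denied
    return False                         # assumption: not stated in the abstract
```

For example, a secure domain `3` reading data from an unprotected page is allowed, while fetching code from that page is refused.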

29.

Invention grant
Sparsity-aware datastore for inference processing in deep neural network architectures (in force)

Publication No.: US12229673B2

Publication date: 2025-02-18

Application No.: US17524333

Application date: 2021-11-11

Applicant: Intel Corporation

Inventor: Deepak Mathaikutty , Arnab Raha , Raymond Sung , Debabrata Mohapatra , Cormac Brick

IPC: H03M7/00 , G06N3/08 , G06N5/04 , H03M7/30

Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
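The alignment step can be pictured with a minimal zero-value-compression sketch. The functions `zvc_encode` and `zvc_decode` are hypothetical names, and the one-bit-per-element bitmap with non-zero values packed in order is an assumed format; the abstract does not define the actual layout or the decode buffer.

```python
def zvc_encode(values):
    """Split a dense vector into a sparsity bitmap and its packed non-zero values."""
    bitmap = [1 if v != 0 else 0 for v in values]
    packed = [v for v in values if v != 0]
    return bitmap, packed

def zvc_decode(bitmap, packed):
    """Align packed values with the bitmap: a 1 bit consumes the next value, a 0 bit yields zero."""
    if sum(bitmap) != len(packed):
        raise ValueError("bitmap population count does not match packed length")
    it = iter(packed)
    return [next(it) if bit else 0 for bit in bitmap]
```

A dense vector such as `[0, 5, 0, 0, 7, 1]` is stored as three values plus a six-bit bitmap, and decoding restores the zeros in their original positions.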

30.

Invention grant
Schedule-aware dynamically reconfigurable adder tree architecture for partial sum accumulation in machine learning accelerators (in force)

Publication No.: US12147836B2

Publication date: 2024-11-19

Application No.: US17520281

Application date: 2021-11-05

Applicant: INTEL CORPORATION

Inventor: Debabrata Mohapatra , Arnab Raha , Deepak Mathaikutty , Raymond Sung , Cormac Brick

IPC: G06F9/50 , G06F7/50 , G06F9/48 , G06F15/80 , G06F15/82 , G06N20/00

Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.
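A software model of a variable-depth adder tree might look like the sketch below. The function name `accumulate`, the pairing of adjacent PEs, and the use of a plain integer `depth` in place of a configuration register are assumptions for illustration only.

```python
def accumulate(psums, depth):
    """Reduce partial sums through `depth` levels of a binary adder tree.

    Each level adds adjacent pairs, so depth d combines groups of 2**d
    neighbouring PEs. depth=0 passes the partial sums through unchanged.
    """
    if depth < 0 or len(psums) % (2 ** depth) != 0:
        raise ValueError("PE count must be a multiple of 2**depth")
    level = list(psums)
    for _ in range(depth):
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level
```

With eight PEs, a layer whose schedule spreads one output across all PEs would use `depth=3`, while a layer that spreads it across pairs would use `depth=1` and produce four outputs at once.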
