-
Publication No.: US20210117197A1
Publication Date: 2021-04-22
Application No.: US17132895
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventor: Steven Hsu , Amit Agarwal , Debabrata Mohapatra , Arnab Raha , Moongon Jung , Gautham Chinya , Ram Krishnamurthy
Abstract: Systems, apparatuses and methods identify a plurality of registers that are associated with a system-on-chip. The plurality of registers includes a first portion dedicated to write operations and a second portion dedicated to read operations. The technology writes data to the first portion of the plurality of registers, and transfers the data from the first portion to the second portion.
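The write/read split described here behaves like a double-banked register file. Below is a minimal Python sketch of that behavior, assuming a simple copy-based transfer; the class and method names are illustrative, not from the patent.

    class DoubleBankedRegisters:
        def __init__(self, size):
            self.write_bank = [0] * size  # portion dedicated to write operations
            self.read_bank = [0] * size   # portion dedicated to read operations

        def write(self, index, value):
            # Writes land only in the write-dedicated portion.
            self.write_bank[index] = value

        def transfer(self):
            # Publish written data to the read-dedicated portion in one step.
            self.read_bank = list(self.write_bank)

        def read(self, index):
            return self.read_bank[index]

    regs = DoubleBankedRegisters(4)
    regs.write(0, 42)
    regs.transfer()
    assert regs.read(0) == 42

Splitting the banks this way lets readers always observe a stable snapshot while new data is still being written.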
-
Publication No.: US12242861B2
Publication Date: 2025-03-04
Application No.: US18416303
Filing Date: 2024-01-18
Applicant: Intel Corporation
Inventor: Arnab Raha , Deepak Mathaikutty , Debabrata Mohapatra , Sang Kyun Kim , Gautham Chinya , Cormac Brick
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
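As a rough illustration of the load-with-lookahead flow in this abstract, the sketch below loads a first section plus an additional amount of compressed data in one trip, then checks whether the extra data already contains the second section before fetching again. The function and parameter names are assumptions for the sketch.

    def run_two_sections(engine, fetch, section_size, extra_size):
        # One memory trip brings the first section plus lookahead data.
        data = fetch(0, section_size + extra_size)
        engine.execute(data[:section_size])        # first machine learning op
        extra = data[section_size:]
        if len(extra) >= section_size:
            # Second section already on hand: re-use local data, skip a fetch.
            engine.execute(extra[:section_size])
        else:
            engine.execute(fetch(section_size, section_size))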
-
Publication No.: US20230376274A1
Publication Date: 2023-11-23
Application No.: US18362529
Filing Date: 2023-07-31
Applicant: Intel Corporation
Inventor: Mark Anders , Arnab Raha , Amit Agarwal , Steven Hsu , Deepak Abraham Mathaikutty , Ram K. Krishnamurthy , Martin Power
CPC classification number: G06F7/5443 , G06F7/4876 , G06F7/485 , G06F5/012
Abstract: A fused dot-product multiply-accumulate (MAC) circuit may support variable precisions of floating-point data elements to perform computations (e.g., MAC operations) in deep learning operations. An operation mode of the circuit may be selected based on the precision of an input element. The operation mode may be a FP16 mode or a FP8 mode. In the FP8 mode, product exponents may be computed based on exponents of floating-point input elements. A maximum exponent may be selected from the product exponents. A global maximum exponent may be selected from a plurality of maximum exponents. A product mantissa may be computed and aligned with another product mantissa based on the difference between the global maximum exponent and the corresponding maximum exponent. An adder tree may accumulate the aligned product mantissas and compute a partial sum mantissa. The partial sum mantissa may be normalized using the global maximum exponent.
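A fixed-point model can make the alignment step concrete. The sketch below, under simplifying assumptions (unsigned mantissas, no rounding, a 24-bit product width), computes the maximum exponent, right-shifts each product mantissa by its distance from that maximum, and accumulates:

    def accumulate_products(products):
        # products: list of (mantissa, exponent) pairs for the a*b terms
        max_exp = max(e for _, e in products)        # (global) maximum exponent
        aligned = [m >> (max_exp - e) for m, e in products]
        partial_sum = sum(aligned)                   # adder-tree accumulation
        # Normalize the partial sum mantissa against the maximum exponent.
        while partial_sum >= (1 << 24):
            partial_sum >>= 1
            max_exp += 1
        return partial_sum, max_exp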
-
Publication No.: US20230059976A1
Publication Date: 2023-02-23
Application No.: US18047415
Filing Date: 2022-10-18
Applicant: Intel Corporation
Inventor: Deepak Abraham Mathaikutty , Arnab Raha , Raymond Jit-Hung Sung , Martin Power , Umer Iftikhar Cheema , David Thomas Bernard
IPC: G06N3/08
Abstract: A DNN accelerator may include a PE array performing MAC operations. The PE array may include PEs capable of MAC operations on quantized values. A PE may include subtractors for subtracting zero points from quantized activations and quantized weights to generate intermediate activations and intermediate weights. The intermediate activations and intermediate weights may be stored in data storage units in the PE and may be used by a MAC unit in the PE. The subtractors may be placed outside the MAC unit but inside the PE. The MAC unit may perform sequential cycles of MAC operations. The MAC unit may include a plurality of multipliers. The intermediate activations and intermediate weights stored in the data storage units may be reused by different multipliers in different cycles of MAC operations. An output of the MAC unit or of the PE may be multiplied with a quantization scale to produce a floating-point value.
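The dataflow reads naturally as integer arithmetic followed by one rescale. A small reference model, with illustrative names and NumPy standing in for the multiplier array:

    import numpy as np

    def quantized_mac(q_act, q_wgt, act_zp, wgt_zp, scale):
        # Subtractors sit before the MAC unit: remove zero points once, then
        # the intermediate values can be reused across MAC cycles.
        inter_act = q_act.astype(np.int32) - act_zp
        inter_wgt = q_wgt.astype(np.int32) - wgt_zp
        acc = np.dot(inter_act, inter_wgt)    # sequential MAC cycles
        return acc * scale                    # quantization scale -> float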
-
Publication No.: US20220261623A1
Publication Date: 2022-08-18
Application No.: US17733692
Filing Date: 2022-04-29
Applicant: Intel Corporation
Inventor: Raymond Jit-Hung Sung , Debabrata Mohapatra , Arnab Raha , Deepak Abraham Mathaikutty , Praveen Kumar Gupta
Abstract: A DNN accelerator includes a column of PEs and an external adder assembly for performing depthwise convolution. Each PE includes register files, multipliers, and an internal adder assembly. Each register file can store an operand (input operand, weight operand, etc.) of the depthwise convolution. The operand includes a sequence of elements, each of which corresponds to a different depthwise channel. A multiplier can perform a sequence of multiplications on two operands, e.g., an input operand and a weight operand, and generate a product operand. The internal adder assembly can accumulate product operands and generate an output operand of the PE. The output operand includes output elements, each of which corresponds to a different depthwise channel. The operands may be reused in different rounds of operations by the multipliers. The external adder assembly can accumulate output operands of multiple PEs and generate an output operand of the PE column.
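The per-channel arithmetic can be modeled compactly. In the sketch below, each operand is a list with one element per depthwise channel; products are accumulated channel-wise inside a PE, and a separate function plays the role of the external adder assembly. Names are illustrative.

    def pe_output(input_ops, weight_ops):
        channels = len(input_ops[0])
        out = [0] * channels
        for inp, wgt in zip(input_ops, weight_ops):
            for c in range(channels):          # one multiplication per channel
                out[c] += inp[c] * wgt[c]      # internal adder assembly
        return out

    def column_output(pe_outputs):
        # External adder assembly: channel-wise sum across the PE column.
        return [sum(vals) for vals in zip(*pe_outputs)]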
-
Publication No.: US20220083843A1
Publication Date: 2022-03-17
Application No.: US17534976
Filing Date: 2021-11-24
Applicant: Intel Corporation
Inventor: Arnab Raha , Debabrata Mohapatra , Deepak Abraham Mathaikutty , Raymond Jit-Hung Sung , Cormac Michael Brick
Abstract: An apparatus is provided to access a weight vector of a layer in a sequence of layers in a DNN. The weight vector includes a first sequence of weights having different values. A bitmap is generated based on the weight vector. The bitmap includes a second sequence of bitmap elements. Each bitmap element corresponds to a different weight and has a value determined based at least on the value of the corresponding weight. The index of each bitmap element in the second sequence matches the index of the corresponding weight in the first sequence. A new bitmap is generated by rearranging the bitmap elements in the second sequence based on the values of the bitmap elements. The weight vector is rearranged based on the new bitmap. The rearranged weight vector is divided into subsets, each of which is assigned to a different PE for a MAC operation.
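One plausible reading of the rearrangement, sketched below under the assumption that the vector length divides evenly among the PEs: mark nonzero weights in a bitmap, stably sort indices by bitmap value, and split the reordered vector into per-PE subsets. The sort key is an illustrative policy, not necessarily the patent's exact one.

    def rearrange_and_split(weights, num_pes):
        bitmap = [1 if w != 0 else 0 for w in weights]   # index-aligned bitmap
        order = sorted(range(len(weights)),
                       key=lambda i: bitmap[i], reverse=True)
        rearranged = [weights[i] for i in order]         # follows the new bitmap
        step = len(rearranged) // num_pes                # assumes even division
        return [rearranged[i * step:(i + 1) * step] for i in range(num_pes)]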
-
Publication No.: US20220067524A1
Publication Date: 2022-03-03
Application No.: US17524333
Filing Date: 2021-11-11
Applicant: Intel Corporation
Inventor: Deepak Mathaikutty , Arnab Raha , Raymond Sung , Debabrata Mohapatra , Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory and stores the compressed data in a decode buffer. The compressed data, which is associated with a plurality of tensors, is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
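The alignment step amounts to expanding a dense stream of nonzero values back into position. A minimal decode model, assuming the compressed stream holds only the nonzero elements in order:

    def align(compressed, sparsity_bitmap):
        decoded, pos = [], 0
        for bit in sparsity_bitmap:
            if bit:
                decoded.append(compressed[pos])   # consume next stored value
                pos += 1
            else:
                decoded.append(0)                 # bitmap marks this slot zero
        return decoded

    assert align([5, 7], [1, 0, 0, 1]) == [5, 0, 0, 7]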
-
Publication No.: US20210042617A1
Publication Date: 2021-02-11
Application No.: US17081509
Filing Date: 2020-10-27
Applicant: Intel Corporation
Inventor: Gautham Chinya , Deepak Mathaikutty , Guruguhanathan Venkataramanan , Debabrata Mohapatra , Moongon Jung , Sang Kyun Kim , Arnab Raha , Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that identifies an assignment of weights of a workload to a plurality of processing elements, where the workload is to be associated with a neural network. The technology generates a representation that is to represent whether each of the weights is a zero value or a non-zero value. The technology further stores the representation into partitions of a storage structure based on the assignment of the weights, where the partitions are each to be dedicated to a different one of the processing elements.
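A short sketch of the storage scheme, with illustrative names: each weight is summarized as zero or nonzero, and the summary lands in the partition dedicated to the processing element that owns the weight.

    def build_partitions(weights, assignment, num_pes):
        # assignment[i] names the processing element that owns weights[i].
        partitions = [[] for _ in range(num_pes)]   # one partition per PE
        for w, pe in zip(weights, assignment):
            partitions[pe].append(0 if w == 0 else 1)
        return partitions

    parts = build_partitions([0.0, 1.5, 0.0, -2.0], [0, 0, 1, 1], 2)
    assert parts == [[0, 1], [0, 1]]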
-
Publication No.: US10671744B2
Publication Date: 2020-06-02
Application No.: US15190396
Filing Date: 2016-06-23
Applicant: Intel Corporation
Inventor: Li Zhao , Manoj R. Sastry , Arnab Raha
IPC: G06F21/62
Abstract: Lightweight trusted execution technologies for internet-of-things devices are described. In response to a memory request at a page unit from an application executing in a current domain, the page unit is to map a current virtual address (VA) to a current physical address (PA). The policy enforcement logic (PEL) reads, from a secure domain cache (SDC), a domain value (DID) and a VA value that correspond to the current PA. The PEL grants access when the current domain and the DID correspond to the unprotected region or the current domain and the DID correspond to the secure domain region, the current domain is equal to the DID, and the current VA is equal to the VA value. The PEL grants data access and denies code access when the current domain corresponds to the secure domain region and the DID corresponds to the unprotected region.
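The policy rules condense into three cases. Restated as code with illustrative names (region_of maps a domain ID to its region; the SDC lookup yields the recorded DID and VA value for the physical address):

    UNPROTECTED, SECURE = "unprotected", "secure"

    def check_access(cur_domain, cur_va, sdc_entry, region_of, is_code_fetch):
        did, va = sdc_entry
        if region_of(cur_domain) == UNPROTECTED and region_of(did) == UNPROTECTED:
            return True                      # both sides unprotected
        if (region_of(cur_domain) == SECURE and region_of(did) == SECURE
                and cur_domain == did and cur_va == va):
            return True                      # matching secure domain and VA
        if region_of(cur_domain) == SECURE and region_of(did) == UNPROTECTED:
            return not is_code_fetch         # data access granted, code denied
        return False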
-
Publication No.: US20240220785A1
Publication Date: 2024-07-04
Application No.: US18408716
Filing Date: 2024-01-10
Applicant: Intel Corporation
Inventor: Gautham Chinya , Huichu Liu , Arnab Raha , Debabrata Mohapatra , Cormac Brick , Lance Hacking
CPC classification number: G06N3/063 , G06F9/3814 , G06F9/3877 , G06F9/4498 , G06F9/5027 , G06N5/04
Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the neural network system. The neural network accelerator also includes schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized output data to memory.
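The phases map directly onto a short driver loop. A sketch, where Engine is a stand-in and every method name is an assumption for illustration:

    class Engine:
        def load(self, tile):
            self.tile = tile                       # load phase
        def extract(self):
            return [x * 2 for x in self.tile]      # stand-in for arithmetic

    def distribute(engines, tiles, store, reorganize):
        for engine, tile in zip(engines, tiles):
            engine.load(tile)                                  # load phase
        outputs = [engine.extract() for engine in engines]     # extraction phase
        store(reorganize(outputs))                 # reorganize, then store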
-