-
Publication No.: US20250036928A1
Publication Date: 2025-01-30
Application No.: US18907748
Filing Date: 2024-10-07
Applicant: Intel Corporation
Inventor: Arnab Raha , Debabrata Mohapatra , Gautham Chinya , Guruguhanathan Venkataramanan , Sang Kyun Kim , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, comprising architectures and techniques for scaling the performance per unit of power and the performance per unit of area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, comprising architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
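As a rough illustration of the dynamic MAC scaling idea (not the claimed hardware), a minimal Python sketch can model sparsity-driven gating: a MAC is only "active" when both its activation and weight operands are nonzero. The function name and return shape are illustrative assumptions.

```python
def dot_with_dynamic_macs(activations, weights):
    """Sketch: skip MAC work wherever either operand is zero, mirroring
    sparsity-based gating of MAC units; returns (result, active MAC count)."""
    active = [i for i, (a, w) in enumerate(zip(activations, weights))
              if a != 0 and w != 0]          # both sparsity bitmaps set
    acc = 0
    for i in active:                          # only active MACs fire
        acc += activations[i] * weights[i]
    return acc, len(active)
```

Here only 2 of the 4 MAC positions are active for inputs `[1, 0, 2, 3]` and `[4, 5, 0, 6]`, which is the kind of reduction in active MACs the abstract describes.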
-
Publication No.: US20250028565A1
Publication Date: 2025-01-23
Application No.: US18906648
Filing Date: 2024-10-04
Applicant: Intel Corporation
Inventor: Debabrata Mohapatra , Arnab Raha , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is adjusted dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable-depth adder tree during runtime, the compiler can choose a compute-optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of an HW accelerator. Other embodiments may be described and/or claimed.
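A minimal software sketch of the variable-depth adder tree (assuming, for illustration, that a configured depth `d` reduces groups of `2**d` partial sums to one output per group):

```python
def adder_tree_reduce(partial_sums, depth):
    """Sketch: reduce groups of 2**depth partial sums with a binary adder
    tree; 'depth' models the per-layer configuration register value."""
    group = 1 << depth
    assert len(partial_sums) % group == 0, "PE outputs must fill whole groups"
    outputs = []
    for base in range(0, len(partial_sums), group):
        level = partial_sums[base:base + group]
        while len(level) > 1:  # one adder-tree level per iteration
            level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        outputs.append(level[0])
    return outputs
```

With four PE partial sums, depth 2 accumulates all of them into one value, while depth 1 yields two independent accumulations; a compiler-chosen schedule would pick the depth per layer.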
-
Publication No.: US20240231839A1
Publication Date: 2024-07-11
Application No.: US18416303
Filing Date: 2024-01-18
Applicant: Intel Corporation
Inventor: Arnab Raha , Deepak Mathaikutty , Debabrata Mohapatra , Sang Kyun Kim , Gautham Chinya , Cormac Brick
CPC classification number: G06F9/445 , G06F9/3001 , G06F9/5027 , G06N20/00 , H03K19/177 , H03K19/20
Abstract: Methods, apparatus, systems, and articles of manufacture to load data into an accelerator are disclosed. An example apparatus includes data provider circuitry to load a first section and an additional amount of compressed machine learning parameter data into a processor engine. Processor engine circuitry executes a machine learning operation using the first section of the compressed machine learning parameter data. Compressed local data re-user circuitry determines whether a second section is present in the additional amount of compressed machine learning parameter data. The processor engine circuitry executes a machine learning operation using the second section when the second section is present in the additional amount of compressed machine learning parameter data.
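A hedged sketch of the local re-use check described above, using illustrative names: one load brings in a first section plus extra prefetched data, and a second operation runs only if a full second section is already resident, avoiding another load.

```python
def process_with_reuse(buffer, section_size, run_op):
    """Sketch: run on the first section, then reuse a second section if it is
    already present in the prefetched data (no additional load issued)."""
    results = [run_op(buffer[:section_size])]     # first section
    extra = buffer[section_size:]                 # additional prefetched amount
    if len(extra) >= section_size:                # second section resident?
        results.append(run_op(extra[:section_size]))
    return results
```

With `run_op=sum`, a buffer of `[1, 2, 3, 4]` and section size 2 yields two results from a single load, while `[1, 2, 3]` yields only one.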
-
Publication No.: US20220012058A1
Publication Date: 2022-01-13
Application No.: US17484780
Filing Date: 2021-09-24
Applicant: Intel Corporation
Inventor: Niall Hanrahan , Martin Power , Kevin Brady , Martin-Thomas Grymel , David Bernard , Gary Baugh , Cormac Brick
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
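Stripped of the buffer-control detail, the reuse order the abstract describes resembles a nested traversal: a context of the first type is held while a pointer walks every context of the second type. A minimal sketch, with illustrative names and no claim to match the hardware control flow exactly:

```python
def iterate_contexts(first_contexts, second_contexts):
    """Sketch: hold the current first-type context while the second-buffer
    pointer advances through all second-type contexts (maximizing reuse)."""
    pairs = []
    for a in first_contexts:        # maintained in the first buffer
        for b in second_contexts:   # pointer iterates the second buffer
            pairs.append((a, b))
    return pairs
```

Each first-type context is fetched once but paired with every second-type context, which is the data-reuse benefit claimed.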
-
Publication No.: US20200226203A1
Publication Date: 2020-07-16
Application No.: US16833210
Filing Date: 2020-03-27
Applicant: Intel Corporation
Inventor: Biji George , Om Ji Omer , Dipan Kumar Mandal , Cormac Brick , Lance Hacking , Sreenivas Subramoney , Belliappa Kuttanna
IPC: G06F17/16
Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two-dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders arranged in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to the multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
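The broadcast/unicast dataflow can be sketched in software as an ordinary matrix multiply in which one operand element is shared (broadcast) across a row of virtual multipliers while the other operand is delivered per-multiplier (unicast). This is a functional sketch only; the loop nesting is an assumption, not the patented interconnect.

```python
def matmul_broadcast_unicast(A, B):
    """Sketch: A[i][k] is broadcast across a row of multipliers, B[k][j] is
    unicast to individual multipliers, and column adders accumulate."""
    rows, inner, cols = len(A), len(A[0]), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a = A[i][k]                     # broadcast operand
            for j in range(cols):
                C[i][j] += a * B[k][j]      # unicast operand, column adder
    return C
```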
-
Publication No.: US12141683B2
Publication Date: 2024-11-12
Application No.: US17246341
Filing Date: 2021-04-30
Applicant: Intel Corporation
Inventor: Arnab Raha , Debabrata Mohapatra , Gautham Chinya , Guruguhanathan Venkataramanan , Sang Kyun Kim , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, comprising architectures and techniques for scaling the performance per unit of power and the performance per unit of area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, comprising architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
-
Publication No.: US20220391710A1
Publication Date: 2022-12-08
Application No.: US17820593
Filing Date: 2022-08-18
Applicant: Intel Corporation
Inventor: Alessandro Palla , Ian Frederick Hunter , Richard Richmond , Cormac Brick , Sebastian Eusebiu Nagy
Abstract: Systems, apparatuses and methods may provide for technology that determines a complexity of a task associated with a neural network workload and generates a hardware efficiency estimate for the task, wherein the hardware efficiency estimate is generated via a neural network based cost model if the complexity exceeds a threshold, and wherein the hardware efficiency estimate is generated via a cost function if the complexity does not exceed the threshold. In one example, the technology trains the neural network based cost model based on one or more of hardware profile data or register-transfer level (RTL) data.
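The dispatch logic above is simple to sketch: route tasks past a complexity threshold to a learned cost model and simpler tasks to an analytical cost function. The function and parameter names are illustrative, and the two estimators are passed in as callables rather than modeled.

```python
def estimate_efficiency(task_complexity, threshold, nn_cost_model, cost_function):
    """Sketch: pick the hardware-efficiency estimator by task complexity."""
    if task_complexity > threshold:
        return nn_cost_model(task_complexity)   # learned model for hard tasks
    return cost_function(task_complexity)        # cheap closed form otherwise
```

The design point is cost/accuracy balance: the neural cost model is more accurate on complex workloads but more expensive to evaluate, so the threshold confines it to cases where the closed-form estimate is unreliable.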
-
Publication No.: US12229673B2
Publication Date: 2025-02-18
Application No.: US17524333
Filing Date: 2021-11-11
Applicant: Intel Corporation
Inventor: Deepak Mathaikutty , Arnab Raha , Raymond Sung , Debabrata Mohapatra , Cormac Brick
Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory and stores the compressed data in a decode buffer, wherein the compressed data is associated with a plurality of tensors and is stored in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
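A common way to realize this alignment step (assumed here, since the abstract does not fix the encoding) is zero-value compression: the bitmap marks nonzero positions, and decoding expands the packed nonzeros back into a dense stream for the processing elements.

```python
def decode_with_bitmap(compressed, bitmap):
    """Sketch: align zero-value-compressed data with its sparsity bitmap to
    recover the dense values fed to the processing elements."""
    nonzeros = iter(compressed)
    # Each set bit consumes the next packed value; each clear bit emits a zero.
    return [next(nonzeros) if bit else 0 for bit in bitmap]
```

For packed values `[5, 7]` and bitmap `[1, 0, 0, 1]`, decoding restores the dense tensor row `[5, 0, 0, 7]`.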
-
Publication No.: US12147836B2
Publication Date: 2024-11-19
Application No.: US17520281
Filing Date: 2021-11-05
Applicant: Intel Corporation
Inventor: Debabrata Mohapatra , Arnab Raha , Deepak Mathaikutty , Raymond Sung , Cormac Brick
Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is adjusted dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable-depth adder tree during runtime, the compiler can choose a compute-optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of an HW accelerator.
-
Publication No.: US12124941B2
Publication Date: 2024-10-22
Application No.: US16832601
Filing Date: 2020-03-27
Applicant: Intel Corporation
Inventor: Eric Luk , Mohamed Elmalaki , Sara Almalih , Cormac Brick
Abstract: Examples to determine a dynamic batch size of a layer are disclosed herein. An example apparatus to determine a dynamic batch size of a layer includes a layer operations controller to determine a layer ratio between a number of operations of a layer and weights of the layer, a comparator to compare the layer ratio to a number of operations per unit of memory size performed by a computation engine, and a batch size determination controller to, when the layer ratio is less than the number of operations per unit of memory size, determine the dynamic batch size of the layer.
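One plausible reading of the batching rule (an assumption; the abstract states only the comparison, not the growth policy) is that batching multiplies the operation count while the weight footprint stays fixed, so the batch grows until the combined ops-to-weights ratio reaches the engine's ops-per-unit-memory rate.

```python
def dynamic_batch_size(num_ops, num_weights, ops_per_unit_mem, max_batch):
    """Sketch: grow the batch while the layer's ops-to-weights ratio stays
    below the compute engine's ops-per-unit-of-memory rate (capped)."""
    batch = 1
    while batch < max_batch and (batch * num_ops) / num_weights < ops_per_unit_mem:
        batch += 1  # more samples amortize the same weight traffic
    return batch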
-