Patent search ap:("INTEL CORPORATION") AND inv:"Raymond Sung" Page 1

1.

Patent application
SPARSITY-AWARE DATASTORE FOR INFERENCE PROCESSING IN DEEP NEURAL NETWORK ARCHITECTURES (legal status: active)

Publication No.: US20220067524A1

Publication date: 2022-03-03

Application No.: US17524333

Filing date: 2021-11-11

Applicant: Intel Corporation

Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick

IPC: G06N3/08, G06N5/04

Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.
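The alignment step described in this abstract can be illustrated with a minimal software sketch. It assumes a zero-value compression scheme in which the bitmap holds one bit per dense element (1 = a nonzero value is stored, 0 = the element is zero). The function name `decode_with_bitmap` and the list-based representation are hypothetical and chosen for illustration; the abstract does not specify the hardware decode logic.

```python
def decode_with_bitmap(compressed, bitmap):
    """Expand zero-compressed values into dense form.

    Assumption (not from the patent text): bitmap[i] == 1 means the next
    stored value belongs at position i; bitmap[i] == 0 means position i is zero.
    """
    if sum(bitmap) != len(compressed):
        raise ValueError("bitmap population count must equal number of stored values")
    values = iter(compressed)
    # Walk the bitmap and pull one stored value for every set bit.
    return [next(values) if bit else 0 for bit in bitmap]


dense = decode_with_bitmap([7, 3, 5], [1, 0, 0, 1, 0, 1])
print(dense)  # [7, 0, 0, 3, 0, 5]
```

In this sketch the dense output is what would be handed to the processing elements; a real datastore would likely operate on fixed-width words and prefetch buffers rather than Python lists.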

2.

Patent application
PERFORMANCE SCALING FOR DATAFLOW DEEP NEURAL NETWORK HARDWARE ACCELERATORS (legal status: active)

Publication No.: US20250036928A1

Publication date: 2025-01-30

Application No.: US18907748

Filing date: 2024-10-07

Applicant: Intel Corporation

Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick

IPC: G06N3/063, G06F9/30, G06N3/04

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
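To see why activation and weight sparsity affect how many MAC operations do useful work, consider the following minimal sketch. It assumes that a multiply-and-accumulate only contributes when both operands are nonzero, and it simply counts those cases. The function `sparse_dot` is hypothetical and does not model the static or dynamic MAC scaling hardware of the disclosure.

```python
def sparse_dot(activations, weights):
    """Dot product that skips any pair containing a zero operand.

    Returns (result, macs_used), where macs_used is the number of
    multiply-accumulate operations actually performed.
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    macs_used = 0
    for a, w in zip(activations, weights):
        # A zero activation or zero weight contributes nothing, so skip it.
        if a != 0 and w != 0:
            acc += a * w
            macs_used += 1
    return acc, macs_used


result, macs_used = sparse_dot([1, 0, 2, 3], [3, 4, 0, 5])
print(result, macs_used)  # 18 2
```

Here only two of the four operand pairs need a MAC, which is the kind of headroom a sparsity-aware scheme could exploit.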

3.

Patent application
SCHEDULE-AWARE DYNAMICALLY RECONFIGURABLE ADDER TREE ARCHITECTURE FOR PARTIAL SUM ACCUMULATION IN MACHINE LEARNING ACCELERATORS (legal status: active)

Publication No.: US20250028565A1

Publication date: 2025-01-23

Application No.: US18906648

Filing date: 2024-10-04

Applicant: Intel Corporation

Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick

IPC: G06F9/50, G06F7/50, G06F9/48, G06F15/80, G06F15/82, G06N20/00

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. The present disclosure provides a schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators, wherein the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator. Other embodiments may be described and/or claimed.
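The idea of a variable-depth adder tree can be sketched in software as follows. The sketch assumes a binary tree in which a depth of d sums groups of 2^d adjacent PE outputs, and that the number of PEs is a multiple of 2^d. The function name `adder_tree` and the `depth` parameter stand in for the software-programmed configuration register and are hypothetical.

```python
def adder_tree(partial_sums, depth):
    """Reduce PE partial sums through `depth` levels of pairwise adders.

    depth = 0 passes each PE's partial sum through unchanged;
    depth = d yields one output per group of 2**d adjacent PEs.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if len(partial_sums) % (1 << depth) != 0:
        raise ValueError("number of partial sums must be a multiple of 2**depth")
    level = list(partial_sums)
    for _ in range(depth):
        # One tree level: add each adjacent pair of values.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level


psums = [1, 2, 3, 4, 5, 6, 7, 8]
print(adder_tree(psums, 0))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(adder_tree(psums, 2))  # [10, 26]
print(adder_tree(psums, 3))  # [36]
```

Choosing `depth` per layer mirrors the abstract's point that the compiler's dataflow schedule determines how many PEs contribute to each accumulated output.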

4.

Granted patent
Sparsity-aware datastore for inference processing in deep neural network architectures (legal status: active)

Publication No.: US12229673B2

Publication date: 2025-02-18

Application No.: US17524333

Filing date: 2021-11-11

Applicant: Intel Corporation

Inventors: Deepak Mathaikutty, Arnab Raha, Raymond Sung, Debabrata Mohapatra, Cormac Brick

IPC: H03M7/00, G06N3/08, G06N5/04, H03M7/30

Abstract: Systems, apparatuses and methods may provide for technology that prefetches compressed data and a sparsity bitmap from a memory to store the compressed data in a decode buffer, where the compressed data is associated with a plurality of tensors, wherein the compressed data is in a compressed format. The technology aligns the compressed data with the sparsity bitmap to generate decoded data, and provides the decoded data to a plurality of processing elements.

5.

Granted patent
Schedule-aware dynamically reconfigurable adder tree architecture for partial sum accumulation in machine learning accelerators (legal status: active)

Publication No.: US12147836B2

Publication date: 2024-11-19

Application No.: US17520281

Filing date: 2021-11-05

Applicant: INTEL CORPORATION

Inventors: Debabrata Mohapatra, Arnab Raha, Deepak Mathaikutty, Raymond Sung, Cormac Brick

IPC: G06F9/50, G06F7/50, G06F9/48, G06F15/80, G06F15/82, G06N20/00

Abstract: Techniques and configurations enhancing the performance of hardware (HW) accelerators are provided. A schedule-aware, dynamically reconfigurable, tree-based partial sum accumulator architecture for HW accelerators is provided, where the depth of an adder tree in the HW accelerator is dynamically based on a dataflow schedule generated by a compiler. The adder tree depth is adjusted on a per-layer basis at runtime. Configuration registers, programmed via software, dynamically alter the adder tree depth for partial sum accumulation based on the dataflow schedule. By facilitating a variable depth adder tree during runtime, the compiler can choose a compute optimal dataflow schedule that minimizes the number of compute cycles needed to accumulate partial sums across multiple processing elements (PEs) within a PE array of a HW accelerator.

6.

Granted patent
Performance scaling for dataflow deep neural network hardware accelerators (legal status: active)

Publication No.: US12141683B2

Publication date: 2024-11-12

Application No.: US17246341

Filing date: 2021-04-30

Applicant: Intel Corporation

Inventors: Arnab Raha, Debabrata Mohapatra, Gautham Chinya, Guruguhanathan Venkataramanan, Sang Kyun Kim, Deepak Mathaikutty, Raymond Sung, Cormac Brick

IPC: G06F17/10, G06F9/30, G06N3/04, G06N3/063

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and performance per area of HW accelerators. Disclosed embodiments also include dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
