Patent search ap:("Intel Corporation") AND inv:"Dheevatsa Mudigere" Page 1

1.

发明授权
Fine-grain compute communication execution for deep learning frameworks via hardware accelerated point-to-point primitives 有权

公开(公告)号：US12154028B2

公开(公告)日：2024-11-26

申请号：US15869502

申请日：2018-01-12

Applicant: Intel Corporation

Inventor： Srinivas Sridharan , Dheevatsa Mudigere

IPC: G06N3/08 , G06F9/54 , G06N3/04 , G06N3/063 , G06N3/084 , G06T15/00 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20

Abstract: One embodiment provides for a system to configure distributed training of a neural network. The system includes memory to store a library to facilitate transmission of data during distributed training of the neural network; a network interface to transmit and receive gradient data associated with the trainable parameters; a general-purpose processor to execute instructions provided by the library, the instructions to cause the general-purpose processor to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework; and a graphics processor to perform compute operations associated with machine learning framework workflow to generate the gradient data associated with the trainable parameters, wherein, based on the machine learning framework workflow, the library is to interleave the compute operations on the graphics processor with transmission and receipt of gradient data via the network interface.

2.

发明授权
Dynamic precision management for integer deep learning primitives 有权

公开(公告)号：US11321805B2

公开(公告)日：2022-05-03

申请号：US17083588

申请日：2020-10-29

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan

IPC: G06T1/20 , G06N3/063 , G06F17/16 , G06F7/523 , G06F5/01 , G06F7/501 , G06F17/15 , G06N3/04 , G06F7/544 , G06N3/08

Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.

3.

发明授权
Optimized compute hardware for machine learning operations 有权

公开(公告)号：US10776699B2

公开(公告)日：2020-09-15

申请号：US15869564

申请日：2018-01-12

Applicant: Intel Corporation

Inventor： Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke

IPC: G06F17/16 , G06F9/30 , G06N3/08 , G06N3/063 , G06N3/04

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.

4.

发明申请
MACHINE LEARNING ACCELERATOR MECHANISM 有权

公开(公告)号：US20240403620A1

公开(公告)日：2024-12-05

申请号：US18679802

申请日：2024-05-31

Applicant: Intel Corporation

Inventor： Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari

IPC: G06N3/063 , G06F7/78 , G06F9/00 , G06N3/084 , G06N20/00 , G06T1/20

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

5.

发明申请
MACHINE LEARNING ACCELERATOR MECHANISM 有权

公开(公告)号：US20230053289A1

公开(公告)日：2023-02-16

申请号：US17845794

申请日：2022-06-21

Applicant: Intel Corporation

Inventor： Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari

IPC: G06N3/063 , G06F7/78 , G06N3/08 , G06N20/00 , G06F9/00

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

6.

发明申请
OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS 有权

公开(公告)号：US20220343174A1

公开(公告)日：2022-10-27

申请号：US17742581

申请日：2022-05-12

Applicant: Intel Corporation

Inventor： Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke

IPC: G06N3/08 , G06N3/063 , G06N3/04 , G06F17/16 , G06F9/30 , G06F9/38 , G06F7/544

Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with the instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.

7.

发明授权
Technologies for scaling deep learning training 有权

公开(公告)号：US11068780B2

公开(公告)日：2021-07-20

申请号：US15476998

申请日：2017-04-01

Applicant: Intel Corporation

Inventor： Naveen K. Mellempudi , Srinivas Sridharan , Dheevatsa Mudigere , Dipankar Das

IPC: G06N3/08 , G06N3/06 , G06N3/04 , G06N3/063

Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantization message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.

8.

发明授权
Instructions for fused multiply-add operations with variable precision input operands 有权

公开(公告)号：US10528346B2

公开(公告)日：2020-01-07

申请号：US15940774

申请日：2018-03-29

Applicant: Intel Corporation

Inventor： Dipankar Das , Naveen K. Mellempudi , Mrinmay Dutta , Arun Kumar , Dheevatsa Mudigere , Abhisek Kundu

IPC: G06F9/30 , G06F7/544 , G06F9/38 , G06F7/483 , G06N3/063

Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.

9.

发明授权
Incremental precision networks using residual inference and fine-grain quantization 有权

公开(公告)号：US11893490B2

公开(公告)日：2024-02-06

申请号：US18060414

申请日：2022-11-30

Applicant: Intel Corporation

Inventor： Abhisek Kundu , Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das

IPC: G06N3/08 , G06N5/04 , G06T15/00 , G06F9/46 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06T17/20 , G06T15/80 , G06T17/10 , G06T15/04 , G06V10/94

CPC classification number: G06N3/08 , G06F9/46 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N5/04 , G06T15/005 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20 , G06V10/94

Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.

10.

发明授权
Machine learning accelerator mechanism 有权

公开(公告)号：US11373088B2

公开(公告)日：2022-06-28

申请号：US15859504

申请日：2017-12-30

Applicant: Intel Corporation

Inventor： Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari

IPC: G06N3/063 , G06N20/00 , G06F7/78 , G06N3/08 , G06F9/00 , G06T1/20

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification