-
Publication No.: US12154028B2
Publication Date: 2024-11-26
Application No.: US15869502
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Srinivas Sridharan , Dheevatsa Mudigere
IPC: G06N3/08 , G06F9/54 , G06N3/04 , G06N3/063 , G06N3/084 , G06T15/00 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20
Abstract: One embodiment provides for a system to configure distributed training of a neural network. The system includes memory to store a library to facilitate transmission of data during distributed training of the neural network; a network interface to transmit and receive gradient data associated with the trainable parameters; a general-purpose processor to execute instructions provided by the library, the instructions to cause the general-purpose processor to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework; and a graphics processor to perform compute operations associated with the machine learning framework workflow to generate the gradient data associated with the trainable parameters, wherein, based on the machine learning framework workflow, the library is to interleave the compute operations on the graphics processor with transmission and receipt of gradient data via the network interface.
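The interleaving this abstract describes — overlapping per-layer gradient compute with gradient transmission — can be illustrated with a minimal Python sketch. The function and argument names are illustrative only; the patent concerns a hardware/library design, not this software emulation:

```python
import threading

def backward_with_overlap(layer_grads, send_gradient):
    """Interleave per-layer gradient compute with transmission: as soon as
    one layer's gradients are produced, sending begins on another thread
    while the next layer's compute proceeds."""
    senders = []
    for compute_grad in reversed(layer_grads):
        grad = compute_grad()                       # compute op (GPU in the patent)
        t = threading.Thread(target=send_gradient, args=(grad,))
        t.start()                                   # transmission overlaps next compute
        senders.append(t)
    for t in senders:
        t.join()                                    # all gradients transmitted
```

In the patented system the library schedules this overlap between the graphics processor and the network interface; the thread here merely stands in for the concurrent transmission path.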
-
Publication No.: US11321805B2
Publication Date: 2022-05-03
Application No.: US17083588
Filing Date: 2020-10-29
Applicant: Intel Corporation
Inventor: Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das , Srinivas Sridharan
IPC: G06T1/20 , G06N3/063 , G06F17/16 , G06F7/523 , G06F5/01 , G06F7/501 , G06F17/15 , G06N3/04 , G06F7/544 , G06N3/08
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
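The right-shift and shared-exponent scheme in this abstract can be sketched in a few lines of Python. This is a software emulation under assumed conventions (integer mantissas sharing one exponent, value = mantissa × 2^exponent); the function name and bit-width handling are illustrative, not the patented dynamic-precision hardware logic:

```python
import numpy as np

def dynamic_fixed_point_rescale(tensor_int, shared_exp, bit_width=16):
    """Rescale a set of dynamic fixed-point mantissas that share one exponent.

    If the absolute maximum exceeds the representable dynamic range for the
    given bit width, right-shift all mantissas and increment the shared
    exponent by the same amount, as the abstract describes.
    """
    abs_max = int(np.max(np.abs(tensor_int)))
    max_representable = 2 ** (bit_width - 1) - 1    # signed fixed-point range
    shift = 0
    while (abs_max >> shift) > max_representable:
        shift += 1                                  # compute the right-shift value
    rescaled = tensor_int >> shift                  # right-shift the data values
    new_exp = shared_exp + shift                    # increment the shared exponent
    return rescaled, new_exp
```

A compute operation (e.g. a matrix multiply) would then run on the rescaled mantissas, with the shared exponents combined separately.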
-
Publication No.: US10776699B2
Publication Date: 2020-09-15
Application No.: US15869564
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.
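The "smaller bit-length output" behavior this abstract describes can be emulated with NumPy dtypes. The function name is hypothetical and floating-point widths stand in for the instruction's operand bit-lengths; this is a sketch of the idea, not the patented compute unit:

```python
import numpy as np

def mixed_width_matmul(a, b):
    """Matrix operation on operands of unequal bit-length; the output is
    narrowed to the smaller of the two operand widths (the role of the
    'operand length unit' in the abstract)."""
    smaller_bytes = min(a.dtype.itemsize, b.dtype.itemsize)
    out_dtype = np.dtype(f"float{smaller_bytes * 8}")
    # Compute at full precision, then narrow the result
    wide = a.astype(np.float32) @ b.astype(np.float32)
    return wide.astype(out_dtype)
```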
-
Publication No.: US20240403620A1
Publication Date: 2024-12-05
Application No.: US18679802
Filing Date: 2024-05-31
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-
Publication No.: US20230053289A1
Publication Date: 2023-02-16
Application No.: US17845794
Filing Date: 2022-06-21
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-
Publication No.: US20220343174A1
Publication Date: 2022-10-27
Application No.: US17742581
Filing Date: 2022-05-12
Applicant: Intel Corporation
Inventor: Dipankar Das , Roger Gramunt , Mikhail Smelyanskiy , Jesus Corbal , Dheevatsa Mudigere , Naveen K. Mellempudi , Alexander F. Heinecke
Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with the instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.
-
Publication No.: US11068780B2
Publication Date: 2021-07-20
Application No.: US15476998
Filing Date: 2017-04-01
Applicant: Intel Corporation
Inventor: Naveen K. Mellempudi , Srinivas Sridharan , Dheevatsa Mudigere , Dipankar Das
Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantized message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.
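The quantize-then-transmit flow in this abstract can be sketched as follows. The message layout, function names, and symmetric max-abs scaling are illustrative assumptions; the patent describes host-fabric-interface hardware, not this emulation:

```python
import numpy as np

def quantize_message(values, bits):
    """Quantize training values to the requested quantization level; the
    returned message carries metadata indicating that level."""
    scale = float(np.max(np.abs(values))) / (2 ** (bits - 1) - 1)
    payload = np.round(np.asarray(values) / scale).astype(np.int32)
    return {"quantization_level": bits, "scale": scale, "payload": payload}

def dequantize_message(msg):
    """Recover approximate values on the receiving node."""
    return msg["payload"] * msg["scale"]
```

Each quantized value has lower precision than the original, so the receiver recovers an approximation whose error shrinks as the quantization level (bit count) grows.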
-
Publication No.: US10528346B2
Publication Date: 2020-01-07
Application No.: US15940774
Filing Date: 2018-03-29
Applicant: Intel Corporation
Inventor: Dipankar Das , Naveen K. Mellempudi , Mrinmay Dutta , Arun Kumar , Dheevatsa Mudigere , Abhisek Kundu
Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
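The multiply-and-accumulate semantics of the asymmetric FMA can be emulated in a few lines. The function name is hypothetical and int8 arrays stand in for the 4-bit/8-bit and 1/2/4-bit operand encodings; accumulation happens at a wider precision, as in the SIMD lanes the abstract describes:

```python
import numpy as np

def asymmetric_fma(acc, src1, src2):
    """Elementwise multiply narrow operands of unequal widths (src1 models
    8-bit values, src2 models 4-bit values) and accumulate into a wider
    destination, mimicking one SIMD lane's worth of the FMA instruction."""
    assert np.all(np.abs(src1) < 128) and np.all(np.abs(src2) < 8)
    return acc + src1.astype(np.int32) * src2.astype(np.int32)
```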
-
Publication No.: US11893490B2
Publication Date: 2024-02-06
Application No.: US18060414
Filing Date: 2022-11-30
Applicant: Intel Corporation
Inventor: Abhisek Kundu , Naveen Mellempudi , Dheevatsa Mudigere , Dipankar Das
IPC: G06N3/08 , G06N5/04 , G06T15/00 , G06F9/46 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06T17/20 , G06T15/80 , G06T17/10 , G06T15/04 , G06V10/94
CPC classification number: G06N3/08 , G06F9/46 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N5/04 , G06T15/005 , G06T15/04 , G06T15/80 , G06T17/10 , G06T17/20 , G06V10/94
Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
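The per-layer scale-factor conversion this abstract describes can be sketched with symmetric max-abs scaling to int8. The scaling convention and function names are illustrative assumptions, not the claimed method:

```python
import numpy as np

def quantize_layer_int8(tensor):
    """Determine a per-layer scale factor and convert float tensor data
    to an 8-bit datatype."""
    scale = 127.0 / float(np.max(np.abs(tensor)))   # per-layer scale factor
    q = np.clip(np.round(tensor * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_layer(q, scale):
    """Generate an output tensor from the converted data and the scale."""
    return q.astype(np.float32) / scale
```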
-
Publication No.: US11373088B2
Publication Date: 2022-06-28
Application No.: US15859504
Filing Date: 2017-12-30
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.