-
Publication Number: US20240070799A1
Publication Date: 2024-02-29
Application Number: US18461038
Application Date: 2023-09-05
Applicant: Intel Corporation
Inventor: Dhiraj D. KALAMKAR , Karthikeyan VAIDYANATHAN , Srinivas SRIDHARAN , Dipankar DAS
Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.
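As a rough illustration of the abstract's idea, the sketch below builds a "global view" of per-layer allreduce operations and picks the endpoint count with the lowest estimated communication time. The `Layer`, `allreduce_cost`, and `choose_endpoints` names and the alpha-beta ring cost model are assumptions for the sketch, not the patent's actual method.

```python
# Minimal sketch: estimate communication cost from a global view of
# per-layer allreduce operations, then pick an endpoint count.
# Cost model and all names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    grad_bytes: int  # size of the gradient tensor exchanged for this layer

def allreduce_cost(nbytes: int, nodes: int, endpoints: int,
                   latency_s: float = 5e-6, bw_bytes_s: float = 12.5e9) -> float:
    """Ring-allreduce time estimate, split across `endpoints` parallel channels."""
    steps = 2 * (nodes - 1)
    per_step = nbytes / (nodes * endpoints)
    return steps * (latency_s + per_step / bw_bytes_s)

def choose_endpoints(model: list[Layer], nodes: int, max_endpoints: int = 8) -> int:
    """Pick the endpoint count minimizing total estimated communication time."""
    def total(e: int) -> float:
        return sum(allreduce_cost(l.grad_bytes, nodes, e) for l in model)
    return min(range(1, max_endpoints + 1), key=total)

model = [Layer("conv1", 4 << 20), Layer("fc1", 64 << 20)]
print(choose_endpoints(model, nodes=16))
```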
-
Publication Number: US20190042242A1
Publication Date: 2019-02-07
Application Number: US15940774
Application Date: 2018-03-29
Applicant: Intel Corporation
Inventor: Dipankar DAS , Naveen K. MELLEMPUDI , Mrinmay DUTTA , Arun KUMAR , Dheevatsa MUDIGERE , Abhisek KUNDU
Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
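The behavior of the asymmetric FMA can be emulated in a few lines. The sketch below is a plain-Python model under assumed widths (8-bit first-source elements, 2-bit second-source elements, a 32-bit lane); the function name and packing are illustrative, not the instruction's actual encoding.

```python
# Emulation of the asymmetric FMA described above: multiply narrow
# elements of src2 by matching elements of src1 and accumulate into
# the destination. Widths here are assumptions for the sketch.
def asymmetric_fma(dest: int, src1: list[int], src2: list[int],
                   lane_bits: int = 32, w1: int = 8, w2: int = 2) -> int:
    """Accumulate as many src2 elements as fit in one lane (lane_bits // w2)."""
    n = lane_bits // w2           # elements of the second source per lane
    acc = dest                    # previous contents of the destination
    for a, b in zip(src1[:n], src2[:n]):
        assert 0 <= a < (1 << w1) and 0 <= b < (1 << w2)  # width checks
        acc += a * b              # product of paired elements
    return acc

print(asymmetric_fma(0, [17, 200, 3], [1, 2, 3]))  # 17*1 + 200*2 + 3*3 = 426
```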
-
Publication Number: US20220115362A1
Publication Date: 2022-04-14
Application Number: US17067069
Application Date: 2020-10-09
Applicant: Intel Corporation
Inventor: Debendra MALLIK , Ravindranath MAHAJAN , Dipankar DAS
Abstract: A processor package module comprises a processor-memory stack including one or more compute die stacked and interconnected with a memory stack on a substrate. One or more photonic die is on the substrate to transmit and receive optical I/O, the one or more photonic die connected to the processor-memory stack and connected to external components through a fiber array. The substrate is mounted into a socket housing, such as a land grid array (LGA) socket. An array of processor package modules are interconnected on a processor substrate via fiber arrays and optical connectors to form a processor chip complex.
-
Publication Number: US20190303743A1
Publication Date: 2019-10-03
Application Number: US16317497
Application Date: 2016-09-27
Applicant: Intel Corporation
Inventor: Swagath VENKATARAMANI , Dipankar DAS , Ashish RANJAN , Subarno BANERJEE , Sasikanth AVANCHA , Ashok JAGANNATHAN , Ajaya V. DURG , Dheemanth NAGARAJ , Bharat KAUL , Anand RAGHUNATHAN
Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.
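To make the topology concrete, the sketch below models one column of compute-intensive tiles coupled between two memory-intensive tiles as a plain data structure. The `Column` class and all tile names are hypothetical, and this models connectivity only, not the hardware itself.

```python
# Schematic model of the tile topology in the abstract: a column of
# forward, backward, and weight-gradient compute tiles, each coupled
# between two memory-intensive tiles. Names are illustrative only.
from dataclasses import dataclass

@dataclass
class Column:
    """One column of compute-intensive tiles between two memory-intensive tiles."""
    mem_in: str = "mem_tile_0"
    compute: tuple = ("fwd_tile", "bwd_tile", "wgrad_tile")
    mem_out: str = "mem_tile_1"

    def interconnect(self) -> list[tuple[str, str]]:
        """Links coupling each compute tile to both bounding memory tiles."""
        links = []
        for tile in self.compute:
            links.append((self.mem_in, tile))
            links.append((tile, self.mem_out))
        return links

print(Column().interconnect())
```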
-
Publication Number: US20240160931A1
Publication Date: 2024-05-16
Application Number: US18532795
Application Date: 2023-12-07
Applicant: Intel Corporation
Inventor: Abhisek KUNDU , Naveen MELLEMPUDI , Dheevatsa MUDIGERE , Dipankar DAS
CPC classification number: G06N3/08 , G06F9/46 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06N5/04 , G06T15/005 , G06T17/20
Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
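A minimal sketch of the per-layer scale-factor conversion described above, assuming symmetric max-abs scaling from float32 to int8; the scale-factor choice is an assumption for illustration, not necessarily the patented method.

```python
# Per-layer symmetric quantization to an 8-bit datatype, in the spirit
# of the abstract. The max-abs scale rule is an assumption.
import numpy as np

def quantize_layer(tensor: np.ndarray) -> tuple[np.ndarray, float]:
    """Return int8 data plus the per-layer scale factor that maps it back."""
    scale = float(np.max(np.abs(tensor))) / 127.0 or 1.0  # avoid zero scale
    q = np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)
    return q, scale

def output_tensor(q: np.ndarray, scale: float) -> np.ndarray:
    """Generate an output tensor from converted data and its scale factor."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_layer(x)
print(np.max(np.abs(x - output_tensor(q, s))))  # error bounded by ~s/2
```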
-
Publication Number: US20240118892A1
Publication Date: 2024-04-11
Application Number: US18543357
Application Date: 2023-12-18
Applicant: Intel Corporation
Inventor: Swagath VENKATARAMANI , Dipankar DAS , Ashish RANJAN , Subarno BANERJEE , Sasikanth AVANCHA , Ashok JAGANNATHAN , Ajaya V. DURG , Dheemanth NAGARAJ , Bharat KAUL , Anand RAGHUNATHAN
CPC classification number: G06F9/30145 , G06F9/3004 , G06F9/30043 , G06F9/30087 , G06F9/3834 , G06F9/52 , G06N3/04 , G06N3/063 , G06N3/084
Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.
-
Publication Number: US20230087364A1
Publication Date: 2023-03-23
Application Number: US18060414
Application Date: 2022-11-30
Applicant: Intel Corporation
Inventor: Abhisek KUNDU , Naveen MELLEMPUDI , Dheevatsa MUDIGERE , Dipankar DAS
Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
-
Publication Number: US20210382719A1
Publication Date: 2021-12-09
Application Number: US17410934
Application Date: 2021-08-24
Applicant: Intel Corporation
Inventor: Swagath VENKATARAMANI , Dipankar DAS , Sasikanth AVANCHA , Ashish RANJAN , Subarno BANERJEE , Bharat KAUL , Anand RAGHUNATHAN
Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.
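A software sketch of the tracking-table behavior described above: each tracked address carries an allowed sequence of accesses, and an access that deviates from the sequence is blocked. The `AccessTracker` class and the `"W"`/`"R"` encoding are illustrative assumptions, not the circuit's design.

```python
# Tracking table that blocks memory accesses violating an allowed
# sequence for a tracked address. Purely illustrative.
class AccessTracker:
    def __init__(self) -> None:
        self.table: dict[int, list[str]] = {}  # addr -> remaining allowed accesses

    def track(self, addr: int, allowed: list[str]) -> None:
        """Register an address with its allowed sequence, e.g. ['W', 'R', 'R']."""
        self.table[addr] = list(allowed)

    def access(self, addr: int, kind: str) -> bool:
        """Return True if the access proceeds, False if it is blocked."""
        seq = self.table.get(addr)
        if seq is None:
            return True            # untracked addresses are unrestricted
        if not seq or seq[0] != kind:
            return False           # violates the allowed sequence: block
        seq.pop(0)                 # consume one step of the sequence
        return True

t = AccessTracker()
t.track(0x1000, ["W", "R"])
print(t.access(0x1000, "R"))  # False: a read before the write is blocked
print(t.access(0x1000, "W"))  # True
```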
-
Publication Number: US20210072955A1
Publication Date: 2021-03-11
Application Number: US16562979
Application Date: 2019-09-06
Applicant: Intel Corporation
Inventor: Naveen MELLEMPUDI , Dipankar DAS , Chunhui MEI , Kristopher WONG , Dhiraj D. KALAMKAR , Hong H. JIANG , Subramaniam MAIYURAN , Varghese GEORGE
Abstract: An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive data format information indicating a first precision data format in which input data is to be received, and converter hardware to receive the input data and convert it from the first precision data format to a second precision data format based on the data format information.
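As one concrete instance of such a conversion, the sketch below truncates float32 to bfloat16 and widens it back; this format pair is an assumption chosen for illustration, and the converter hardware described above is not limited to it.

```python
# One example precision conversion: float32 -> bfloat16 by truncating
# to the top 16 bits, and the reverse widening. Format pair is assumed.
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to its top 16 bits (bfloat16)."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16

def bf16_bits_to_f32(bits16: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low mantissa."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]

x = 3.14159
print(bf16_bits_to_f32(f32_to_bf16_bits(x)))  # ~3.140625, reduced precision
```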
-
Publication Number: US20240126544A1
Publication Date: 2024-04-18
Application Number: US18399578
Application Date: 2023-12-28
Applicant: Intel Corporation
Inventor: Dipankar DAS , Naveen K. MELLEMPUDI , Mrinmay DUTTA , Arun KUMAR , Dheevatsa MUDIGERE , Abhisek KUNDU
CPC classification number: G06F9/30014 , G06F7/483 , G06F7/5443 , G06F9/30036 , G06F9/30145 , G06F9/3802 , G06F9/382 , G06F9/384 , G06F9/3887 , G06N3/063 , G06F9/30065 , G06F2207/382
Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.