-
公开(公告)号:US20200265545A1
公开(公告)日:2020-08-20
申请号:US16853405
申请日:2020-04-20
Applicant: Intel Corporation
Inventor: Naveen MELLEMPUDI , DHEEVATSA MUDIGERE , DIPANKAR DAS , SRINIVAS SRIDHARAN
IPC: G06T1/20 , G06N3/08 , G06N3/04 , G06F7/544 , G06F17/15 , G06F7/501 , G06F5/01 , G06F7/523 , G06F17/16 , G06N3/063
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
-
公开(公告)号:US20230409891A1
公开(公告)日:2023-12-21
申请号:US18456272
申请日:2023-08-25
Applicant: Intel Corporation
Inventor: NAVEEN MELLEMPUDI , DIPANKAR DAS
CPC classification number: G06N3/063 , G06F7/487 , G06F7/5443 , G06T1/20 , G06F5/012 , G06N3/084 , G06N3/044 , G06N3/045
Abstract: One embodiment provides for a machine-learning accelerator device a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.
-
公开(公告)号:US20220327656A1
公开(公告)日:2022-10-13
申请号:US17730364
申请日:2022-04-27
Applicant: Intel Corporation
Inventor: Naveen K. MELLEMPUDI , DHEEVATSA MUDIGERE , DIPANKAR DAS , SRINIVAS SRIDHARAN
IPC: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/544 , G06F17/15 , G06F17/16 , G06N3/04 , G06N3/063 , G06N3/08
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to quantize elements of a floating-point tensor to convert the floating-point tensor into a dynamic fixed-point tensor.
-
公开(公告)号:US20220101480A1
公开(公告)日:2022-03-31
申请号:US17398295
申请日:2021-08-10
Applicant: Intel Corporation
Inventor: DHIRAJ D. KALAMKAR , KARTHIKEYAN VAIDYANATHAN , SRINIVAS SRIDHARAN , DIPANKAR DAS
Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.
-
公开(公告)号:US20210110508A1
公开(公告)日:2021-04-15
申请号:US17083588
申请日:2020-10-29
Applicant: Intel Corporation
Inventor: Naveen MELLEMPUDI , DHEEVATSA MUDIGERE , DIPANKAR DAS , SRINIVAS SRIDHARAN
IPC: G06T1/20 , G06N3/063 , G06F17/16 , G06F7/523 , G06F5/01 , G06F7/501 , G06F17/15 , G06N3/04 , G06F7/544 , G06N3/08
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
-
公开(公告)号:US20190227797A1
公开(公告)日:2019-07-25
申请号:US15879420
申请日:2018-01-24
Applicant: Intel Corporation
Inventor: ALEXANDER HEINECKE , DIPANKAR DAS , ROBERT VALENTINE , MARK CHARNEY
Abstract: An apparatus and method for performing multiply-accumulate operations. For example, one embodiment of a processor comprises: a decoder to decode instructions; a first source register to store a first plurality of packed words; a second source register to store a second plurality of packed words; a third source register to store a plurality of packed quadwords; execution circuitry to execute a first instruction, the execution circuitry comprising: extension circuitry to sign-extend or zero-extend the first and second plurality of packed words to generate a first and second plurality of doublewords corresponding to the first and second plurality of packed words; multiplier circuitry to multiply each of the first plurality of doublewords with a corresponding one of the second plurality of doublewords to generate a plurality of temporary products; adder circuitry to add at least a first set of the temporary products to generate a first temporary sum; accumulation circuitry to combine the first temporary sum with a first packed quadword value from a first quadword location in the third source register to generate a first accumulated quadword result; a destination register to store the first accumulated quadword result in the first quadword location.
-
公开(公告)号:US20180322607A1
公开(公告)日:2018-11-08
申请号:US15881991
申请日:2018-01-29
Applicant: Intel Corporation
Inventor: Naveen MELLEMPUDI , DHEEVATSA MUDIGERE , DIPANKAR DAS , SRINIVAS SRIDHARAN
CPC classification number: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F17/153 , G06F17/16 , G06N3/063
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic; a decode unit to decode an instruction for execution by the compute unit, the instruction to cause the compute unit to perform a matrix arithmetic operation on a set of dynamic fixed-point tensors; and a dynamic precision manager to dynamically adjust the precision of a compute operation performed by the compute unit during the matrix arithmetic operation, the dynamic precision manager to adjust the precision of the compute operation to prevent an arithmetic overflow.
-
-
-
-
-
-