-
Publication No.: US20250061318A1
Publication Date: 2025-02-20
Application No.: US18818154
Filing Date: 2024-08-28
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das
Abstract: One embodiment provides for a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream. The multiprocessor includes a compute unit with a set of functional units, each functional unit executing at least one of the parallel threads. The compute unit also includes compute logic configured to execute a single instruction that scales an input tensor, associated with a layer of a neural network, by a scale factor. The input tensor is stored in a floating-point data type, and the scaling enables the data distribution of the input tensor to be represented by a 16-bit floating-point data type.
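The scaling this abstract describes can be sketched as follows. The function name, the 2x headroom choice, and the use of IEEE float16 (rather than, say, bfloat16) are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def scale_to_fp16(tensor, fp16_max=65504.0):
    """Pick a scale factor so the tensor's values fit the representable
    range of IEEE float16, then store the tensor in that 16-bit type."""
    peak = np.max(np.abs(tensor))
    # 2x headroom below the float16 maximum is an arbitrary illustrative choice
    scale = 1.0 if peak == 0 else fp16_max / (2.0 * peak)
    scaled = (tensor * scale).astype(np.float16)
    return scaled, scale

# Values far outside the float16 range become representable after scaling.
x = np.array([1e6, -3e5, 2e4], dtype=np.float32)
y, s = scale_to_fp16(x)
recovered = y.astype(np.float32) / s  # undo the scale at higher precision
```

Dividing by the saved scale factor recovers the original distribution up to float16 rounding error.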
-
Publication No.: US20230141038A1
Publication Date: 2023-05-11
Application No.: US17960947
Filing Date: 2022-10-06
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das
CPC classification number: G06N3/063, G06F7/487, G06F7/5443, G06T1/20, G06F5/012, G06N3/084, G06N3/044, G06N3/045
Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture with hardware multithreading. A multiprocessor of the graphics processor can execute parallel threads of instructions associated with a command stream, and includes a set of functional units to execute at least one of the parallel threads. The functional units can include a mixed-precision tensor processor that performs tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor, enabling the data distribution of a gradient tensor generated from the loss data to be represented by a second floating-point data type.
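The loss-scaling idea in this abstract can be sketched numerically. Since the gradient of a scaled loss is the scaled gradient, multiplying the loss by a constant lifts tiny gradient values above the underflow floor of a narrow float type before the cast. The constant 1024 and the function names are illustrative assumptions.

```python
import numpy as np

LOSS_SCALE = 1024.0  # illustrative constant, not a value from the patent

def backward_fp16(grad_fp32, loss_scale=LOSS_SCALE):
    # d(loss * s)/dw = s * d(loss)/dw: scaling the loss scales every gradient,
    # so values that would underflow in float16 survive the narrowing cast
    return (grad_fp32 * loss_scale).astype(np.float16)

def unscale(grad_fp16, loss_scale=LOSS_SCALE):
    # master weights are updated with the unscaled higher-precision gradient
    return grad_fp16.astype(np.float32) / loss_scale

# These magnitudes sit below float16's smallest subnormal (~5.96e-8),
# so a direct cast flushes them to zero.
tiny = np.array([1e-8, -2e-8], dtype=np.float32)
assert np.all(tiny.astype(np.float16) == 0)

g = backward_fp16(tiny)          # nonzero after loss scaling
restored = unscale(g)            # close to the original gradients
```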
-
Publication No.: US20220269931A1
Publication Date: 2022-08-25
Application No.: US17742138
Filing Date: 2022-05-11
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das
Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture with hardware multithreading. A multiprocessor of the graphics processor can execute parallel threads of instructions associated with a command stream, and includes a set of functional units to execute at least one of the parallel threads. The functional units can include a mixed-precision tensor processor that performs tensor computations. The functional units can also include circuitry to analyze statistics of the output values of those computations, determine a target format based on the statistics and on the precision associated with a second layer of the neural network, and convert the output values to the target format.
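The statistics-driven format selection can be sketched as follows. The statistic used (peak magnitude), the candidate formats, and the rule for honoring the next layer's precision are all illustrative assumptions; the patent's circuitry may use different statistics and formats.

```python
import numpy as np

def choose_target_format(outputs):
    # Illustrative statistic: the peak magnitude of the layer's outputs.
    peak = float(np.max(np.abs(outputs)))
    if peak <= 65504.0:   # fits the IEEE float16 range
        return np.float16
    return np.float32     # fall back to full precision

def convert(outputs, next_layer_dtype):
    fmt = choose_target_format(outputs)
    # Never convert below the precision the next layer expects.
    if np.dtype(next_layer_dtype).itemsize > np.dtype(fmt).itemsize:
        fmt = next_layer_dtype
    return outputs.astype(fmt)

small = convert(np.array([1.0, 2.0], dtype=np.float32), np.float16)
large = convert(np.array([1e6], dtype=np.float32), np.float16)
```

Here `small` is narrowed to float16 because its values fit, while `large` stays in float32 because its peak exceeds the float16 range.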
-
Publication No.: US20180322382A1
Publication Date: 2018-11-08
Application No.: US15869582
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das
CPC classification number: G06N3/063, G06F7/487, G06F7/5443, G06T1/20
Abstract: One embodiment provides for a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream. The multiprocessor includes a compute unit with a set of functional units, each functional unit executing at least one of the parallel threads. The compute unit also includes compute logic configured to execute a single instruction that scales an input tensor, associated with a layer of a neural network, by a scale factor. The input tensor is stored in a floating-point data type, and the scaling enables the data distribution of the input tensor to be represented by a 16-bit floating-point data type.
-
Publication No.: US20230409891A1
Publication Date: 2023-12-21
Application No.: US18456272
Filing Date: 2023-08-25
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das
CPC classification number: G06N3/063, G06F7/487, G06F7/5443, G06T1/20, G06F5/012, G06N3/084, G06N3/044, G06N3/045
Abstract: One embodiment provides for a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream. The multiprocessor includes a compute unit with a set of functional units, each functional unit executing at least one of the parallel threads. The compute unit also includes compute logic configured to execute a single instruction that scales an input tensor, associated with a layer of a neural network, by a scale factor. The input tensor is stored in a floating-point data type, and the scaling enables the data distribution of the input tensor to be represented by a 16-bit floating-point data type.
-
Publication No.: US20180314940A1
Publication Date: 2018-11-01
Application No.: US15869515
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
Abstract: One embodiment provides for a computing device comprising a parallel processor compute unit to perform a set of parallel integer compute operations, and a ternarization unit that includes a weight ternarization circuit and an activation quantization circuit. The weight ternarization circuit converts a weight tensor from a floating-point representation to a ternary representation consisting of a ternary weight and a scale factor, while the activation quantization circuit converts an activation tensor from a floating-point representation to an integer representation. The parallel processor compute unit includes one or more circuits to perform the parallel integer compute operations on the ternary representation of the weight tensor and the integer representation of the activation tensor.
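The two conversions this abstract names can be sketched in software. The thresholding rule (a fraction of the mean absolute weight) and the int8 activation format are common illustrative choices, not the patent's specified circuits.

```python
import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Map each weight to {-1, 0, +1} plus one float scale per tensor.
    The 0.7 threshold ratio is an illustrative assumption."""
    t = threshold_ratio * np.mean(np.abs(w))          # magnitude threshold
    tern = np.where(np.abs(w) > t, np.sign(w), 0.0)   # ternary weight
    nonzero = np.abs(w[tern != 0])
    scale = nonzero.mean() if nonzero.size else 0.0   # single scale factor
    return tern.astype(np.int8), np.float32(scale)

def quantize_activation(a):
    """Convert a floating-point activation tensor to int8 plus a scale."""
    s = np.max(np.abs(a)) / 127.0
    return np.round(a / s).astype(np.int8), np.float32(s)

w = np.array([0.9, -0.05, -1.1, 0.02], dtype=np.float32)
tw, ws = ternarize(w)                  # tw holds only -1, 0, +1
qa, qs = quantize_activation(np.array([1.0, -0.5, 0.25], dtype=np.float32))
```

The integer compute operations then run on `tw` and `qa`, with the scales `ws` and `qs` applied once to the accumulated result.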
-
Publication No.: US20240160931A1
Publication Date: 2024-05-16
Application No.: US18532795
Filing Date: 2023-12-07
Applicant: Intel Corporation
Inventor: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
CPC classification number: G06N3/08, G06F9/46, G06N3/044, G06N3/045, G06N3/063, G06N3/084, G06N5/04, G06T15/005, G06T17/20
Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating-point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
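The per-layer scaling pipeline can be sketched as follows. The symmetric signed int8 mapping and the max-abs scale rule are illustrative assumptions; the patent only requires some per-layer scale factor and some 8-bit datatype.

```python
import numpy as np

def per_layer_scale(tensor):
    # One scale factor for the whole layer, chosen so the peak maps to 127.
    return np.max(np.abs(tensor)) / 127.0

def to_int8(tensor, scale):
    # Convert the layer's tensor from float32 to a signed 8-bit datatype.
    return np.clip(np.round(tensor / scale), -127, 127).astype(np.int8)

def output_tensor(q, scale):
    # The converted tensor plus the per-layer scale yields the output tensor.
    return q.astype(np.float32) * scale

layer = np.array([0.5, -1.27, 0.02], dtype=np.float32)
s = per_layer_scale(layer)
q = to_int8(layer, s)
out = output_tensor(q, s)   # close to the original float32 values
```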
-
Publication No.: US20230087364A1
Publication Date: 2023-03-23
Application No.: US18060414
Filing Date: 2022-11-30
Applicant: Intel Corporation
Inventor: Abhisek Kundu, Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das
Abstract: One embodiment provides for a computer-readable medium storing instructions that cause one or more processors to perform operations comprising determining a per-layer scale factor to apply to tensor data associated with layers of a neural network model and converting the tensor data to converted tensor data. The tensor data may be converted from a floating-point datatype to a second datatype that is an 8-bit datatype. The instructions further cause the one or more processors to generate an output tensor based on the converted tensor data and the per-layer scale factor.
-
Publication No.: US20190354846A1
Publication Date: 2019-11-21
Application No.: US16526376
Filing Date: 2019-07-30
Applicant: Intel Corporation
Inventor: Naveen Mellempudi, Dipankar Das
Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture with hardware multithreading. A multiprocessor of the graphics processor can execute parallel threads of instructions associated with a command stream, and includes a set of functional units to execute at least one of the parallel threads. The functional units can include a mixed-precision tensor processor that performs tensor computations to generate loss data. The loss data is stored as a floating-point data type and scaled by a scaling factor, enabling the data distribution of a gradient tensor generated from the loss data to be represented by a 16-bit floating-point data type.
-