-
Publication Number: US20190354846A1
Publication Date: 2019-11-21
Application Number: US16526376
Filing Date: 2019-07-30
Applicant: Intel Corporation
Inventor: NAVEEN MELLEMPUDI , DIPANKAR DAS
Abstract: A graphics processor is described that includes a multiprocessor with a single instruction, multiple thread (SIMT) architecture and hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, and includes a set of functional units to execute at least one of the parallel threads. The set of functional units can include a mixed-precision tensor processor that performs tensor computations to generate loss data. The loss data is stored as a floating-point data type and scaled by a scaling factor, enabling the data distribution of a gradient tensor generated from the loss data to be represented by a 16-bit floating-point data type.
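The loss-scaling idea this abstract describes can be sketched in a few lines of NumPy. The helper below is a hypothetical illustration, not the patented hardware: the scale factor of 2**20 and the function names are assumptions.

```python
import numpy as np

def scaled_grad(raw_grad, scale=2.0 ** 20):
    # Emulate a backward pass run on a loss multiplied by `scale`:
    # every gradient carries the same factor before the float16 cast,
    # shifting small values up into float16's representable range.
    g16 = (raw_grad * scale).astype(np.float16)
    # Unscale in float32 to recover the true gradient for the update.
    return g16.astype(np.float32) / scale

tiny = np.array([1e-8], dtype=np.float32)
print(np.float16(tiny[0]))   # underflows to 0.0 without scaling
print(scaled_grad(tiny)[0])  # ~1e-8, preserved with scaling
```

Without the scaling step, a gradient of 1e-8 is below float16's smallest subnormal (~6e-8) and is lost; with it, the value survives the 16-bit round trip to within float16's relative precision.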
-
Publication Number: US20240412318A1
Publication Date: 2024-12-12
Application Number: US18751799
Filing Date: 2024-06-24
Applicant: Intel Corporation
Inventor: Naveen K. MELLEMPUDI , DHEEVATSA MUDIGERE , DIPANKAR DAS , SRINIVAS SRIDHARAN
IPC: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/544 , G06F17/15 , G06F17/16 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to convert the elements of a floating-point tensor into a fixed-point tensor.
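A dynamic-precision fixed-point conversion of this kind can be approximated in software with a per-tensor shared exponent chosen from the tensor's dynamic range. The sketch below is a hypothetical NumPy illustration; the 8-bit width and helper names are assumptions, not details from the patent.

```python
import numpy as np

def to_fixed_point(x, bits=8):
    # Pick a shared exponent from the tensor's dynamic range so the
    # largest magnitude fits in the signed integer range.
    max_abs = float(np.max(np.abs(x)))
    exp = int(np.floor(np.log2(max_abs))) + 1 if max_abs > 0 else 0
    frac_bits = (bits - 1) - exp
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale),
                -(2 ** (bits - 1)), 2 ** (bits - 1) - 1).astype(np.int8)
    return q, frac_bits

def from_fixed_point(q, frac_bits):
    # Dequantize back to float using the shared exponent.
    return q.astype(np.float32) / (2.0 ** frac_bits)

x = np.array([0.5, -1.25, 3.0], dtype=np.float32)
q, f = to_fixed_point(x)
print(from_fixed_point(q, f))  # these values round-trip exactly
```

Because the exponent is derived per tensor rather than fixed at compile time, the same 8-bit storage adapts to whatever range a given layer's activations happen to occupy.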
-
Publication Number: US20230141038A1
Publication Date: 2023-05-11
Application Number: US17960947
Filing Date: 2022-10-06
Applicant: Intel Corporation
Inventor: NAVEEN MELLEMPUDI , DIPANKAR DAS
CPC classification number: G06N3/063 , G06F7/487 , G06F7/5443 , G06T1/20 , G06F5/012 , G06N3/084 , G06N3/044 , G06N3/045
Abstract: A graphics processor is described that includes a multiprocessor with a single instruction, multiple thread (SIMT) architecture and hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, and includes a set of functional units to execute at least one of the parallel threads. The set of functional units can include a mixed-precision tensor processor that performs tensor computations to generate loss data. The loss data is stored as a first floating-point data type and scaled by a scaling factor, enabling the data distribution of a gradient tensor generated from the loss data to be represented by a second floating-point data type.
-
Publication Number: US20220269931A1
Publication Date: 2022-08-25
Application Number: US17742138
Filing Date: 2022-05-11
Applicant: Intel Corporation
Inventor: NAVEEN MELLEMPUDI , DIPANKAR DAS
Abstract: A graphics processor is described that includes a multiprocessor with a single instruction, multiple thread (SIMT) architecture and hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, and includes a set of functional units to execute at least one of the parallel threads. The set of functional units can include a mixed-precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format for converting the output values based on those statistics and on a precision associated with a second layer of a neural network, and convert the output values to the target format.
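The statistics-driven format choice can be illustrated with a toy selector. The fp16 and bf16 range limits below are the standard IEEE/bfloat16 values, but the decision rule itself is a hypothetical sketch, not the patented circuitry.

```python
import numpy as np

def choose_target_format(outputs, next_layer_format="fp16"):
    # Analyze magnitude statistics of a layer's outputs and pick the
    # narrowest float format whose range covers them.
    nonzero = np.abs(outputs[np.isfinite(outputs) & (outputs != 0)])
    if nonzero.size == 0:
        return next_layer_format
    lo, hi = float(nonzero.min()), float(nonzero.max())
    if hi <= 65504.0 and lo >= 6.1e-5:   # fits fp16's normal range
        fmt = "fp16"
    elif hi <= 3.39e38:                  # bf16 shares fp32's exponent range
        fmt = "bf16"
    else:
        fmt = "fp32"
    # Never store wider than the next layer will consume.
    order = ["fp16", "bf16", "fp32"]
    return min(fmt, next_layer_format, key=order.index)

print(choose_target_format(np.array([0.25, 3.0, 100.0])))   # fp16
print(choose_target_format(np.array([1e-9, 1e6]), "bf16"))  # bf16
```

The second example lands on bf16 because 1e-9 underflows fp16's normal range while 1e6 overflows it, whereas bf16's wider exponent covers both at the cost of mantissa bits.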
-
Publication Number: US20180322382A1
Publication Date: 2018-11-08
Application Number: US15869582
Filing Date: 2018-01-12
Applicant: Intel Corporation
Inventor: NAVEEN MELLEMPUDI , DIPANKAR DAS
CPC classification number: G06N3/063 , G06F7/487 , G06F7/5443 , G06T1/20
Abstract: One embodiment provides a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit with a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction that scales an input tensor associated with a layer of a neural network according to a scale factor. The input tensor is stored in a floating-point data type, and the scaling enables the data distribution of the input tensor to be represented by a 16-bit floating-point data type.
-
Publication Number: US20180293493A1
Publication Date: 2018-10-11
Application Number: US15482953
Filing Date: 2017-04-10
Applicant: Intel Corporation
Inventor: Dhiraj D. Kalamkar , KARTHIKEYAN VAIDYANATHAN , SRINIVAS SRIDHARAN , DIPANKAR DAS
Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.
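One way to picture the endpoint-count decision is an alpha-beta style communication cost model. Everything below (the cost formula, the setup constant, the candidate set) is a hypothetical sketch for intuition, not the method the patent claims.

```python
def transfer_cost(message_bytes, endpoints,
                  inv_bandwidth=1e-9, per_endpoint_setup=1e-3):
    # Splitting a message across more endpoints parallelizes the
    # bandwidth term but pays a fixed setup cost per endpoint,
    # so an interior optimum exists.
    return (message_bytes * inv_bandwidth / endpoints
            + endpoints * per_endpoint_setup)

def best_endpoint_count(message_bytes, candidates=(1, 2, 4, 8, 16)):
    # A global view of the communication schedule would supply the
    # message size per step; here we minimize cost for one transfer.
    return min(candidates, key=lambda e: transfer_cost(message_bytes, e))

print(best_endpoint_count(100_000_000))  # 8 endpoints for a 100 MB message
```

For a 100 MB message with these constants, 8 endpoints beats both fewer (bandwidth-bound) and more (setup-bound), which is the trade-off an automatic endpoint selector would be navigating.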
-
Publication Number: US20180293492A1
Publication Date: 2018-10-11
Application Number: US15482925
Filing Date: 2017-04-10
Applicant: Intel Corporation
Inventor: Dhiraj D. Kalamkar , KARTHIKEYAN VAIDYANATHAN , SRINIVAS SRIDHARAN , DIPANKAR DAS
IPC: G06N3/08
Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.
-
Publication Number: US20250061318A1
Publication Date: 2025-02-20
Application Number: US18818154
Filing Date: 2024-08-28
Applicant: Intel Corporation
Inventor: NAVEEN MELLEMPUDI , DIPANKAR DAS
Abstract: One embodiment provides a machine-learning accelerator device comprising a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit with a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction that scales an input tensor associated with a layer of a neural network according to a scale factor. The input tensor is stored in a floating-point data type, and the scaling enables the data distribution of the input tensor to be represented by a 16-bit floating-point data type.
-
Publication Number: US20230351542A1
Publication Date: 2023-11-02
Application Number: US18306033
Filing Date: 2023-04-24
Applicant: Intel Corporation
Inventor: Naveen K. MELLEMPUDI , DHEEVATSA MUDIGERE , DIPANKAR DAS , SRINIVAS SRIDHARAN
IPC: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/544 , G06F17/15 , G06F17/16 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045
CPC classification number: G06T1/20 , G06F5/01 , G06F7/501 , G06F7/523 , G06F7/5443 , G06F17/153 , G06F17/16 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06F2207/382 , G06F2207/4824
Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a hardware processing unit having a dynamic precision fixed-point unit that is configurable to convert the elements of a floating-point tensor into a fixed-point tensor.
-
Publication Number: US20210350212A1
Publication Date: 2021-11-11
Application Number: US17328028
Filing Date: 2021-05-24
Applicant: Intel Corporation
Inventor: DHIRAJ D. KALAMKAR , KARTHIKEYAN VAIDYANATHAN , SRINIVAS SRIDHARAN , DIPANKAR DAS
Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.