-
公开(公告)号:US12056788B2
公开(公告)日:2024-08-06
申请号:US17684187
申请日:2022-03-01
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. Macpherson , John C. Weast , Feng Chen , Farshad Akhbari , Narayan Srinivasa , Nadathur Rajagopalan Satish , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman
IPC: G06T1/20 , G06F3/14 , G06F9/30 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06T15/00 , G09G5/36 , G06T15/04
CPC classification number: G06T1/20 , G06F3/14 , G06F9/3001 , G06F9/30014 , G06F9/3017 , G06F9/3887 , G06F9/3895 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/084 , G06T15/005 , G09G5/363 , G06F9/3851 , G06T15/04 , G09G2360/06 , G09G2360/08 , G09G2360/121
Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core including mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations.
-
2.
公开(公告)号:US12039331B2
公开(公告)日:2024-07-16
申请号:US17967283
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06F7/544 , G06F9/38 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06N20/00 , G06T15/00
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F1/16 , G06F9/30025 , G06F9/3013 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
公开(公告)号:US20240004829A1
公开(公告)日:2024-01-04
申请号:US18350902
申请日:2023-07-12
Applicant: Intel Corporation
Inventor: Altug Koker , Farshad Akhbari , Feng Chen , Dukhwan Kim , Narayan Srinivasa , Nadathur Rajagopalan Satish , Liwei Ma , Jeremy Bottleson , Eriko Nurvitadhi , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu
IPC: G06F15/80 , G06F13/40 , G06T1/20 , G06F9/30 , G06F13/00 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/048
CPC classification number: G06F15/8007 , G06F13/4027 , G06T1/20 , G06F9/3004 , G06F13/00 , G06N3/063 , G06N3/084 , G06N3/044 , G06N3/045 , G06N3/048
Abstract: An integrated circuit (IC) package apparatus is disclosed. The IC package includes one or more processing units and a bridge, mounted below the one or more processing unit, including one or more arithmetic logic units (ALUs) to perform atomic operations.
-
4.
公开(公告)号:US11720355B2
公开(公告)日:2023-08-08
申请号:US17834482
申请日:2022-06-07
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G09G5/393 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N3/08 , G06N3/044 , G06N3/045 , G06T15/00 , G06N20/00 , G06F17/16
CPC classification number: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30014 , G06F9/30036 , G06F9/3851 , G06N3/044 , G06N3/045 , G06N3/063 , G06N3/08 , G09G5/393 , G06F9/3013 , G06F9/30025 , G06F17/16 , G06F2207/3824 , G06N20/00 , G06T15/005
Abstract: One embodiment provides a graphics processor comprising a memory controller and a graphics processing resource coupled with the memory controller. The graphics processing resource includes circuitry configured to execute an instruction to perform a matrix operation on first input including weight data and second input including input activation data, generate intermediate data based on a result of the matrix operation, quantize the intermediate data to a floating-point format determined based on a statistical distribution of first output data, and output, as second output data, quantized intermediate data in a determined floating-point format.
-
5.
公开(公告)号:US20230046506A1
公开(公告)日:2023-02-16
申请号:US17967283
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F7/483 , G06N3/063 , G06N3/04 , G06F9/38 , G06N3/08 , G09G5/393 , G06F7/544 , G06T15/00 , G06N20/00 , G06F17/16
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
-
6.
公开(公告)号:US11169799B2
公开(公告)日:2021-11-09
申请号:US16432402
申请日:2019-06-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
IPC: G06F9/30 , G06F9/38 , G06F7/483 , G06F7/544 , G06N3/063 , G06N20/00 , G09G5/393 , G06N3/04 , G06N3/08 , G06T15/00
Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
-
7.
公开(公告)号:US20210182058A1
公开(公告)日:2021-06-17
申请号:US17169232
申请日:2021-02-05
Applicant: Intel Corporation
Inventor: Himanshu Kaul , Mark A. Anders , Sanu K. Mathew , Anbang Yao , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Tatiana Shpeisman , Abhishek R. Appu , Altug Koker , Kamal Sinha , Balaji Vembu , Nicolas C. Galoppo Von Borries , Eriko Nurvitadhi , Rajkishore Barik , Tsung-Han Lin , Vasanth Ranganathan , Sanjeev Jahagirdar
Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
-
公开(公告)号:US10242423B2
公开(公告)日:2019-03-26
申请号:US15789565
申请日:2017-10-20
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
Abstract: One embodiment provides an accelerator module comprising a memory stack including multiple memory dies; a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers, the GPU including a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction; the at least one single instruction to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions; and the floating-point operation is a two-dimensional matrix multiply and accumulate operation.
-
公开(公告)号:US20180299841A1
公开(公告)日:2018-10-18
申请号:US15489142
申请日:2017-04-17
Applicant: Intel Corporation
Inventor: Abhishek R. Appu , Altug Koker , Linda L. Hurd , Dukhwan Kim , Mike B. MacPherson , John C. Weast , Justin E. Gottschlich , Jingyi Jin , Barath Lakshmanan , Chandrasekaran Sakthivel , Michael S. Strickland , Joydeep Ray , Kamal Sinha , Prasoonkumar Surti , Balaji Vembu , Ping T. Tang , Anbang Yao , Tatiana Shpeisman , Xiaoming Chen , Vasanth Ranganathan , Sanjeev S. Jahagirdar
Abstract: Methods and apparatus relating to autonomous vehicle neural network optimization techniques are described. In an embodiment, the difference between a first training dataset to be used for a neural network and a second training dataset to be used for the neural network is detected. The second training dataset is authenticated in response to the detection of the difference. The neural network is used to assist in an autonomous vehicle/driving. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US20230061670A1
公开(公告)日:2023-03-02
申请号:US17978573
申请日:2022-11-01
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Sara S. Baghsorkhi , Anbang Yao , Kevin Nealis , Xiaoming Chen , Altug Koker , Abhishek R. Appu , John C. Weast , Mike B. Macpherson , Dukhwan Kim , Linda L. Hurd , Ben J. Ashbaugh , Barath Lakshmanan , Liwei Ma , Joydeep Ray , Ping T. Tang , Michael S. Strickland
IPC: G06T1/20 , G06N3/08 , G06F9/38 , G06N3/063 , G06F9/30 , G06N20/00 , G06N3/04 , G06F9/50 , G06F7/483
Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
-
-
-
-
-
-
-
-
-