-
1.
Publication No.: US20230409318A1
Publication date: 2023-12-21
Application No.: US18240287
Filing date: 2023-08-30
Applicant: Intel Corporation
Inventor: Edward T. GROCHOWSKI, Asit K. MISHRA, Robert VALENTINE, Mark J. CHARNEY, Simon C. STEELY, JR.
CPC classification number: G06F9/3001, G06F9/30145, G06F9/3861, G06F9/30036, G06F9/3865
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
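Illustrative sketch (not part of the patent record): the abstract describes a matrix multiplication that can be interrupted partway and that stores a completion progress indicator so the work can resume later. The Python model below is a software analogy only. The function name `matmul_resumable`, the row-granular progress counter, and the `budget` parameter that simulates an interruption are all assumptions made for illustration; the abstract does not specify the granularity or encoding of the indicator.

```python
def matmul_resumable(a, b, c, progress=0, budget=None):
    """Multiply a (n x k) by b (k x m) into c, one result row at a time.

    progress: number of result rows already stored in c (hypothetical
              stand-in for the completion progress indicator).
    budget:   rows to compute before a simulated interruption;
              None means run to completion.
    Returns the updated progress value.
    """
    n, k, m = len(a), len(b), len(b[0])
    row = progress
    while row < n:
        if budget is not None and budget == 0:
            # Simulated interruption: report how many rows are complete.
            return row
        # Compute and store one full row of the result matrix.
        c[row] = [sum(a[row][t] * b[t][j] for t in range(k)) for j in range(m)]
        row += 1
        if budget is not None:
            budget -= 1
    return row


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[0, 0], [0, 0]]
p = matmul_resumable(a, b, c, progress=0, budget=1)  # interrupted after one row
p = matmul_resumable(a, b, c, progress=p)            # resumes from the saved progress
```

Because only rows at or beyond the saved progress value are recomputed on the second call, result data stored before the interruption is not redone.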
-
2.
Publication No.: US20210326131A1
Publication date: 2021-10-21
Application No.: US17362854
Filing date: 2021-06-29
Applicant: Intel Corporation
Inventor: Edward T. GROCHOWSKI, Asit K. MISHRA, Robert VALENTINE, Mark J. CHARNEY, Simon C. STEELY, JR.
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
-
3.
Publication No.: US20210224213A1
Publication date: 2021-07-22
Application No.: US17206961
Filing date: 2021-03-19
Applicant: Intel Corporation
Inventor: Swapna RAJ, Samantika S. SURY, Kermin CHOFLEMING, Simon C. STEELY, JR.
IPC: G06F13/40, G06F13/16, G06F12/0815
Abstract: Examples include techniques for near data acceleration for a multi-core architecture. A near data processor included in a memory controller of a processor may access data maintained in a memory device coupled with the near data processor via one or more memory channels responsive to a work request to execute a kernel, an application or a loop routine using the accessed data to generate values. The near data processor provides an indication to the requestor of the work request that values have been generated.
-
4.
Publication No.: US20250138823A1
Publication date: 2025-05-01
Application No.: US19004194
Filing date: 2024-12-27
Applicant: Intel Corporation
Inventor: Edward T. GROCHOWSKI, Asit K. MISHRA, Robert VALENTINE, Mark J. CHARNEY, Simon C. STEELY, JR.
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
-
5.
Publication No.: US20230409478A1
Publication date: 2023-12-21
Application No.: US18241458
Filing date: 2023-09-01
Applicant: Intel Corporation
Inventor: Kermin CHOFLEMING, Yu BAI, Simon C. STEELY, JR.
IPC: G06F12/0811, G06F12/02
CPC classification number: G06F12/0811, G06F12/0292, G06F2212/1021
Abstract: Latency on the miss path to a cache level in a CPU module is reduced by predicting when a cache miss is likely. Main memory is directly accessed in parallel with the access to the cache level in the CPU module based on the prediction that a cache miss is likely in the cache level.
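Illustrative sketch (not part of the patent record): the abstract describes predicting a likely cache miss and accessing main memory in parallel with the cache lookup. The toy latency model below shows why that helps. The saturating-counter predictor, the region granularity, and the latency constants are all assumptions chosen for illustration; the abstract does not state how the prediction is made.

```python
CACHE_LATENCY = 10    # assumed cycles for a lookup in the cache level
MEMORY_LATENCY = 100  # assumed cycles for a main memory access


class MissPredictor:
    """Hypothetical per-region saturating counter; predicts a miss when high."""

    def __init__(self, threshold=2, max_count=3):
        self.counters = {}
        self.threshold = threshold
        self.max_count = max_count

    def predict_miss(self, region):
        return self.counters.get(region, 0) >= self.threshold

    def update(self, region, was_miss):
        count = self.counters.get(region, 0)
        count = min(count + 1, self.max_count) if was_miss else max(count - 1, 0)
        self.counters[region] = count


def access_latency(cache, predictor, addr, region_bits=6):
    """Return the modeled latency of one access and train the predictor."""
    region = addr >> region_bits
    predicted_miss = predictor.predict_miss(region)
    hit = addr in cache
    if hit:
        latency = CACHE_LATENCY
    elif predicted_miss:
        # Memory access was started in parallel with the cache lookup.
        latency = max(CACHE_LATENCY, MEMORY_LATENCY)
    else:
        # Memory access starts only after the cache lookup reports a miss.
        latency = CACHE_LATENCY + MEMORY_LATENCY
    predictor.update(region, not hit)
    cache.add(addr)  # fill the line so later accesses hit
    return latency


cache, predictor = set(), MissPredictor()
latencies = [access_latency(cache, predictor, addr) for addr in (0, 1, 2, 0)]
```

In this model the third access is a correctly predicted miss, so it pays only the memory latency instead of the serialized cache-plus-memory latency. The model ignores the bandwidth cost of a wrongly predicted miss.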
-
6.
Publication No.: US20210200540A1
Publication date: 2021-07-01
Application No.: US16729369
Filing date: 2019-12-28
Applicant: Intel Corporation
Inventor: Kermin E. CHOFLEMING, Chuanjun ZHANG, Daniel TOWNER, Simon C. STEELY, JR., Benjamin KEEN
IPC: G06F9/30
Abstract: Systems, methods, and apparatuses relating to fused operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a processing element of the plurality of processing elements comprising: a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, operation circuitry coupled to the first plurality of input queues having the multiple bit width, a sign modification circuit coupled to the first plurality of input queues having the multiple bit width, and a configuration register within the processing element to store a configuration value comprising a sign modification field that causes the sign modification circuit to modify a sign bit of a value from the first plurality of input queues according to the sign modification field to create a sign modified value, and the configuration value causes the operation circuitry to perform a selected operation of a plurality of operations on a value from the first plurality of input queues and the sign modified value to create a resultant value, and store the resultant value in the at least one first output queue.
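Illustrative sketch (not part of the patent record): the abstract describes a processing element whose configuration value carries a sign modification field, so that one operand has its sign changed before the selected operation is applied. The Python model below is a software analogy. The mode names in `SIGN_MODES`, the operation names in `OPS`, and the dictionary-based configuration are hypothetical encodings; the abstract does not define them.

```python
import operator
from collections import deque

# Hypothetical encodings of the sign modification field.
SIGN_MODES = {
    "keep": lambda x: x,         # leave the sign bit unchanged
    "negate": lambda x: -x,      # flip the sign bit
    "clear": lambda x: abs(x),   # force a non-negative value
    "set": lambda x: -abs(x),    # force a non-positive value
}

# Hypothetical subset of the selectable operations.
OPS = {"add": operator.add, "mul": operator.mul}


def fire(config, in_a, in_b, out):
    """Run one step of the modeled processing element.

    Takes one value from each input queue, applies the configured sign
    modification to the second value, performs the configured operation,
    and appends the result to the output queue. Returns False if either
    input queue is empty.
    """
    if not in_a or not in_b:
        return False
    a = in_a.popleft()
    b = SIGN_MODES[config["sign"]](in_b.popleft())
    out.append(OPS[config["op"]](a, b))
    return True


# An add with a negated second operand behaves as a fused subtraction.
in_a, in_b, out = deque([5.0]), deque([3.0]), deque()
fire({"op": "add", "sign": "negate"}, in_a, in_b, out)
```

The point of the fusion in this model is that subtraction, absolute-value addition, and similar variants need no separate operation: they fall out of one operation plus a sign mode.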
-
7.
Publication No.: US20190258481A1
Publication date: 2019-08-22
Application No.: US16398200
Filing date: 2019-04-29
Applicant: Intel Corporation
Inventor: Edward T. GROCHOWSKI, Asit K. MISHRA, Robert VALENTINE, Mark J. CHARNEY, Simon C. STEELY, JR.
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.