-
Publication No.: US10467183B2
Publication Date: 2019-11-05
Application No.: US15640538
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Kermin Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop
Abstract: Methods and apparatuses relating to pipelined runtime services in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; a first configuration controller coupled to a first subset of the processing elements; and a second configuration controller coupled to a second, different subset of the processing elements, the first configuration controller and the second configuration controller are to configure the first subset and the second, different subset according to configuration information for a first context, and, for a context switch, the first configuration controller is to configure the first subset according to configuration information for a second context after pending operations of the first context are completed in the first subset and block second context dataflow into the second, different subset's input from the first subset's output until pending operations of the first context are completed in the second, different subset.
-
Publication No.: US10430252B2
Publication Date: 2019-10-01
Application No.: US16192322
Filing Date: 2018-11-15
Applicant: Intel Corporation
Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, Jr.
IPC: G06F12/00 , G06F9/52 , G06F12/0817
Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.
-
Publication No.: US10402168B2
Publication Date: 2019-09-03
Application No.: US15283295
Filing Date: 2016-10-01
Applicant: Intel Corporation
Abstract: A floating point multiply-add unit having inputs coupled to receive a floating point multiplier data element, a floating point multiplicand data element, and a floating point addend data element. The multiply-add unit including a mantissa multiplier to multiply a mantissa of the multiplier data element and a mantissa of the multiplicand data element to calculate a mantissa product. The mantissa multiplier including a most significant bit portion to calculate most significant bits of the mantissa product, and a least significant bit portion to calculate least significant bits of the mantissa product. The mantissa multiplier has a plurality of different possible sizes of the least significant bit portion. Energy consumption reduction logic to selectively reduce energy consumption of the least significant bit portion, but not the most significant bit portion, to cause the least significant bit portion to not calculate the least significant bits of the mantissa product.
-
Publication No.: US11698787B2
Publication Date: 2023-07-11
Application No.: US17362854
Filing Date: 2021-06-29
Applicant: Intel Corporation
Inventor: Edward T. Grochowski , Asit K. Mishra , Robert Valentine , Mark J. Charney , Simon C. Steely, Jr.
CPC classification number: G06F9/3001 , G06F9/30036 , G06F9/30145 , G06F9/3861 , G06F9/3865
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
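The interruptible behavior described in this abstract can be pictured with a small software sketch. The following is only an illustration, not the patented hardware design: the function name `matmul_resumable`, the row-level granularity of the progress indicator, and the `interrupt_after` parameter used to simulate an interruption are all assumptions introduced here.

```python
def matmul_resumable(a, b, c, start_row=0, interrupt_after=None):
    """Multiply a (n x k) by b (k x m) into c, one result row at a time.

    Returns a progress indicator: the number of result rows that are
    fully computed and stored in c. Passing that value back as
    start_row resumes the work where it stopped.
    (Hypothetical helper; row granularity is an assumption.)
    """
    rows_done = start_row
    for i in range(start_row, len(a)):
        # Simulated interruption: stop after a fixed number of rows.
        if interrupt_after is not None and rows_done - start_row >= interrupt_after:
            return rows_done
        c[i] = [
            sum(a[i][p] * b[p][j] for p in range(len(b)))
            for j in range(len(b[0]))
        ]
        rows_done = i + 1
    return rows_done


a = [[1, 2], [3, 4], [5, 6]]
b = [[7, 8], [9, 10]]
c = [[0, 0] for _ in a]

progress = matmul_resumable(a, b, c, interrupt_after=1)  # interrupted early
progress = matmul_resumable(a, b, c, start_row=progress)  # resumed to completion
```

Because the indicator only counts rows whose results are already stored, resuming from it never recomputes finished work and never skips unfinished work.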
-
Publication No.: US11656662B2
Publication Date: 2023-05-23
Application No.: US17174106
Filing Date: 2021-02-11
Applicant: Intel Corporation
Inventor: Simon C. Steely, Jr. , Richard Dischler , David Bach , Olivier Franza , William J. Butera , Christian Karl , Benjamin Keen , Brian Leung
IPC: G06F1/18 , H01L23/538 , G06F15/76 , H01L25/065 , G06F9/50
CPC classification number: G06F1/183 , G06F9/5027 , G06F15/76 , H01L23/5384 , H01L23/5385 , H01L23/5386 , H01L25/0657
Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
-
Publication No.: US11200186B2
Publication Date: 2021-12-14
Application No.: US16024854
Filing Date: 2018-06-30
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop , Mitchell Diamond , Benjamin Keen , Dennis Bradford , Fabrizio Petrini , Barry Tannenbaum , Yongzhi Zhang
Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
-
Publication No.: US10963022B2
Publication Date: 2021-03-30
Application No.: US16862263
Filing Date: 2020-04-29
Applicant: Intel Corporation
Inventor: Simon C. Steely, Jr. , Richard Dischler , David Bach , Olivier Franza , William J. Butera , Christian Karl , Benjamin Keen , Brian Leung
IPC: H05K1/18 , G06F1/18 , H01L23/538 , G06F9/50 , G06F15/76 , H01L25/065
Abstract: Embodiments herein may present an integrated circuit or a computing system having an integrated circuit, where the integrated circuit includes a physical network layer, a physical computing layer, and a physical memory layer, each having a set of dies, and a die including multiple tiles. The physical network layer further includes one or more signal pathways dynamically configurable between multiple pre-defined interconnect topologies for the multiple tiles, where each topology of the multiple pre-defined interconnect topologies corresponds to a communication pattern related to a workload. At least a tile in the physical computing layer is further arranged to move data to another tile in the physical computing layer or a storage cell of the physical memory layer through the one or more signal pathways in the physical network layer. Other embodiments may be described and/or claimed.
-
Publication No.: US10445234B2
Publication Date: 2019-10-15
Application No.: US15640533
Filing Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr. , Samantika S. Sury
IPC: G06F12/0802 , H03K19/177 , G06F17/50 , G11C7/10 , G06F15/78 , G06F15/80 , G11C8/12
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.
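The firing rule in this abstract, where a dataflow operator acts once its incoming operand set has arrived, can be sketched in software. This is a minimal model under stated assumptions, not the accelerator's actual design: the class `DataflowOperator`, its `connect` and `receive` methods, and the per-port queues are hypothetical names chosen for illustration, and the sketch does not model the interconnect network or atomicity.

```python
from collections import deque
import operator


class DataflowOperator:
    """One node of a dataflow graph (hypothetical software model)."""

    def __init__(self, op, n_inputs=2):
        self.op = op
        self.inputs = [deque() for _ in range(n_inputs)]  # one queue per port
        self.consumers = []  # (downstream operator, port) pairs
        self.results = []    # every value this node has produced

    def connect(self, consumer, port):
        """Route this node's output to an input port of another node."""
        self.consumers.append((consumer, port))

    def receive(self, port, value):
        """Enqueue an operand; fire while every port holds a value."""
        self.inputs[port].append(value)
        while all(self.inputs):  # an empty deque is falsy
            args = [q.popleft() for q in self.inputs]
            out = self.op(*args)
            self.results.append(out)
            for consumer, p in self.consumers:
                consumer.receive(p, out)


# Graph for (x + y) * z, one node per operation.
add = DataflowOperator(operator.add)
mul = DataflowOperator(operator.mul)
add.connect(mul, 0)

mul.receive(1, 4)  # z arrives first; mul waits for its other operand
add.receive(0, 2)  # x arrives; add still waits for y
add.receive(1, 3)  # y arrives; add fires, then mul fires
```

No central scheduler decides the order here: each node runs purely because its operands are present, which is the property the abstract attributes to the processing elements.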
-
Publication No.: US10416999B2
Publication Date: 2019-09-17
Application No.: US15396395
Filing Date: 2016-12-30
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr.
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform a second operation by a respective, incoming operand set arriving at each of the dataflow operators of the plurality of processing elements.
-
Publication No.: US10146690B2
Publication Date: 2018-12-04
Application No.: US15180351
Filing Date: 2016-06-13
Applicant: Intel Corporation
Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, Jr.
IPC: G06F12/0831
Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.