-
公开(公告)号:US09934146B2
公开(公告)日:2018-04-03
申请号:US14498946
申请日:2014-09-26
Applicant: INTEL CORPORATION
Inventor: Simon C. Steely, Jr. , Samantika S. Sury , William C. Hasenplaugh
IPC: G06F12/08 , G06F12/0817 , G06F12/0811
CPC classification number: G06F12/0824 , G06F12/0811 , G06F2212/1024 , G06F2212/1048 , G06F2212/2542
Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.
-
公开(公告)号:US09734069B2
公开(公告)日:2017-08-15
申请号:US14567026
申请日:2014-12-11
Applicant: Intel Corporation
Inventor: Simon C. Steely, Jr. , William C. Hasenplaugh , Samantika S. Sury
IPC: G06F12/08 , G06F12/084 , G06F12/0815 , G06F12/0817
CPC classification number: G06F12/084 , G06F12/0815 , G06F12/0822 , G06F2212/1021 , G06F2212/281 , Y02D10/13
Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.
-
公开(公告)号:US11593295B2
公开(公告)日:2023-02-28
申请号:US17550875
申请日:2021-12-14
Applicant: Intel Corporation
Inventor: Kermin E. Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop , Mitchell Diamond , Benjamin Keen , Dennis Bradford , Fabrizio Petrini , Barry Tannenbaum , Yongzhi Zhang
Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.
-
公开(公告)号:US10558575B2
公开(公告)日:2020-02-11
申请号:US15396402
申请日:2016-12-30
Applicant: INTEL CORPORATION
Inventor: Kermin E. Fleming, Jr. , Kent D. Glossop , Simon C. Steely, Jr. , Jinjie Tang , Alan G. Gara
IPC: G06F12/08 , G06F9/30 , G06F9/38 , G06F12/0862 , G06F12/0842 , G06F12/0875
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
-
公开(公告)号:US10474375B2
公开(公告)日:2019-11-12
申请号:US15396049
申请日:2016-12-30
Applicant: INTEL CORPORATION
Inventor: Kermin Elliott Fleming, Jr. , Simon C. Steely, Jr. , Kent D. Glossop
Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
-
公开(公告)号:US10469397B2
公开(公告)日:2019-11-05
申请号:US15640540
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr.
IPC: H04L12/721 , H04L12/801 , H04L12/863 , H04L12/935 , H04L12/937
Abstract: Systems, methods, and apparatuses relating to configurable network-based dataflow operator circuits are described. In one embodiment, a processor includes a spatial array of processing elements, and a packet switched communications network to route data within the spatial array between processing elements according to a dataflow graph to perform a first dataflow operation of the dataflow graph, wherein the packet switched communications network further comprises a plurality of network dataflow endpoint circuits to perform a second dataflow operation of the dataflow graph.
-
公开(公告)号:US09898408B2
公开(公告)日:2018-02-20
申请号:US15088921
申请日:2016-04-01
Applicant: Intel Corporation
Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, Jr.
IPC: G06F12/00 , G06F12/0831 , G06F12/0811 , G06F13/00 , G06F13/28
CPC classification number: G06F12/0831 , G06F12/0811 , G06F2212/283 , G06F2212/621
Abstract: An apparatus and method are described for a sharing aware snoop filter. For example, one embodiment of a processor comprises: a plurality of caches, each of the caches comprising a plurality of cache lines, at least some of which are to be shared by two or more of the caches; a snoop filter to monitor accesses to the plurality of cache lines shared by the two or more caches, the snoop filter comprising: a primary snoop filter comprising a first plurality of entries, each entry associated with one of the plurality of cache lines and comprising a N unique identifiers to uniquely identify up to N of the plurality of caches currently storing the cache line; an auxiliary snoop filter comprising a second plurality of entries, each entry associated with one of the plurality of cache lines, wherein once a particular cache line has been shared by more than N caches, an entry for that cache line is allocated in the auxiliary snoop filter to uniquely identify one or more additional caches storing the cache line.
-
28.
公开(公告)号:US12050912B2
公开(公告)日:2024-07-30
申请号:US18220225
申请日:2023-07-10
Applicant: Intel Corporation
Inventor: Edward T. Grochowski , Asit K. Mishra , Robert Valentine , Mark J. Charney , Simon C. Steely, Jr.
CPC classification number: G06F9/3001 , G06F9/30036 , G06F9/30145 , G06F9/3861 , G06F9/3865
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
-
公开(公告)号:US11907713B2
公开(公告)日:2024-02-20
申请号:US16729369
申请日:2019-12-28
Applicant: Intel Corporation
Inventor: Kermin E. Chofleming , Chuanjun Zhang , Daniel Towner , Simon C. Steely, Jr. , Benjamin Keen
CPC classification number: G06F9/3001 , G06F9/30181 , G06F15/80
Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a processing element of the plurality of processing elements comprising: a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, operation circuitry coupled to the first plurality of input queues having the multiple bit width, a sign modification circuit coupled to the first plurality of input queues having the multiple bit width, and a configuration register within the processing element to store a configuration value comprising a sign modification field that causes the sign modification circuit to modify a sign bit of a value from the first plurality of input queues according to the sign modification field to create a sign modified value, and the configuration value causes the operation circuitry to perform a selected operation of a plurality of operations on a value from the first plurality of input queues and the sign modified value to create a resultant value, and store the resultant value in the at least one first output queue.
-
公开(公告)号:US11269805B2
公开(公告)日:2022-03-08
申请号:US15980579
申请日:2018-05-15
Applicant: Intel Corporation
Inventor: William J. Butera , Simon C. Steely, Jr. , Richard J. Dischler
IPC: G06F15/80 , G06F9/54 , G06F15/167 , G06F9/448 , G06F15/173 , G06F11/07 , H04L41/0604
Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.
-
-
-
-
-
-
-
-
-