-
Publication/Grant No.: US20190004810A1
Publication Date: 2019-01-03
Application No.: US15638120
Application Date: 2017-06-29
Applicant: Intel Corporation
Inventor: Doddaballapur N. Jayasimha , Jonas Svennebring , Samantika S. Sury , Christopher J. Hughes , Jong Soo Park , Lingxiang Xiang
IPC: G06F9/38 , G06F12/0893 , G06F9/26 , G06F13/28
Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier, decoding, by decode circuitry, the fetched instruction, selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system, scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
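A minimal Python sketch of the read-operate-write step the abstract describes, assuming a toy memory where one lock stands in for the hardware atomicity guarantee. The opcode names (ADD, AND, OR, XOR, MIN, MAX) and the classes are hypothetical illustrations. Fetch, decode and weakly ordered out-of-order scheduling are not modeled. As an assumption of the sketch, the operation returns nothing to the issuer, so no later instruction depends on its result.

```python
import threading
from typing import Callable, Dict

# Hypothetical opcode table: the abstract only says the operation is
# "specified by the opcode", so these names are illustrative.
OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda old, src: old + src,
    "AND": lambda old, src: old & src,
    "OR": lambda old, src: old | src,
    "XOR": lambda old, src: old ^ src,
    "MIN": min,
    "MAX": max,
}


class Memory:
    """Toy memory in which a single lock models the atomicity guarantee."""

    def __init__(self) -> None:
        self._cells: Dict[int, int] = {}
        self._lock = threading.Lock()

    def load(self, addr: int) -> int:
        return self._cells.get(addr, 0)

    def atomic_op(self, opcode: str, dest: int, src: int) -> None:
        """Read the datum at dest, combine it with src, write the result back."""
        op = OPS[opcode]
        with self._lock:  # the read, the operation and the write are indivisible
            self._cells[dest] = op(self._cells.get(dest, 0), src)
```

With four threads each issuing 1000 `atomic_op("ADD", 0x40, 1)` calls, `load(0x40)` ends at 4000, because no two read-modify-write sequences can interleave.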
-
Publication/Grant No.: US10146690B2
Publication Date: 2018-12-04
Application No.: US15180351
Application Date: 2016-06-13
Applicant: Intel Corporation
Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, Jr.
IPC: G06F12/0831
Abstract: In an embodiment, a processor includes a plurality of cores and synchronization logic. The synchronization logic includes circuitry to: receive a first memory request and a second memory request; determine whether the second memory request is in contention with the first memory request; and in response to a determination that the second memory request is in contention with the first memory request, process the second memory request using a non-blocking cache coherence protocol. Other embodiments are described and claimed.
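A rough Python sketch of the routing decision in this abstract. It assumes that "in contention" means "targets the same cache line as a request that is still outstanding" and a 64-byte line. The abstract does not say how uncontended requests are handled, so they are labeled "default" here. The class and method names are hypothetical, and the non-blocking protocol itself is only represented by a label.

```python
from dataclasses import dataclass, field
from typing import Set

LINE_BYTES = 64  # assumed cache-line size


@dataclass
class SyncLogic:
    """Hypothetical synchronization logic that tracks outstanding cache lines."""

    outstanding: Set[int] = field(default_factory=set)

    def submit(self, addr: int) -> str:
        line = addr // LINE_BYTES
        if line in self.outstanding:
            # Contending request: hand it to the non-blocking coherence path.
            return "non-blocking"
        self.outstanding.add(line)
        return "default"  # path for uncontended requests (not specified)

    def complete(self, addr: int) -> None:
        # The earlier request finished, so its line no longer causes contention.
        self.outstanding.discard(addr // LINE_BYTES)
```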
-
Publication/Grant No.: US09934146B2
Publication Date: 2018-04-03
Application No.: US14498946
Application Date: 2014-09-26
Applicant: INTEL CORPORATION
Inventor: Simon C. Steely, Jr. , Samantika S. Sury , William C. Hasenplaugh
IPC: G06F12/08 , G06F12/0817 , G06F12/0811
CPC classification number: G06F12/0824 , G06F12/0811 , G06F2212/1024 , G06F2212/1048 , G06F2212/2542
Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.
-
Publication/Grant No.: US20170351430A1
Publication Date: 2017-12-07
Application No.: US15170050
Application Date: 2016-06-01
Applicant: Intel Corporation
Inventor: Robert G. Blankenship , Simon C. Steely, JR. , Samantika S. Sury
IPC: G06F3/06 , G06F12/0808 , G06F12/0815 , G06F12/0811 , G06F12/0842
CPC classification number: G06F3/0605 , G06F3/0625 , G06F3/0659 , G06F3/0673 , G06F12/0808 , G06F12/0811 , G06F12/0815 , G06F12/0824 , G06F12/0826 , G06F12/0831 , G06F12/0842 , G06F2212/1028 , G06F2212/1048 , Y02D10/13
Abstract: Systems, methods, and apparatuses are directed to requesting access to a memory address; storing an identification of the memory address in a data structure; receiving a first request for access to the memory address, the request comprising a reference to a second processor core; storing the reference to the second processor in the data structure; receiving a second request for access to the memory address, the second request comprising a reference to a third processor core; determining, based on the data structure, that the third processor core is different from the second processor core; and responding to the second request without buffering the second request.
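A small Python sketch of the bookkeeping the abstract lists. A core records the address it wants, stores the core reference carried by the first incoming request for that address, and answers a later request from a different core immediately instead of buffering it. The abstract does not say what happens to the first request or to a repeat request from the same core. Buffering both of them is an assumption of this sketch, and the tracker class is hypothetical.

```python
from typing import List, Optional


class PendingAccessTracker:
    """Hypothetical per-core data structure for one outstanding access."""

    def __init__(self) -> None:
        self.addr: Optional[int] = None
        self.recorded_core: Optional[int] = None
        self.buffered: List[int] = []

    def request_access(self, addr: int) -> None:
        # Store the identification of the memory address being requested.
        self.addr = addr
        self.recorded_core = None

    def on_incoming(self, addr: int, core: int) -> str:
        if addr != self.addr:
            return "respond"  # not the tracked address
        if self.recorded_core is None:
            self.recorded_core = core  # first request: store its core reference
        elif core != self.recorded_core:
            return "respond"  # different core: respond without buffering
        self.buffered.append(core)  # assumption: otherwise hold the request
        return "buffer"
```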
-
Publication/Grant No.: US09734069B2
Publication Date: 2017-08-15
Application No.: US14567026
Application Date: 2014-12-11
Applicant: Intel Corporation
Inventor: Simon C. Steely, Jr. , William C. Hasenplaugh , Samantika S. Sury
IPC: G06F12/08 , G06F12/084 , G06F12/0815 , G06F12/0817
CPC classification number: G06F12/084 , G06F12/0815 , G06F12/0822 , G06F2212/1021 , G06F2212/281 , Y02D10/13
Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.
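A simplified Python sketch of the three steps in the abstract: allocate a shared-cache entry on an invalidate request, send an update read request to the core that last accessed the entry, and broadcast the returned update to the sharers named by the tag. It models a single tag directory, so the multicast tree collapses to one fan-out, and messages are only logged rather than delivered. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TagEntry:
    sharers: Set[int] = field(default_factory=set)  # cores identified by the tag
    last_accessor: int = -1                          # core that last accessed the line


class TagDirectory:
    """Single directory level: the multicast tree reduces to one fan-out."""

    def __init__(self) -> None:
        self.tags: Dict[int, TagEntry] = {}
        self.shared_cache: Dict[int, Optional[int]] = {}
        self.sent: List[Tuple[str, int, int]] = []  # (message, target core, address)

    def on_invalidate(self, addr: int) -> None:
        # Allocate a shared-cache entry, then ask the last accessor for fresh data.
        self.shared_cache.setdefault(addr, None)
        self.sent.append(("update_read", self.tags[addr].last_accessor, addr))

    def on_update(self, addr: int, value: int) -> None:
        # Keep the new value in the shared cache and push it to every sharer.
        self.shared_cache[addr] = value
        for core in sorted(self.tags[addr].sharers):
            self.sent.append(("update", core, addr))
```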
-
Publication/Grant No.: US11989555B2
Publication Date: 2024-05-21
Application No.: US15638120
Application Date: 2017-06-29
Applicant: Intel Corporation
Inventor: Doddaballapur N. Jayasimha , Jonas Svennebring , Samantika S. Sury , Christopher J. Hughes , Jong Soo Park , Lingxiang Xiang
CPC classification number: G06F9/3004 , G06F9/3001 , G06F9/30185 , G06F9/3836 , G06F9/46 , G06F13/28
Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier, decoding, by decode circuitry, the fetched instruction, selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system, scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance, and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
-
Publication/Grant No.: US11537520B2
Publication Date: 2022-12-27
Application No.: US17494651
Application Date: 2021-10-05
Applicant: Intel Corporation
Inventor: Doddaballapur N. Jayasimha , Samantika S. Sury , Christopher J. Hughes , Jonas Svennebring , Yen-Cheng Liu , Stephen R. Van Doren , David A. Koufaty
IPC: G06F12/0815 , G06F12/0808 , G06F9/30 , G06F12/0817 , G06F12/0831
Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
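A minimal Python sketch of the cost structure the abstract describes. The first RAO to a cache line costs one request for ownership (RFO) to the home agent. After the line arrives, later RAOs from local cores run on it without involving the home agent. It assumes a single home agent, treats each cache line as one integer, and uses an add as the RAO operation. Snoops, evictions and invalidation of other caches are not modeled, and the class names are hypothetical.

```python
from typing import Dict


class HomeAgent:
    """Stand-in for the home socket; it serves lines and counts RFOs."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.rfo_count = 0

    def request_for_ownership(self, line: int) -> int:
        self.rfo_count += 1
        return self.memory.get(line, 0)  # latest copy, here fetched from memory


class RequesterCacheControl:
    """Hypothetical cache control circuit of the requester socket."""

    def __init__(self, home: HomeAgent) -> None:
        self.home = home
        self.owned: Dict[int, int] = {}  # lines this socket currently owns

    def execute_rao(self, line: int, operand: int) -> int:
        if line not in self.owned:
            # First RAO to this line: one RFO round trip to the home agent.
            self.owned[line] = self.home.request_for_ownership(line)
        # Subsequent local RAOs execute here, independently of the home agent.
        self.owned[line] += operand  # assumed RAO operation: atomic add
        return self.owned[line]
```

Five RAOs to the same line therefore cost one home-agent round trip instead of five, which is the saving the abstract's last step points at.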
-
Publication/Grant No.: US20220091983A1
Publication Date: 2022-03-24
Application No.: US17494651
Application Date: 2021-10-05
Applicant: Intel Corporation
Inventor: Doddaballapur N. Jayasimha , Samantika S. Sury , Christopher J. Hughes , Jonas Svennebring , Yen-Cheng Liu , Stephen R. Van Doren , David A. Koufaty
IPC: G06F12/0815 , G06F12/0808 , G06F9/30 , G06F12/0817
Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
-
Publication/Grant No.: US20200319886A1
Publication Date: 2020-10-08
Application No.: US16799619
Application Date: 2020-02-24
Applicant: Intel Corporation
Inventor: Christopher J. Hughes , Joseph Nuzman , Jonas Svennebring , Doddaballapur N. Jayasimha , Samantika S. Sury , David A. Koufaty , Niall D. McDonnell , Yen-Cheng Liu , Stephen R. Van Doren , Stephen J. Robinson
IPC: G06F9/30 , G06F12/0875
Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, optimization circuitry to receive an incoming RAO instruction, scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
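A Python sketch of the spatial-merge check described in this abstract. An incoming RAO is compared with queued groups for the same destination cache line, and it joins a group when the opcode matches and the target element is not already occupied. It assumes 64-byte lines with 8-byte elements. The abstract excerpt does not describe temporal merging or what happens to a same-line instruction that cannot be combined spatially. This sketch simply enqueues such an instruction as a new group, which is an assumption. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LINE_BYTES = 64  # assumed cache-line size
ELEM_BYTES = 8   # assumed element size, so a line holds 8 elements


@dataclass
class RAO:
    opcode: str
    dest: int  # destination byte address
    src: int   # source data


@dataclass
class LineGroup:
    opcode: str
    slots: Dict[int, int] = field(default_factory=dict)  # element offset -> source data


class RAOQueue:
    """Queue whose entries are grouped by destination cache line."""

    def __init__(self) -> None:
        self.groups: List[Tuple[int, LineGroup]] = []  # (cache line, group)

    def submit(self, rao: RAO) -> str:
        line, byte_offset = divmod(rao.dest, LINE_BYTES)
        elem = byte_offset // ELEM_BYTES
        for g_line, group in self.groups:
            # Same line, same opcode, non-overlapping element: combine spatially.
            if g_line == line and group.opcode == rao.opcode and elem not in group.slots:
                group.slots[elem] = rao.src
                return "combined"
        # No combinable match: enqueue the instruction as its own group.
        self.groups.append((line, LineGroup(rao.opcode, {elem: rao.src})))
        return "enqueued"
```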
-
Publication/Grant No.: US10387319B2
Publication Date: 2019-08-20
Application No.: US15640534
Application Date: 2017-07-01
Applicant: Intel Corporation
Inventor: Michael C. Adler , Chiachen Chou , Neal C. Crago , Kermin Fleming , Kent D. Glossop , Aamer Jaleel , Pratik M. Marolia , Simon C. Steely, Jr. , Samantika S. Sury
IPC: G06F12/0802 , G06F15/00 , G06F12/0862 , H03K19/177 , G06F15/78 , G11C8/12 , G06F17/50 , G06F15/80
Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform an operation when an incoming operand set arrives at the plurality of processing elements. The processor also includes a streamer element to prefetch the incoming operand set from two or more levels of a memory system.
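A minimal Python sketch of the firing rule in this abstract, where each dataflow-graph node acts as an operator that performs its operation once a complete incoming operand set has arrived. It models only the firing rule and token forwarding between operators. The interconnect, the mapping onto physical processing elements, and the streamer element's prefetching are not modeled. The class and its methods are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


class DataflowOperator:
    """One dataflow-graph node, as it might be placed on a processing element."""

    def __init__(self, fn: Callable[..., int], n_inputs: int) -> None:
        self.fn = fn
        self.slots: List[Optional[int]] = [None] * n_inputs
        self.consumers: List[Tuple["DataflowOperator", int]] = []  # (node, input index)
        self.outputs: List[int] = []

    def connect(self, other: "DataflowOperator", index: int) -> None:
        # Route this operator's result to input `index` of `other`.
        self.consumers.append((other, index))

    def receive(self, index: int, value: int) -> None:
        self.slots[index] = value
        if all(v is not None for v in self.slots):  # full operand set has arrived
            result = self.fn(*self.slots)
            self.slots = [None] * len(self.slots)   # ready for the next operand set
            self.outputs.append(result)
            for node, i in self.consumers:
                node.receive(i, result)
```

For example, wiring an adder into input 0 of a multiplier computes (a + b) * c. The multiplier stays idle while only c is present and fires as soon as the adder's sum arrives.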
-