Patent search ap:("INTEL CORPORATION") AND inv:"Simon C. Steely Page Jr."

21.

Granted invention patent
Hardware apparatuses and methods to control cache line coherency (in force)

Publication (announcement) number: US09934146B2

Publication (announcement) date: 2018-04-03

Application number: US14498946

Application date: 2014-09-26

Applicant: INTEL CORPORATION

Inventor: Simon C. Steely, Jr.; Samantika S. Sury; William C. Hasenplaugh

IPC: G06F12/08, G06F12/0817, G06F12/0811

CPC classification number: G06F12/0824, G06F12/0811, G06F2212/1024, G06F2212/1048, G06F2212/2542

Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.

22.

Granted invention patent
Multicast tree-based data distribution in distributed shared cache (in force)

Publication (announcement) number: US09734069B2

Publication (announcement) date: 2017-08-15

Application number: US14567026

Application date: 2014-12-11

Applicant: Intel Corporation

Inventor: Simon C. Steely, Jr.; William C. Hasenplaugh; Samantika S. Sury

IPC: G06F12/08, G06F12/084, G06F12/0815, G06F12/0817

CPC classification number: G06F12/084, G06F12/0815, G06F12/0822, G06F2212/1021, G06F2212/281, Y02D10/13

Abstract: Systems and methods for multicast tree-based data distribution in a distributed shared cache. An example processing system comprises: a plurality of processing cores, each processing core communicatively coupled to a cache; a tag directory associated with caches of the plurality of processing cores; a shared cache associated with the tag directory; a processing logic configured, responsive to receiving an invalidate request with respect to a certain cache entry, to: allocate, within the shared cache, a shared cache entry corresponding to the certain cache entry; transmit, to at least one of: a tag directory or a processing core that last accessed the certain entry, an update read request with respect to the certain cache entry; and responsive to receiving an update of the certain cache entry, broadcast the update to at least one of: one or more tag directories or one or more processing cores identified by a tag corresponding to the certain cache entry.

23.

Granted invention patent
Apparatuses, methods, and systems for operations in a configurable spatial accelerator (in force)

Publication (announcement) number: US11593295B2

Publication (announcement) date: 2023-02-28

Application number: US17550875

Application date: 2021-12-14

Applicant: Intel Corporation

Inventor: Kermin E. Fleming, Jr.; Simon C. Steely, Jr.; Kent D. Glossop; Mitchell Diamond; Benjamin Keen; Dennis Bradford; Fabrizio Petrini; Barry Tannenbaum; Yongzhi Zhang

IPC: G06F13/40, G06F9/30, G06F15/78

Abstract: Systems, methods, and apparatuses relating to operations in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator includes a first processing element that includes a configuration register within the first processing element to store a configuration value that causes the first processing element to perform an operation according to the configuration value, a plurality of input queues, an input controller to control enqueue and dequeue of values into the plurality of input queues according to the configuration value, a plurality of output queues, and an output controller to control enqueue and dequeue of values into the plurality of output queues according to the configuration value.

24.

Granted invention patent
Processors, methods, and systems with a configurable spatial accelerator (in force)

Publication (announcement) number: US10558575B2

Publication (announcement) date: 2020-02-11

Application number: US15396402

Application date: 2016-12-30

Applicant: INTEL CORPORATION

Inventor: Kermin E. Fleming, Jr.; Kent D. Glossop; Simon C. Steely, Jr.; Jinjie Tang; Alan G. Gara

IPC: G06F12/08, G06F9/30, G06F9/38, G06F12/0862, G06F12/0842, G06F12/0875

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.

25.

Granted invention patent
Runtime address disambiguation in acceleration hardware (in force)

Publication (announcement) number: US10474375B2

Publication (announcement) date: 2019-11-12

Application number: US15396049

Application date: 2016-12-30

Applicant: INTEL CORPORATION

Inventor: Kermin Elliott Fleming, Jr.; Simon C. Steely, Jr.; Kent D. Glossop

IPC: G06F3/06, G06F12/00, G06F13/16, G06F9/38, G06F9/30

Abstract: An integrated circuit includes a processor to execute instructions and to interact with memory, and acceleration hardware, to execute a sub-program corresponding to instructions. A set of input queues includes a store address queue to receive, from the acceleration hardware, a first address of the memory, the first address associated with a store operation and a store data queue to receive, from the acceleration hardware, first data to be stored at the first address of the memory. The set of input queues also includes a completion queue to buffer response data for a load operation. A disambiguator circuit, coupled to the set of input queues and the memory, is to, responsive to determining the load operation, which succeeds the store operation, has an address conflict with the first address, copy the first data from the store data queue into the completion queue for the load operation.
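
The forwarding behaviour described in this abstract can be illustrated with a minimal software sketch. This is not the patented circuit: the class name `Disambiguator`, its method names, the dictionary standing in for memory, and the default value 0 for unwritten addresses are all assumptions made for illustration. The sketch only shows the idea that a load whose address conflicts with a pending store receives that store's data from the store data queue instead of reading memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Disambiguator:
    """Toy model of store-to-load forwarding between queues (hypothetical names)."""
    memory: dict = field(default_factory=dict)           # stands in for the memory
    store_addresses: deque = field(default_factory=deque)  # store address queue
    store_data: deque = field(default_factory=deque)       # store data queue
    completions: deque = field(default_factory=deque)      # completion queue for loads

    def enqueue_store(self, address, data):
        # A store is pending until drain_store() writes it to memory.
        self.store_addresses.append(address)
        self.store_data.append(data)

    def load(self, address):
        # Check pending stores, youngest first, for an address conflict.
        pending = zip(reversed(self.store_addresses), reversed(self.store_data))
        for pending_address, pending_data in pending:
            if pending_address == address:
                # Conflict: copy the store data into the completion queue.
                self.completions.append(pending_data)
                return
        # No conflict: the load is served from memory (0 if never written).
        self.completions.append(self.memory.get(address, 0))

    def drain_store(self):
        # Commit the oldest pending store to memory.
        address = self.store_addresses.popleft()
        self.memory[address] = self.store_data.popleft()
```

For example, after `enqueue_store(0x10, 99)` a call to `load(0x10)` places 99 in the completion queue even though memory has not yet been updated.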

26.

Granted invention patent
Processors and methods with configurable network-based dataflow operator circuits (in force)

Publication (announcement) number: US10469397B2

Publication (announcement) date: 2019-11-05

Application number: US15640540

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Kermin Fleming; Kent D. Glossop; Simon C. Steely, Jr.

IPC: H04L12/721, H04L12/801, H04L12/863, H04L12/935, H04L12/937

Abstract: Systems, methods, and apparatuses relating to configurable network-based dataflow operator circuits are described. In one embodiment, a processor includes a spatial array of processing elements, and a packet switched communications network to route data within the spatial array between processing elements according to a dataflow graph to perform a first dataflow operation of the dataflow graph, wherein the packet switched communications network further comprises a plurality of network dataflow endpoint circuits to perform a second dataflow operation of the dataflow graph.

27.

Granted invention patent
Sharing aware snoop filter apparatus and method (in force)

Publication (announcement) number: US09898408B2

Publication (announcement) date: 2018-02-20

Application number: US15088921

Application date: 2016-04-01

Applicant: Intel Corporation

Inventor: Samantika S. Sury; Robert G. Blankenship; Simon C. Steely, Jr.

IPC: G06F12/00, G06F12/0831, G06F12/0811, G06F13/00, G06F13/28

CPC classification number: G06F12/0831, G06F12/0811, G06F2212/283, G06F2212/621

Abstract: An apparatus and method are described for a sharing aware snoop filter. For example, one embodiment of a processor comprises: a plurality of caches, each of the caches comprising a plurality of cache lines, at least some of which are to be shared by two or more of the caches; a snoop filter to monitor accesses to the plurality of cache lines shared by the two or more caches, the snoop filter comprising: a primary snoop filter comprising a first plurality of entries, each entry associated with one of the plurality of cache lines and comprising a N unique identifiers to uniquely identify up to N of the plurality of caches currently storing the cache line; an auxiliary snoop filter comprising a second plurality of entries, each entry associated with one of the plurality of cache lines, wherein once a particular cache line has been shared by more than N caches, an entry for that cache line is allocated in the auxiliary snoop filter to uniquely identify one or more additional caches storing the cache line.
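
The two-level tracking described in this abstract can be sketched in software as follows. This is a rough illustration only, not the claimed hardware: the class `SnoopFilter`, its method names, and the use of dictionaries keyed by line address are assumptions. The primary structure holds up to N cache identifiers per line, and an auxiliary entry is allocated only once a line has more than N sharers.

```python
class SnoopFilter:
    """Toy model of a primary plus auxiliary sharer tracker (hypothetical names)."""

    def __init__(self, n):
        self.n = n            # identifiers each primary entry can hold
        self.primary = {}     # line address -> list of up to n cache ids
        self.auxiliary = {}   # line address -> set of additional cache ids

    def record_sharer(self, line, cache_id):
        ids = self.primary.setdefault(line, [])
        if cache_id in ids or cache_id in self.auxiliary.get(line, ()):
            return  # this cache is already tracked for the line
        if len(ids) < self.n:
            ids.append(cache_id)
        else:
            # More than n sharers: allocate or extend the auxiliary entry.
            self.auxiliary.setdefault(line, set()).add(cache_id)

    def sharers(self, line):
        # All caches that would need to be snooped for this line.
        return set(self.primary.get(line, [])) | self.auxiliary.get(line, set())
```

With `n=2`, recording caches 0, 1 and 2 for the same line leaves 0 and 1 in the primary entry and places 2 in the auxiliary entry.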

28.

Granted invention patent
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems (in force)

Publication (announcement) number: US12050912B2

Publication (announcement) date: 2024-07-30

Application number: US18220225

Application date: 2023-07-10

Applicant: Intel Corporation

Inventor: Edward T. Grochowski; Asit K. Mishra; Robert Valentine; Mark J. Charney; Simon C. Steely, Jr.

IPC: G06F9/30, G06F9/38

CPC classification number: G06F9/3001, G06F9/30036, G06F9/30145, G06F9/3861, G06F9/3865

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
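
The idea of a completion progress indicator can be illustrated with a small software sketch. It is not the instruction's actual semantics: the function name `matmul_resumable`, the `budget` parameter that simulates an interruption, and the choice of one result element as the unit of progress are all assumptions, since the abstract does not state the granularity.

```python
def matmul_resumable(a, b, result, progress=0, budget=None):
    """Multiply a by b into result, resuming from a saved progress count.

    progress counts result elements already stored, in row-major order.
    budget (hypothetical) limits how many elements are computed before a
    simulated interruption; None means run to completion.
    Returns the updated progress indicator.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    total = rows * cols
    done = 0
    while progress < total:
        if budget is not None and done == budget:
            return progress  # interrupted: caller keeps the indicator
        i, j = divmod(progress, cols)
        result[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
        progress += 1
        done += 1
    return progress
```

A caller that is interrupted after three elements can pass the returned value back as `progress` to finish the remaining work without recomputing the elements already stored.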

29.

Granted invention patent
Apparatuses, methods, and systems for fused operations using sign modification in a processing element of a configurable spatial accelerator (in force)

Publication (announcement) number: US11907713B2

Publication (announcement) date: 2024-02-20

Application number: US16729369

Application date: 2019-12-28

Applicant: Intel Corporation

Inventor: Kermin E. Chofleming; Chuanjun Zhang; Daniel Towner; Simon C. Steely, Jr.; Benjamin Keen

IPC: G06F9/30, G06F15/80

CPC classification number: G06F9/3001, G06F9/30181, G06F15/80

Abstract: Systems, methods, and apparatuses relating to a sign modification field for fused operations in a configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a processing element of the plurality of processing elements comprising: a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, operation circuitry coupled to the first plurality of input queues having the multiple bit width, a sign modification circuit coupled to the first plurality of input queues having the multiple bit width, and a configuration register within the processing element to store a configuration value comprising a sign modification field that causes the sign modification circuit to modify a sign bit of a value from the first plurality of input queues according to the sign modification field to create a sign modified value, and the configuration value causes the operation circuitry to perform a selected operation of a plurality of operations on a value from the first plurality of input queues and the sign modified value to create a resultant value, and store the resultant value in the at least one first output queue.

30.

Granted invention patent
Signal pathways in multi-tile processors (in force)

Publication (announcement) number: US11269805B2

Publication (announcement) date: 2022-03-08

Application number: US15980579

Application date: 2018-05-15

Applicant: Intel Corporation

Inventor: William J. Butera; Simon C. Steely, Jr.; Richard J. Dischler

IPC: G06F15/80, G06F9/54, G06F15/167, G06F9/448, G06F15/173, G06F11/07, H04L41/0604

Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.
