Patent search ap:("INTEL CORPORATION") AND inv:"Samantika S. Sury" Page 1

1.

Invention publication
INSTRUCTIONS FOR REMOTE ATOMIC OPERATIONS (under examination, published)

Publication (announcement) number: US20240362021A1

Publication (announcement) date: 2024-10-31

Application number: US18670427

Application date: 2024-05-21

Applicant: Intel Corporation

Inventor: Doddaballapur N. Jayasimha , Jonas Svennebring , Samantika S. Sury , Christopher J. Hughes , Jong Soo Park , Lingxiang Xiang

IPC: G06F9/30 , G06F9/38 , G06F9/46 , G06F13/28

CPC classification number: G06F9/3004 , G06F9/3001 , G06F9/30185 , G06F9/3836 , G06F9/46 , G06F13/28

Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier; decoding, by decode circuitry, the fetched instruction; selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system; scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
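
Illustrative sketch (not part of the patent record): the read, operate, write-back sequence in the abstract can be modeled in software as a single indivisible step. The class AtomicCell, the method rmw, and the helper run_demo are hypothetical names, and a lock stands in for whatever mechanism the hardware uses to guarantee atomicity. Because addition is commutative, the final value does not depend on the order in which the updates are applied, which is the property that makes weak ordering acceptable for this kind of operation.

```python
import threading


class AtomicCell:
    """Toy model of one memory location supporting atomic read-modify-write."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # assumption: stands in for hardware atomicity

    def rmw(self, op, source):
        """Atomically read the datum, apply op(datum, source), and write the result back."""
        with self._lock:
            datum = self._value
            self._value = op(datum, source)
            return self._value

    def load(self):
        with self._lock:
            return self._value


def run_demo(threads=4, updates=1000):
    """Apply threads * updates atomic increments from concurrent workers."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(updates):
            cell.rmw(lambda datum, source: datum + source, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()
```

Calling run_demo(4, 1000) applies 4000 increments, and no update is lost regardless of how the threads interleave.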

2.

Invention application
SOFTWARE-TRANSPARENT HARDWARE PREDICTOR FOR CORE-TO-CORE DATA TRANSFER OPTIMIZATION (under examination, published)

Publication (announcement) number: US20200285578A1

Publication (announcement) date: 2020-09-10

Application number: US16822939

Application date: 2020-03-18

Applicant: Intel Corporation

Inventor: Ren Wang , Joseph Nuzman , Samantika S. Sury , Andrew J. Herdrich , Namakkal N. Venkatesan , Anil Vasudevan , Tsung-Yuan C. Tai , Niall D. McDonnell

IPC: G06F12/0831 , G06F12/084 , G06F12/0811

Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.

3.

Granted invention
Sharing aware snoop filter apparatus and method (in force)

Publication (announcement) number: US09898408B2

Publication (announcement) date: 2018-02-20

Application number: US15088921

Application date: 2016-04-01

Applicant: Intel Corporation

Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, Jr.

IPC: G06F12/00 , G06F12/0831 , G06F12/0811 , G06F13/00 , G06F13/28

CPC classification number: G06F12/0831 , G06F12/0811 , G06F2212/283 , G06F2212/621

Abstract: An apparatus and method are described for a sharing aware snoop filter. For example, one embodiment of a processor comprises: a plurality of caches, each of the caches comprising a plurality of cache lines, at least some of which are to be shared by two or more of the caches; a snoop filter to monitor accesses to the plurality of cache lines shared by the two or more caches, the snoop filter comprising: a primary snoop filter comprising a first plurality of entries, each entry associated with one of the plurality of cache lines and comprising N unique identifiers to uniquely identify up to N of the plurality of caches currently storing the cache line; and an auxiliary snoop filter comprising a second plurality of entries, each entry associated with one of the plurality of cache lines, wherein once a particular cache line has been shared by more than N caches, an entry for that cache line is allocated in the auxiliary snoop filter to uniquely identify one or more additional caches storing the cache line.
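
Illustrative sketch (not part of the patent record): the two-level bookkeeping described in the abstract can be modeled with two maps. The class SharingAwareSnoopFilter and its method names are hypothetical, and details such as eviction and entry capacity are not modeled. A primary entry records up to N sharers of a line; an auxiliary entry is allocated only once a line has more than N sharers.

```python
class SharingAwareSnoopFilter:
    """Toy model: primary entries hold up to n sharers, auxiliary entries hold the rest."""

    def __init__(self, n):
        self.n = n
        self.primary = {}    # line address -> list of up to n cache IDs
        self.auxiliary = {}  # line address -> set of additional cache IDs

    def record_share(self, line, cache_id):
        """Note that cache_id now stores the given line."""
        sharers = self.primary.setdefault(line, [])
        if cache_id in sharers or cache_id in self.auxiliary.get(line, ()):
            return  # already tracked
        if len(sharers) < self.n:
            sharers.append(cache_id)
        else:
            # More than n sharers: allocate (or extend) the auxiliary entry.
            self.auxiliary.setdefault(line, set()).add(cache_id)

    def sharers(self, line):
        """Return every cache known to store the line."""
        return set(self.primary.get(line, [])) | self.auxiliary.get(line, set())
```

With n = 2, a line shared by caches 0 through 3 keeps caches 0 and 1 in its primary entry and caches 2 and 3 in its auxiliary entry, while a line with a single sharer never allocates an auxiliary entry.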

4.

Invention application
SPATIAL AND TEMPORAL MERGING OF REMOTE ATOMIC OPERATIONS (under examination, published)

Publication (announcement) number: US20190205139A1

Publication (announcement) date: 2019-07-04

Application number: US15858899

Application date: 2017-12-29

Applicant: Intel Corporation

Inventor: Christopher J. Hughes , Joseph Nuzman , Jonas Svennebring , Doddaballapur N. Jayasimha , Samantika S. Sury , David A. Koufaty , Niall D. McDonnell , Yen-Cheng Liu , Stephen R. Van Doren , Stephen J. Robinson

IPC: G06F9/30

Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data; and optimization circuitry to receive an incoming RAO instruction and scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
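
Illustrative sketch (not part of the patent record): the enqueue-or-combine decision in the abstract can be modeled with a queue keyed by destination cache line. The class RaoQueue, the 64-byte LINE_BYTES constant, and the returned status strings are assumptions made for illustration. Overlap is approximated by comparing byte offsets, which assumes fixed-size elements. Cases the abstract does not describe (different opcode, or overlapping elements) are reported as "not_combined" and left unhandled.

```python
LINE_BYTES = 64  # assumed cache line size


class RaoQueue:
    """Toy RAO queue with entries grouped by destination cache line."""

    def __init__(self):
        self.groups = {}  # cache line index -> list of (opcode, offset, source)

    def submit(self, opcode, address, source):
        line, offset = divmod(address, LINE_BYTES)
        group = self.groups.get(line)
        if group is None:
            # No enqueued instruction targets this cache line: enqueue.
            self.groups[line] = [(opcode, offset, source)]
            return "enqueued"
        same_opcode = all(op == opcode for op, _, _ in group)
        overlaps = any(off == offset for _, off, _ in group)
        if same_opcode and not overlaps:
            # Same opcode, different element: share the group at a new offset.
            group.append((opcode, offset, source))
            return "spatially_combined"
        return "not_combined"  # outside the behavior the abstract describes
```

For example, two "add" operations to addresses 0x100 and 0x108 fall in the same 64-byte line at different offsets, so the second is spatially combined with the first.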

5.

Invention application
METHOD AND APPARATUS FOR ADAPTIVELY SELECTING DATA TRANSFER PROCESSES FOR SINGLE-PRODUCER-SINGLE-CONSUMER AND WIDELY SHARED CACHE LINES (under examination, published)

Publication (announcement) number: US20190102295A1

Publication (announcement) date: 2019-04-04

Application number: US15721121

Application date: 2017-09-29

Applicant: Intel Corporation

Inventor: Samantika S. Sury , Robert G. Blankenship , Simon C. Steely, Jr. , Yen-Cheng Liu

IPC: G06F12/084 , G06F12/0846 , G06F12/128 , G06F12/0811

Abstract: A method for adaptively performing a set of data transfer processes in a multi-core processor is described. The method may include receiving, by a shared cache from a first core cache, a first request for a cache line; determining, by the shared cache in response to receipt of the first request, whether the cache line is a widely-shared cache line or a single-producer-single-consumer cache line; and performing, by the first core cache and a second core cache, a three-hop data transfer process in response to determining that the cache line is a single-producer-single-consumer cache line, wherein the three-hop data transfer process transfers the cache line directly from the second core cache to the first core cache.

6.

Invention application
PROCESSORS, METHODS, AND SYSTEMS FOR A CONFIGURABLE SPATIAL ACCELERATOR WITH TRANSACTIONAL AND REPLAY FEATURES (under examination, published)

Publication (announcement) number: US20190004945A1

Publication (announcement) date: 2019-01-03

Application number: US15640533

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr. , Samantika S. Sury

IPC: G06F12/0802 , G06F17/50 , H03K19/177

CPC classification number: G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0815 , G06F15/7867 , G06F15/8015 , G06F15/825 , G06F17/505 , G11C7/1012 , G11C8/12 , G11C2207/2245 , H03K19/17736 , H03K19/17756 , H03K19/1776 , H03K19/17764 , H03K19/17776 , H03K19/1778

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.
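
Illustrative sketch (not part of the patent record): the firing rule in the abstract, where an operation is performed once an incoming operand set has arrived, can be modeled as an operator that buffers operands per input port. The class DataflowOperator and its receive method are hypothetical names; the interconnect network and the overlay of a graph onto processing elements are not modeled.

```python
class DataflowOperator:
    """Toy dataflow node: fires only when a full operand set has arrived."""

    def __init__(self, func, arity):
        self.func = func
        self.arity = arity
        self.pending = {}  # input port -> buffered operand

    def receive(self, port, value):
        """Buffer one operand; return the result if the operand set is complete."""
        self.pending[port] = value
        if len(self.pending) < self.arity:
            return None  # still waiting for the remaining operands
        args = [self.pending[p] for p in range(self.arity)]
        self.pending.clear()  # consume the operand set
        return self.func(*args)
```

A two-input adder built this way returns nothing when the first operand arrives and produces the sum when the second one does, independent of which port is filled first.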

7.

Invention application
HARDWARE APPARATUSES AND METHODS TO CONTROL CACHE LINE COHERENCY (in force)

Publication (announcement) number: US20160092354A1

Publication (announcement) date: 2016-03-31

Application number: US14498946

Application date: 2014-09-26

Applicant: INTEL CORPORATION

Inventor: Simon C. Steely, Jr. , Samantika S. Sury , William C. Hasenplaugh

IPC: G06F12/08

CPC classification number: G06F12/0824 , G06F12/0811 , G06F2212/1024 , G06F2212/1048 , G06F2212/2542

Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory of the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core to cause the cache line to be sent from the tag directory to the second processor core.

8.

Granted invention
Adaptive remote atomics (in force)

Publication (announcement) number: US12216579B2

Publication (announcement) date: 2025-02-04

Application number: US17134254

Application date: 2020-12-25

Applicant: Intel Corporation

Inventor: Carl J. Beckmann , Samantika S. Sury , Christopher J. Hughes , Lingxiang Xiang , Rahul Agrawal

IPC: G06F12/0811 , G06F12/0817 , G06F12/084 , G06F12/0862

Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the atomic operation at the first level or at the second level and whether to copy the cache line from the shared cache to the local cache.
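
Illustrative sketch (not part of the patent record): the three outcomes named in the abstract (execute locally, execute remotely at the shared level, or bring the line into the local cache first) can be modeled with a small decision routine. The abstract does not state how the adaptive unit decides, so the policy here is purely an assumption: a line is brought into the local cache after it has been the target of more than promote_after remote executions. The class AdaptiveAtomicUnit and all of its fields are hypothetical, and coherence between the two levels is not modeled.

```python
class AdaptiveAtomicUnit:
    """Toy model of choosing where an atomic add executes."""

    def __init__(self, promote_after=2):
        self.local = {}    # line -> value held in the local (first-level) cache
        self.shared = {}   # line -> value held in the shared (second-level) cache
        self.remote_count = {}  # hypothetical per-line count of remote requests
        self.promote_after = promote_after  # hypothetical policy threshold

    def atomic_add(self, line, amount):
        if line in self.local:
            # The local cache stores the line: execute at the first level.
            self.local[line] += amount
            return "local"
        count = self.remote_count.get(line, 0) + 1
        self.remote_count[line] = count
        if count > self.promote_after:
            # Bring the line into the local cache, then execute locally.
            # This toy keeps a single copy, so the shared entry is dropped.
            self.local[line] = self.shared.pop(line, 0) + amount
            return "copied_then_local"
        # Otherwise execute at the second level, next to the shared cache.
        self.shared[line] = self.shared.get(line, 0) + amount
        return "remote"
```

With promote_after = 2, four consecutive increments of the same line execute remotely twice, then trigger the copy, then run locally.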

9.

Invention application
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO LOAD MULTIPLE DATA ELEMENTS TO DESTINATION STORAGE LOCATIONS OTHER THAN PACKED DATA REGISTERS (under examination, published)

Publication (announcement) number: US20190384601A1

Publication (announcement) date: 2019-12-19

Application number: US16537318

Application date: 2019-08-09

Applicant: Intel Corporation

Inventor: William C. Hasenplaugh , Chris J. Newburn , Simon C. Steely, Jr. , Samantika S. Sury

IPC: G06F9/30 , G06F12/0886 , G06F12/0897 , G06F12/126 , G06F12/1045

Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers. The execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
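
Illustrative sketch (not part of the patent record): the load pattern in the abstract resembles a gather whose results land somewhere other than a vector register. In this model, memory is a Python list, packed_addresses plays the role of the source packed memory address information, and destination is an ordinary buffer standing in for the non-register destination storage location. The function gather_to_buffer and its parameters are hypothetical names.

```python
def gather_to_buffer(memory, packed_addresses, destination, dest_offset=0):
    """Load one element per address and store the elements contiguously in destination."""
    for i, address in enumerate(packed_addresses):
        destination[dest_offset + i] = memory[address]
    return destination
```

For example, with memory = list(range(100, 116)) and addresses [3, 0, 15, 7], the destination buffer receives the elements stored at those four addresses in the order the addresses were given.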

10.

Granted invention
Processors, methods, and systems for a configurable spatial accelerator with transactional and replay features (in force)

Publication (announcement) number: US10445234B2

Publication (announcement) date: 2019-10-15

Application number: US15640533

Application date: 2017-07-01

Applicant: Intel Corporation

Inventor: Kermin Fleming , Kent D. Glossop , Simon C. Steely, Jr. , Samantika S. Sury

IPC: G06F12/0802 , H03K19/177 , G06F17/50 , G11C7/10 , G06F15/78 , G06F15/80 , G11C8/12

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In an embodiment, a processor includes a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements are to perform an atomic operation when an incoming operand set arrives at the plurality of processing elements.
