Patent search: ap:("INTEL CORPORATION") AND inv:"Shaden Smith"

1.

Granted invention patent
Array broadcast and reduction systems and methods (In force)

Publication (announcement) number: US10983793B2

Publication (announcement) date: 2021-04-20

Application number: US16369846

Application date: 2019-03-29

Applicant: INTEL CORPORATION

Inventors: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith

IPC: G06F9/30, G06F13/28, G06F9/32, G06F9/455

Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry performs the broadcast and reduction operations, system speed and efficiency is beneficially enhanced.
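A minimal software sketch of the broadcast and reduction semantics described in the abstract, assuming a flat word-addressed memory modeled as a Python list. The function names (`dma_broadcast_value`, `dma_broadcast_array`, `dma_reduce`) are hypothetical and do not reflect the patented ISA encoding; the sketch only shows what data ends up where, not how DMA control circuitry implements it.

```python
from functools import reduce
from operator import add
from typing import Callable, List, Sequence

# Assumption: memory is a flat, word-addressed array (illustration only).
Memory = List[int]

def dma_broadcast_value(mem: Memory, value: int, dests: Sequence[int]) -> None:
    """Broadcast a single data value to every destination address."""
    for addr in dests:
        mem[addr] = value

def dma_broadcast_array(mem: Memory, src: int, length: int, dests: Sequence[int]) -> None:
    """Broadcast the array mem[src:src+length] to each destination base address."""
    block = mem[src:src + length]  # snapshot so overlapping copies see the original data
    for base in dests:
        mem[base:base + length] = block

def dma_reduce(mem: Memory, srcs: Sequence[int],
               op: Callable[[int, int], int], dest: int) -> int:
    """Gather values from several source addresses, combine them with op, store the result."""
    result = reduce(op, (mem[a] for a in srcs))
    mem[dest] = result
    return result
```

For example, broadcasting 7 to addresses 0, 4 and 8 and then reducing those three addresses with `add` stores 21 at the destination address.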

2.

Granted invention patent
Memory system architecture for multi-threaded processors (In force)

Publication (announcement) number: US11630691B2

Publication (announcement) date: 2023-04-18

Application number: US17410818

Application date: 2021-08-24

Applicant: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Jason M. Howard, Joshua B. Fryman, Tina C. Zhong, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cave, Sriram Aananthakrishnan, Bharadwaj Krishnamurthy

IPC: G06F9/30, G06F9/35, G06F9/48, G06F12/0815, G06F9/38, G06F13/28

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a system comprising a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines, a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations, and then accumulate the generated results, and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
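The scheduler's reduction strategy (several threads each produce a partial result, and the partials are then accumulated) can be illustrated in software. This is a sketch under stated assumptions: Python worker threads stand in for hardware threads, `threaded_reduce` and `chunk` are hypothetical names, and `op` is assumed commutative and associative with `identity` as its neutral element, which is what makes the order of accumulation irrelevant.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def chunk(data: Sequence[T], n_threads: int) -> List[Sequence[T]]:
    """Split data into n_threads contiguous chunks (trailing chunks may be short or empty)."""
    size = -(-len(data) // n_threads)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_threads)]

def threaded_reduce(data: Sequence[T], op: Callable[[T, T], T],
                    identity: T, n_threads: int = 4) -> T:
    """Each worker reduces one chunk to a partial result; the partials are then accumulated.

    Assumption: op is commutative and associative, and identity is its neutral element.
    """
    parts = chunk(data, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda part: reduce(op, part, identity), parts))
    # Accumulation step: combine the per-thread partial results.
    return reduce(op, partials, identity)
```

Because the operation is commutative and associative, splitting the input differently (a different `n_threads`) does not change the final result, only how the work is distributed.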

3.

Granted invention patent
Structures and operations of integrated circuits having network of configurable switches (In force)

Publication (announcement) number: US10476492B2

Publication (announcement) date: 2019-11-12

Application number: US16201915

Application date: 2018-11-27

Applicant: Intel Corporation

Inventors: Ankit More, Jason M. Howard, Robert Pawlowski, Fabrizio Petrini, Shaden Smith

IPC: H03K17/00, G11C7/10, H03K19/173

Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.
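A collective-tree reduction like the one the switch network performs can be modeled as a recursive tree evaluation. In this sketch `SwitchNode` is a hypothetical class: `op` stands in for the switch's computation engine, `children` for the switches whose outputs reach this switch's input ports, and `local_value` for data injected at the switch. The configurable crossbar and configuration registers are not modeled.

```python
from dataclasses import dataclass, field
from operator import add
from typing import Callable, List, Optional

@dataclass
class SwitchNode:
    """Hypothetical model of one switch acting as a node of the collective tree."""
    op: Callable[[int, int], int] = add                          # stands in for the computation engine
    local_value: Optional[int] = None                            # data injected at this switch
    children: List["SwitchNode"] = field(default_factory=list)   # switches feeding the input ports

    def compute(self) -> int:
        """Combine results arriving on the input ports (plus any local value) with op."""
        inputs = [child.compute() for child in self.children]
        if self.local_value is not None:
            inputs.append(self.local_value)
        if not inputs:
            raise ValueError("switch has no inputs")
        result = inputs[0]
        for value in inputs[1:]:
            result = self.op(result, value)
        return result
```

Each switch only combines what arrives on its own inputs, so the root's output equals the reduction over every leaf once the whole tree has been evaluated.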

4.

Invention patent application
ARRAY BROADCAST AND REDUCTION SYSTEMS AND METHODS (Pending, published)

Publication (announcement) number: US20200310795A1

Publication (announcement) date: 2020-10-01

Application number: US16369846

Application date: 2019-03-29

Applicant: INTEL CORPORATION

Inventors: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith

IPC: G06F9/30, G06F9/32, G06F9/455

Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry performs the broadcast and reduction operations, system speed and efficiency is beneficially enhanced.

5.

Invention patent application
STRUCTURES AND OPERATIONS OF INTEGRATED CIRCUITS HAVING NETWORK OF CONFIGURABLE SWITCHES (Pending, published)

Publication (announcement) number: US20190109590A1

Publication (announcement) date: 2019-04-11

Application number: US16201915

Application date: 2018-11-27

Applicant: Intel Corporation

Inventors: Ankit More, Jason M. Howard, Robert Pawlowski, Fabrizio Petrini, Shaden Smith

IPC: H03K17/00, H03K19/173, G11C7/10

CPC classification number: H03K17/005, G11C7/1006, H03K17/007, H03K19/1733

Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.

6.

Granted invention patent
Memory system architecture for multi-threaded processors (In force)

Publication (announcement) number: US11106494B2

Publication (announcement) date: 2021-08-31

Application number: US16147302

Application date: 2018-09-28

Applicant: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Jason M. Howard, Joshua B. Fryman, Tina C. Zhong, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cave, Sriram Aananthakrishnan, Bharadwaj Krishnamurthy

IPC: G06F9/30, G06F9/38, G06F9/48, G06F12/0815, G06F9/35, G06F13/28

Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a system comprising a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines, a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations, and then accumulate the generated results, and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.

7.

Granted invention patent
System, apparatus and method for barrier synchronization in a multi-threaded processor (In force)

Publication (announcement) number: US11061742B2

Publication (announcement) date: 2021-07-13

Application number: US16019685

Application date: 2018-06-27

Applicant: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman

IPC: G06F9/52, G06F9/30, G06F9/38

Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
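The two-level arrangement (per-pipeline status tracking for each barrier group, plus a core-level circuit that detects completion and informs the pipelines) can be sketched in software. `PipelineBarrier`, `CoreBarrier` and the `(pipeline, thread)` addressing are hypothetical names chosen for illustration; real hardware would stall waiting threads rather than return a flag.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Thread = Tuple[int, int]  # hypothetical addressing: (pipeline id, thread id)

class PipelineBarrier:
    """Per-pipeline status: which local threads of each barrier group have arrived."""
    def __init__(self) -> None:
        self.arrived: Dict[int, Set[int]] = defaultdict(set)

    def arrive(self, group: int, thread: int) -> None:
        self.arrived[group].add(thread)

    def release(self, group: int) -> None:
        self.arrived[group].clear()

class CoreBarrier:
    """Aggregates pipeline status and releases a group once all of its members arrive."""
    def __init__(self, n_pipelines: int) -> None:
        self.pipelines = [PipelineBarrier() for _ in range(n_pipelines)]
        self.members: Dict[int, Set[Thread]] = {}

    def define_group(self, group: int, members: Set[Thread]) -> None:
        if len(members) < 2:
            raise ValueError("a barrier group needs at least two threads")
        self.members[group] = set(members)

    def arrive(self, group: int, pipeline: int, thread: int) -> bool:
        """Record an arrival; return True when this arrival completes the barrier."""
        if (pipeline, thread) not in self.members[group]:
            raise ValueError("thread is not a member of this barrier group")
        self.pipelines[pipeline].arrive(group, thread)
        arrived = {(p, t) for p, pb in enumerate(self.pipelines) for t in pb.arrived[group]}
        if arrived == self.members[group]:
            for pb in self.pipelines:
                pb.release(group)  # inform every pipeline barrier and reset for reuse
            return True
        return False
```

A group may span several pipelines, which is why completion is decided at the core level rather than inside any single pipeline tracker.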

8.

Granted invention patent
Systems and methods for ISA support for indirect loads and stores for efficiently accessing compressed lists in graph applications (In force)

Publication (announcement) number: US10929132B1

Publication (announcement) date: 2021-02-23

Application number: US16579806

Application date: 2019-09-23

Applicant: Intel Corporation

Inventors: Robert Pawlowski, Scott Hagan Schmittel, Joshua Fryman, Wim Heirman, Jason Howard, Ankit More, Shaden Smith, Scott Cline

IPC: G06F9/30, G06F9/35

Abstract: Disclosed embodiments relate to systems and methods for performing instructions to access a compressed graphic list. In one example, a processor includes fetch and decode circuitry to fetch and decode the single instruction to access the compressed graphic list, and execution circuitry to execute the decoded single instruction to cause access to the compressed graphic list by: receiving, from a load store queue, at a first op-engine associated with a first data location, an indirection request, computing, via the first op-engine, a second data location associated with a second op-engine, computing, via the second op-engine, a third data location associated with a third op-engine responsive to the indirection request, and providing, via the third op-engine, a data response to the load store queue responsive to receiving data from the third data location.
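The three-hop indirection in the abstract resembles the access pattern of a compressed sparse row (CSR) graph: an offsets array locates a vertex's adjacency list, the adjacency list yields a neighbor id, and that id indexes a property array. Treating the compressed list as CSR is an assumption for illustration, and `indirect_neighbor_property` is a hypothetical name. The sketch performs the three hops sequentially in software, whereas the patent describes op-engines near each data location forwarding the request.

```python
from typing import Sequence

def indirect_neighbor_property(offsets: Sequence[int], neighbors: Sequence[int],
                               props: Sequence[int], vertex: int, k: int) -> int:
    """Return props[neighbors[offsets[vertex] + k]], the property of vertex's k-th neighbor.

    Assumption: the compressed list is a CSR adjacency structure (offsets + neighbors).
    """
    # Hop 1 (first data location): find where vertex's adjacency list starts.
    start = offsets[vertex]
    degree = offsets[vertex + 1] - start
    if not 0 <= k < degree:
        raise IndexError("vertex has fewer than k + 1 neighbors")
    # Hop 2 (second data location): read the neighbor id from the compressed list.
    neighbor = neighbors[start + k]
    # Hop 3 (third data location): fetch the neighbor's data for the requester.
    return props[neighbor]
```

With `offsets = [0, 2, 3, 4]` and `neighbors = [1, 2, 2, 0]` (vertex 0 points to 1 and 2, vertex 1 to 2, vertex 2 to 0), each lookup touches three different arrays, which is the dependent-load chain the instruction is meant to accelerate.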

9.

Invention patent application
SYSTEM, APPARATUS AND METHOD FOR BARRIER SYNCHRONIZATION IN A MULTI-THREADED PROCESSOR (Pending, published)

Publication (announcement) number: US20200004602A1

Publication (announcement) date: 2020-01-02

Application number: US16019685

Application date: 2018-06-27

Applicant: Intel Corporation

Inventors: Robert Pawlowski, Ankit More, Shaden Smith, Sowmya Pitchaimoorthy, Samkit Jain, Vincent Cavé, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman

IPC: G06F9/52, G06F9/38, G06F9/30

Abstract: In one embodiment, a first processor core includes: a plurality of execution pipelines each to execute instructions of one or more threads; a plurality of pipeline barrier circuits coupled to the plurality of execution pipelines, each of the plurality of pipeline barrier circuits associated with one of the plurality of execution pipelines to maintain status information for a plurality of barrier groups, each of the plurality of barrier groups formed of at least two threads; and a core barrier circuit to control operation of the plurality of pipeline barrier circuits and to inform the plurality of pipeline barrier circuits when a first barrier has been reached by a first barrier group of the plurality of barrier groups. Other embodiments are described and claimed.
