-
Publication No.: US20210409265A1
Publication Date: 2021-12-30
Application No.: US17473540
Filing Date: 2021-09-13
Applicant: Intel Corporation
Inventor: Robert PAWLOWSKI, Vincent CAVE, Shruti SHARMA, Fabrizio PETRINI, Joshua B. FRYMAN, Ankit MORE
IPC: H04L12/24
Abstract: Examples described herein relate to a first group of core nodes to couple with a group of switch nodes and a second group of core nodes to couple with the group of switch nodes, wherein: a core node of the first or second group of core nodes includes circuitry to execute one or more message passing instructions that indicate a configuration of a network to transmit data toward two or more endpoint core nodes and a switch node of the group of switch nodes includes circuitry to execute one or more message passing instructions that indicate the configuration to transmit data toward the two or more endpoint core nodes.
-
Publication No.: US20190303159A1
Publication Date: 2019-10-03
Application No.: US15940768
Filing Date: 2018-03-29
Applicant: Intel Corporation
Inventor: Joshua B. FRYMAN, Jason M. HOWARD, Priyanka SURESH, Banu Meenakshi NAGASUNDARAM, Srikanth DAKSHINAMOORTHY, Ankit MORE, Robert PAWLOWSKI, Samkit JAIN, Pranav YEOLEKAR, Avinash M. SEEGEHALLI, Surhud KHARE, Dinesh SOMASEKHAR, David S. DUNNING, Romain E. CLEDAT, William Paul GRIFFIN, Bhavitavya B. BHADVIYA, Ivan B. GANEV
Abstract: Disclosed embodiments relate to an instruction set architecture to facilitate energy-efficient computing for exascale architectures. In one embodiment, a processor includes a plurality of accelerator cores, each having a corresponding instruction set architecture (ISA); a fetch circuit to fetch one or more instructions specifying one of the accelerator cores, a decode circuit to decode the one or more fetched instructions, and an issue circuit to translate the one or more decoded instructions into the ISA corresponding to the specified accelerator core, collate the one or more translated instructions into an instruction packet, and issue the instruction packet to the specified accelerator core; and, wherein the plurality of accelerator cores comprise a memory engine (MENG), a collective engine (CENG), a queue engine (QENG), and a chain management unit (CMU).
-
Publication No.: US20210389984A1
Publication Date: 2021-12-16
Application No.: US17410818
Filing Date: 2021-08-24
Applicant: Intel Corporation
Inventor: Robert PAWLOWSKI, Ankit MORE, Jason M. HOWARD, Joshua B. FRYMAN, Tina C. ZHONG, Shaden SMITH, Sowmya PITCHAIMOORTHY, Samkit JAIN, Vincent CAVE, Sriram AANANTHAKRISHNAN, Bharadwaj KRISHNAMURTHY
Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations, and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
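Note: the reduction idea in the abstract above (several threads each produce a partial result of a commutative operation, and the partial results are then accumulated) can be illustrated in software. The following is only a minimal sketch under stated assumptions: it uses addition as the commutative operation and ordinary Python threads, and the name `threaded_sum` and the `num_threads` parameter are hypothetical. It does not represent the scheduler or hardware described in the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def threaded_sum(values: List[int], num_threads: int = 4) -> int:
    """Sum `values` by giving each worker thread one contiguous slice.

    Hypothetical illustration only: addition stands in for any
    commutative arithmetic operation.
    """
    if not values:
        return 0
    # Ceiling division so that every element lands in exactly one slice.
    chunk = (len(values) + num_threads - 1) // num_threads
    slices = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Step 1: each thread generates a partial result for its slice.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(sum, slices))
    # Step 2: accumulate the partial results into the final value.
    return sum(partials)
```

Because addition is commutative and associative, the final value does not depend on how the input is split across threads or on the order in which the partial results are combined.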
-
Publication No.: US20210149683A1
Publication Date: 2021-05-20
Application No.: US17129555
Filing Date: 2020-12-21
Applicant: Intel Corporation
Inventor: Ankit MORE, Fabrizio PETRINI, Robert PAWLOWSKI, Shruti SHARMA, Sowmya PITCHAIMOORTHY
IPC: G06F9/4401, G06F13/40
Abstract: Examples include techniques for an in-network acceleration of a parallel prefix-scan operation. Examples include configuring registers of a node included in a plurality of nodes on a same semiconductor package. The registers are to be configured in response to receiving an instruction that indicates a logical tree to map to a network topology that includes the node. The instruction is associated with a prefix-scan operation to be executed by at least a portion of the plurality of nodes.
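Note: for readers unfamiliar with prefix-scan over a logical tree, the following is a minimal software sketch of the general technique, assuming a textbook work-efficient (up-sweep/down-sweep) scan with addition as the operator. The abstract does not say which scan algorithm is used or how the tree is mapped onto the network, so the function name `tree_prefix_scan` and the use of array indices as stand-ins for nodes are hypothetical.

```python
from typing import List


def tree_prefix_scan(values: List[int]) -> List[int]:
    """Inclusive prefix sum computed over a logical binary tree.

    Hypothetical illustration: array slots play the role of nodes,
    and each loop level corresponds to one level of the tree.
    """
    n = len(values)
    if n == 0:
        return []
    # Pad to a power of two so the logical tree is complete.
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)

    # Up-sweep: each parent accumulates the sum of its subtree.
    step = 1
    while step < size:
        for i in range(2 * step - 1, size, 2 * step):
            tree[i] += tree[i - step]
        step *= 2

    # Down-sweep: push partial prefixes from the root toward the leaves.
    tree[size - 1] = 0
    step = size // 2
    while step >= 1:
        for i in range(2 * step - 1, size, 2 * step):
            left = tree[i - step]
            tree[i - step] = tree[i]
            tree[i] += left
        step //= 2

    # `tree` now holds the exclusive scan; add each input to make it inclusive.
    return [tree[i] + values[i] for i in range(n)]
```

Within each level of either sweep the updates touch disjoint slots, which is what allows them to run in parallel across nodes; the sequential loops here only model that ordering.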
-