-
Publication number: US20210389984A1
Publication date: 2021-12-16
Application number: US17410818
Application date: 2021-08-24
Applicant: Intel Corporation
Inventor: Robert PAWLOWSKI, Ankit MORE, Jason M. HOWARD, Joshua B. FRYMAN, Tina C. ZHONG, Shaden SMITH, Sowmya PITCHAIMOORTHY, Samkit JAIN, Vincent CAVE, Sriram AANANTHAKRISHNAN, Bharadwaj KRISHNAMURTHY
Abstract: Disclosed embodiments relate to an improved memory system architecture for multi-threaded processors. In one example, a system includes a multi-threaded processor core (MTPC), the MTPC comprising: P pipelines, each to concurrently process T threads; a crossbar to communicatively couple the P pipelines; a memory for use by the P pipelines; a scheduler to optimize reduction operations by assigning multiple threads to generate results of commutative arithmetic operations and then accumulate the generated results; and a memory controller (MC) to connect with external storage and other MTPCs, the MC further comprising at least one optimization selected from: an instruction set architecture including a dual-memory operation; a direct memory access (DMA) engine; a buffer to store multiple pending instruction cache requests; multiple channels across which to stripe memory requests; and a shadow-tag coherency management unit.
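The reduction scheduling described in the abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware scheduler: the function name `parallel_reduce`, its parameters, and the use of a Python thread pool are all assumptions made for illustration. It shows several threads each producing a partial result, followed by an accumulation step. Note that regrouping the inputs this way relies on the operation being associative as well as commutative.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add


def parallel_reduce(values, op=add, identity=0, num_threads=4):
    """Reduce `values` with `op` by splitting the work across threads.

    Hypothetical helper: each thread reduces one chunk to a partial
    result, then the partial results are accumulated sequentially.
    """
    if not values:
        return identity

    # Ceiling division so every element lands in exactly one chunk.
    chunk_size = -(-len(values) // num_threads)
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]

    def reduce_chunk(chunk):
        acc = identity
        for v in chunk:
            acc = op(acc, v)
        return acc

    # Stage 1: multiple threads generate partial results.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(reduce_chunk, chunks))

    # Stage 2: accumulate the generated partial results.
    total = identity
    for p in partials:
        total = op(total, p)
    return total
```

For example, `parallel_reduce(list(range(1, 101)))` sums the integers 1 through 100 using four partial sums of 25 elements each.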
-
2.
Publication number: US20190303159A1
Publication date: 2019-10-03
Application number: US15940768
Application date: 2018-03-29
Applicant: Intel Corporation
Inventor: Joshua B. FRYMAN, Jason M. HOWARD, Priyanka SURESH, Banu Meenakshi NAGASUNDARAM, Srikanth DAKSHINAMOORTHY, Ankit MORE, Robert PAWLOWSKI, Samkit JAIN, Pranav YEOLEKAR, Avinash M. SEEGEHALLI, Surhud KHARE, Dinesh SOMASEKHAR, David S. DUNNING, Romain E. CLEDAT, William Paul GRIFFIN, Bhavitavya B. BHADVIYA, Ivan B. GANEV
Abstract: Disclosed embodiments relate to an instruction set architecture to facilitate energy-efficient computing for exascale architectures. In one embodiment, a processor includes a plurality of accelerator cores, each having a corresponding instruction set architecture (ISA); a fetch circuit to fetch one or more instructions specifying one of the accelerator cores; a decode circuit to decode the one or more fetched instructions; and an issue circuit to translate the one or more decoded instructions into the ISA corresponding to the specified accelerator core, collate the one or more translated instructions into an instruction packet, and issue the instruction packet to the specified accelerator core, wherein the plurality of accelerator cores comprises a memory engine (MENG), a collective engine (CENG), a queue engine (QENG), and a chain management unit (CMU).
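The translate, collate, and issue steps attributed to the issue circuit can be modeled roughly in software. The following is a minimal sketch under stated assumptions: the `Decoded` record, the textual per-engine encoding produced by `translate`, and the dictionary packet layout are all hypothetical and are not taken from the patent. Only the engine names (MENG, CENG, QENG, CMU) come from the abstract.

```python
from dataclasses import dataclass

# Accelerator core names as listed in the abstract.
ENGINES = ("MENG", "CENG", "QENG", "CMU")


@dataclass(frozen=True)
class Decoded:
    """Hypothetical decoded instruction naming its target engine."""
    engine: str
    opcode: str
    operands: tuple = ()


def translate(instr):
    """Translate into an assumed engine-specific textual encoding."""
    ops = ", ".join(str(o) for o in instr.operands)
    return f"{instr.engine.lower()}.{instr.opcode} {ops}".strip()


def issue(decoded, engine):
    """Collate translated instructions for `engine` into one packet."""
    if engine not in ENGINES:
        raise ValueError(f"unknown accelerator core: {engine}")
    translated = [translate(d) for d in decoded if d.engine == engine]
    # Assumed packet layout: a target name plus an ordered instruction list.
    return {"target": engine, "instructions": translated}
```

In this sketch, calling `issue` with a mixed list of decoded instructions and the target `"MENG"` yields a single packet containing only the MENG instructions, in their original order.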
-