-
21.
Publication No.: US20230315451A1
Publication Date: 2023-10-05
Application No.: US18326623
Application Date: 2023-05-31
Applicant: Intel Corporation
Inventor: Shruti Sharma, Robert Pawlowski, Fabio Checconi, Jesmin Jahan Tithi
CPC classification number: G06F9/30043, G06F9/30079, G06F13/28
Abstract: Systems, apparatuses and methods may provide for technology that detects, by an operation engine, a plurality of sub-instruction requests from a first memory engine in a plurality of memory engines, wherein the plurality of sub-instruction requests are associated with a direct memory access (DMA) bitmap manipulation request from a first pipeline, wherein each sub-instruction request corresponds to a data element in the DMA bitmap manipulation request, and wherein the first memory engine is to correspond to the first pipeline. The technology also detects, by the operation engine, one or more arguments in the plurality of sub-instruction requests, sends, by the operation engine, one or more load requests to a DRAM in a plurality of DRAMs in accordance with the one or more arguments, and sends, by the operation engine, one or more store requests to the DRAM in accordance with the one or more arguments, wherein the operation engine is to correspond to the DRAM.
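For intuition only, the following C sketch models the decomposition described above in software: a bitmap manipulation request is broken into per-word sub-instructions, and a stand-in "operation engine" performs the load, manipulate, and store steps against a simulated DRAM array. The names (sub_instr, op_engine_execute), the set/clear operations, and all sizes are illustrative assumptions, not details taken from the publication.

    /* Illustrative software model, not the patented hardware. */
    #include <stdint.h>
    #include <stdio.h>

    #define DRAM_WORDS 16

    typedef enum { BITMAP_SET, BITMAP_CLEAR } bitmap_op;

    /* One sub-instruction: operate on a single 64-bit word of the bitmap. */
    typedef struct {
        bitmap_op op;       /* argument: which manipulation to apply    */
        uint32_t  word_idx; /* argument: which DRAM word to touch       */
        uint64_t  mask;     /* argument: bit mask for this data element */
    } sub_instr;

    static uint64_t dram[DRAM_WORDS]; /* stand-in for one DRAM channel */

    /* "Operation engine": issue a load, apply the argument, issue a store. */
    static void op_engine_execute(const sub_instr *s) {
        uint64_t v = dram[s->word_idx];              /* load request  */
        v = (s->op == BITMAP_SET) ? (v | s->mask)
                                  : (v & ~s->mask);  /* manipulate    */
        dram[s->word_idx] = v;                       /* store request */
    }

    int main(void) {
        /* A "memory engine" would decompose a DMA request into these: */
        sub_instr reqs[] = {
            { BITMAP_SET,   0, 0x00000000000000FFULL },
            { BITMAP_SET,   3, 0xF000000000000000ULL },
            { BITMAP_CLEAR, 0, 0x000000000000000FULL },
        };
        for (unsigned i = 0; i < sizeof reqs / sizeof reqs[0]; i++)
            op_engine_execute(&reqs[i]);
        printf("word0=%016llx word3=%016llx\n",
               (unsigned long long)dram[0], (unsigned long long)dram[3]);
        return 0;
    }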
-
22.
Publication No.: US20220413855A1
Publication Date: 2022-12-29
Application No.: US17359305
Application Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Robert Pawlowski, Sriram Aananthakrishnan, Jason Howard, Joshua Fryman
IPC: G06F9/30, G06F9/38, G06F12/0875
Abstract: Techniques for operating on an indirect memory access instruction, where the instruction accesses a memory location via at least one indirect address. A pipeline processes the instruction and a memory operation engine generates a first access to the at least one indirect address and a second access to a target address determined by the at least one indirect address. A cache memory used with the pipeline and the memory operation engine caches pointers. In response to a cache hit when executing the indirect memory access instruction, operations dereference a pointer to obtain the at least one indirect address, do not set a cache bit, and return data for the instruction without storing the data in the cache memory; in response to a cache miss, operations set the cache bit, obtain and store a cache line for the missed pointer, and return the data without storing it in the cache memory.
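A minimal sketch of the caching policy described above, assuming a tiny direct-mapped cache: pointer values are cached, while the data reached through them is returned without being cached. The ptr_line structure, the cache size, and the indexing scheme are assumptions for illustration only.

    /* Sketch: cache pointers, never the dereferenced data. */
    #include <stdint.h>
    #include <stdio.h>

    #define PTR_CACHE_LINES 4

    typedef struct {
        int       valid;
        uintptr_t tag;   /* address of the pointer itself               */
        uint64_t *ptr;   /* cached pointer value (the indirect address) */
    } ptr_line;

    static ptr_line ptr_cache[PTR_CACHE_LINES];

    /* Indirect load of *(*pp): pointers may hit in the cache; data never does. */
    static uint64_t indirect_load(uint64_t **pp) {
        unsigned idx = (unsigned)(((uintptr_t)pp >> 3) % PTR_CACHE_LINES);
        ptr_line *l = &ptr_cache[idx];
        uint64_t *target;
        if (l->valid && l->tag == (uintptr_t)pp) {
            target = l->ptr;          /* hit: reuse the cached pointer       */
        } else {
            target = *pp;             /* miss: fetch the pointer line...     */
            l->valid = 1;             /* ...and fill the pointer cache       */
            l->tag = (uintptr_t)pp;
            l->ptr = target;
        }
        return *target;               /* data is returned, never cached here */
    }

    int main(void) {
        uint64_t value = 42;
        uint64_t *ptr = &value;
        printf("%llu\n", (unsigned long long)indirect_load(&ptr)); /* miss */
        printf("%llu\n", (unsigned long long)indirect_load(&ptr)); /* hit  */
        return 0;
    }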
-
23.
Publication No.: US20220100508A1
Publication Date: 2022-03-31
Application No.: US17134251
Application Date: 2020-12-25
Applicant: Intel Corporation
Inventor: Robert Pawlowski, Ankit More, Vincent Cave, Sriram Aananthakrishnan, Jason M. Howard, Joshua B. Fryman
Abstract: Embodiments of apparatuses and methods for copying and operating on matrix elements are described. In embodiments, an apparatus includes a hardware instruction decoder to decode a single instruction and execution circuitry, coupled to the hardware instruction decoder, to perform one or more operations corresponding to the single instruction. The single instruction has a first operand to reference a base address of a first representation of a source matrix and a second operand to reference a base address of a second representation of a destination matrix. The one or more operations include copying elements of the source matrix to corresponding locations in the destination matrix and filling empty elements of the destination matrix with a single value.
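One plausible software reading of the copy-and-fill semantics, sketched in C: a sparse source matrix (held here in CSR form) is expanded into a dense destination whose remaining, empty elements all receive a single fill value. The CSR layout and the copy_and_fill name are assumptions for illustration; they are not the patented representations.

    /* Sketch: expand a CSR source into a dense destination with a fill value. */
    #include <stdio.h>

    #define ROWS 3
    #define COLS 4

    static void copy_and_fill(const int *row_ptr, const int *col_idx,
                              const double *vals, double *dst, double fill) {
        for (int i = 0; i < ROWS * COLS; i++)
            dst[i] = fill;                            /* fill empty elements  */
        for (int r = 0; r < ROWS; r++)
            for (int k = row_ptr[r]; k < row_ptr[r + 1]; k++)
                dst[r * COLS + col_idx[k]] = vals[k]; /* copy source elements */
    }

    int main(void) {
        /* Source matrix [[5,0,0,1],[0,0,2,0],[0,3,0,0]] in CSR form. */
        int row_ptr[] = {0, 2, 3, 4};
        int col_idx[] = {0, 3, 2, 1};
        double vals[] = {5.0, 1.0, 2.0, 3.0};
        double dst[ROWS * COLS];
        copy_and_fill(row_ptr, col_idx, vals, dst, -1.0);
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++)
                printf("%5.1f ", dst[r * COLS + c]);
            printf("\n");
        }
        return 0;
    }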
-
24.
Publication No.: US20200310795A1
Publication Date: 2020-10-01
Application No.: US16369846
Application Date: 2019-03-29
Applicant: Intel Corporation
Inventor: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith
Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may broadcast either a single data value or a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and the performance of one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are beneficially enhanced.
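As a rough software analogue of the broadcast and reduction instructions, the sketch below copies one value to every destination address and sums the values found at a list of source addresses. The function names and the choice of sum as the reduction operation are hypothetical; the ISA-level encoding is not shown.

    /* Sketch: broadcast to many destinations, reduce from many sources. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Broadcast a single value to each destination address. */
    static void dma_broadcast(uint64_t value, uint64_t **dsts, size_t n) {
        for (size_t i = 0; i < n; i++)
            *dsts[i] = value;
    }

    /* Reduce (sum) the values found at each source address. */
    static uint64_t dma_reduce_sum(uint64_t **srcs, size_t n) {
        uint64_t acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += *srcs[i];
        return acc;
    }

    int main(void) {
        uint64_t a, b, c;
        uint64_t *addrs[] = {&a, &b, &c};
        dma_broadcast(7, addrs, 3);                                     /* a = b = c = 7 */
        printf("%llu\n", (unsigned long long)dma_reduce_sum(addrs, 3)); /* prints 21     */
        return 0;
    }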
-
25.
Publication No.: US20190109590A1
Publication Date: 2019-04-11
Application No.: US16201915
Application Date: 2018-11-27
Applicant: Intel Corporation
Inventor: Ankit More, Jason M. Howard, Robert Pawlowski, Fabrizio Petrini, Shaden Smith
IPC: H03K17/00, H03K19/173, G11C7/10
CPC classification number: H03K17/005, G11C7/1006, H03K17/007, H03K19/1733
Abstract: Embodiments herein may present an integrated circuit including a switch, where the switch together with other switches forms a network of switches to perform a sequence of operations according to a structure of a collective tree. The switch includes a first number of input ports, a second number of output ports, a configurable crossbar to selectively couple the first number of input ports to the second number of output ports, and a computation engine coupled to the first number of input ports, the second number of output ports, and the crossbar. The computation engine of the switch performs an operation corresponding to an operation represented by a node of the collective tree. The switch further includes one or more registers to selectively configure the first number of input ports and the configurable crossbar. Other embodiments may be described and/or claimed.
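To give a software picture of the collective-tree idea, the sketch below lets each node stand in for a switch's computation engine: it combines the values arriving from its child "input ports" and forwards the result toward the root. The binary tree shape and the sum operation are illustrative assumptions only.

    /* Sketch: reduction over a collective tree of switch-like nodes. */
    #include <stdio.h>

    typedef struct node {
        struct node *left, *right; /* "input ports" feeding this switch    */
        int leaf_value;            /* contribution when the node is a leaf */
    } node;

    /* Computation engine: reduce the subtree rooted at this switch. */
    static int reduce(const node *n) {
        if (!n->left && !n->right)
            return n->leaf_value;
        int acc = 0;
        if (n->left)  acc += reduce(n->left);
        if (n->right) acc += reduce(n->right);
        return acc;                /* value forwarded on the output port   */
    }

    int main(void) {
        node leaves[4] = { {0, 0, 1}, {0, 0, 2}, {0, 0, 3}, {0, 0, 4} };
        node mid[2] = { {&leaves[0], &leaves[1], 0}, {&leaves[2], &leaves[3], 0} };
        node root = {&mid[0], &mid[1], 0};
        printf("collective sum = %d\n", reduce(&root)); /* prints 10 */
        return 0;
    }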
-
26.
Publication No.: US20180285252A1
Publication Date: 2018-10-04
Application No.: US15477072
Application Date: 2017-04-01
Applicant: Intel Corporation
Inventor: Kon-Woo Kwon, Vivek Kozhikkottu, Sang Phill Park, Ankit More, William P. Griffin, Robert Pawlowski, Jason M. Howard, Joshua B. Fryman
IPC: G06F12/02, G06F12/0802, G06F12/0846, G11C7/10, G06F12/06
Abstract: Optimized memory access bandwidth devices, systems, and methods for processing low spatial locality data are disclosed and described. A system memory is divided into a plurality of memory subsections, where each memory subsection is communicatively coupled to an independent memory channel to a memory controller. Memory access requests from a processor are thereby sent by the memory controller to only the appropriate memory subsection.
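A minimal routing sketch under stated assumptions: fixed-size memory subsections are interleaved across a small number of channels, and the controller maps each address to exactly one channel, so a request is sent only to the corresponding subsection. The subsection size and channel count are arbitrary example values.

    /* Sketch: map an address to the single channel/subsection that owns it. */
    #include <stdint.h>
    #include <stdio.h>

    #define NUM_CHANNELS      4            /* independent memory channels */
    #define SUBSECTION_BYTES  (64u * 1024) /* assumed size per subsection */

    /* Pick the one channel responsible for this address. */
    static unsigned route_to_channel(uint64_t addr) {
        return (unsigned)((addr / SUBSECTION_BYTES) % NUM_CHANNELS);
    }

    int main(void) {
        uint64_t addrs[] = {0x0000, 0x10000, 0x20000, 0x30000, 0x40000};
        for (unsigned i = 0; i < 5; i++)
            printf("addr 0x%06llx -> channel %u\n",
                   (unsigned long long)addrs[i], route_to_channel(addrs[i]));
        return 0;
    }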