-
Publication No.: US20220229677A1
Publication Date: 2022-07-21
Application No.: US17712094
Filing Date: 2022-04-02
Applicant: Intel Corporation
Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur
Abstract: A distributed simulation system is provided that includes a timing simulator and functional simulator(s) on different computing nodes to simulate a graph processing system. The functional simulators are to simulate execution of a set of instructions on the graph processing system and to send information associated with the simulated set of instructions to the timing simulator over a network. The timing simulator is to determine timing information associated with execution of the sets of instructions sent by the functional simulators and send the timing information to the functional simulators over the network. The timing simulator may determine a global synchronization point for the functional simulators and send the timing information for the sets of instructions to the respective functional simulators at the global synchronization point. Each functional simulator may stall simulation of further instructions until the timing information for its set of instructions is received from the timing simulator.
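As an illustration only, the following Python sketch models the protocol described in the abstract inside a single process: each functional simulator submits a batch of instructions and stalls, and the timing simulator returns timing for every submitted batch at a global synchronization point. The network is replaced by direct method calls, and the class names, the fixed per-opcode latency table, and the barrier-style synchronization rule are hypothetical assumptions rather than details from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical per-opcode latencies (cycles) used by the toy timing model.
LATENCY = {"load": 4, "add": 1, "mul": 3}


@dataclass
class FunctionalSimulator:
    sim_id: int
    program: list            # opcodes this simulator executes functionally
    batch_size: int = 2
    pc: int = 0
    stalled: bool = False
    cycles: int = 0          # accumulated timing received from the timing simulator

    def next_batch(self):
        """Functionally simulate the next batch, then stall until timing arrives."""
        if self.stalled or self.pc >= len(self.program):
            return None
        batch = self.program[self.pc:self.pc + self.batch_size]
        self.pc += len(batch)
        self.stalled = True
        return batch

    def receive_timing(self, cycles):
        self.cycles += cycles
        self.stalled = False


@dataclass
class TimingSimulator:
    pending: dict = field(default_factory=dict)   # sim_id -> computed cycles

    def submit(self, sim_id, batch):
        self.pending[sim_id] = sum(LATENCY[op] for op in batch)

    def sync(self, sims):
        """Global synchronization point: release timing to every simulator that submitted."""
        for sim in sims:
            if sim.sim_id in self.pending:
                sim.receive_timing(self.pending.pop(sim.sim_id))


def run(sims):
    timing = TimingSimulator()
    while True:
        submitted = False
        for sim in sims:             # in the described system these run on separate nodes
            batch = sim.next_batch()
            if batch:
                timing.submit(sim.sim_id, batch)
                submitted = True
        if not submitted:
            break
        timing.sync(sims)
    return {s.sim_id: s.cycles for s in sims}


print(run([FunctionalSimulator(0, ["load", "add", "mul"]),
           FunctionalSimulator(1, ["add", "add"])]))  # {0: 8, 1: 2}
```

The stall flag is what keeps a functional simulator from running ahead of the timing model; a real implementation would replace the direct calls with network messages.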
-
Publication No.: US10394678B2
Publication Date: 2019-08-27
Application No.: US15394432
Filing Date: 2016-12-29
Applicant: Intel Corporation
Inventors: Wim Heirman, Yves Vandriessche
IPC Classification: G06F9/30, G06F9/38, G06F9/52, G06F11/30, G06F1/3228, G06F12/084, G06F12/0804, G06F12/0808, G06F12/0831, G06F12/0842
Abstract: A processor core includes a decode circuit to decode an instruction. The processor core further includes a monitor circuit, where the monitor circuit includes a data structure to store a plurality of entries for addresses that are being monitored by the monitor circuit and a triggered queue to store a plurality of addresses for which a triggering event occurred. The processor core further includes an execution circuit to execute the decoded instruction to dequeue an address from the triggered queue and return the dequeued address in response to a determination that the triggered queue is not empty.
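A minimal Python sketch of the monitor circuit described above, for illustration only. It assumes the triggering event is a store to a monitored address, that a triggered address is disarmed (one-shot), and that the dequeue operation returns `None` when the triggered queue is empty; none of these choices, nor the class and method names, come from the patent.

```python
from collections import deque


class MonitorCircuit:
    """Toy model: a table of monitored addresses plus a FIFO triggered queue."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.monitored = set()      # stands in for the monitor data structure
        self.triggered = deque()    # addresses whose triggering event occurred

    def arm(self, addr):
        if len(self.monitored) >= self.capacity:
            raise RuntimeError("monitor table full")
        self.monitored.add(addr)

    def on_write(self, addr):
        # Assumed triggering event: a store to a monitored address (one-shot).
        if addr in self.monitored:
            self.monitored.discard(addr)
            self.triggered.append(addr)

    def dequeue_triggered(self):
        """Models the decoded instruction: oldest triggered address, or None if empty."""
        return self.triggered.popleft() if self.triggered else None


mon = MonitorCircuit()
for a in (0x1000, 0x2000, 0x3000):
    mon.arm(a)
mon.on_write(0x2000)
mon.on_write(0x4000)   # not monitored, so ignored
mon.on_write(0x1000)
print([hex(mon.dequeue_triggered()) for _ in range(2)], mon.dequeue_triggered())
# ['0x2000', '0x1000'] None
```

A single dequeue instruction lets software find which of many watched addresses fired without polling each one.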
-
Publication No.: US11256626B2
Publication Date: 2022-02-22
Application No.: US16837833
Filing Date: 2020-04-01
Applicant: Intel Corporation
Inventors: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
IPC Classification: G06F12/08, G06F12/0862
Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which have different NUMA characteristics (access latency, bandwidth, etc.) from others. Each region is associated with its own set of prefetch parameters that are set in accordance with its NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetch parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
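The following Python sketch illustrates per-region prefetch parameters, as a hypothetical example rather than the patented design. The two-region address map, the parameter values, the 64-byte line size, and the simple next-line stream detector are all assumptions; "training-to-stable threshold" is interpreted here as the number of consecutive sequential accesses required before prefetching starts.

```python
from dataclasses import dataclass

LINE = 64  # cache-line size in bytes (assumed)


@dataclass(frozen=True)
class PrefetchParams:
    distance: int            # lines ahead to prefetch
    train_threshold: int     # sequential hits needed before the stream is "stable"
    throttle_threshold: int  # max outstanding prefetches before throttling


# Hypothetical regions (start, end, params): the slower, far region gets a
# larger prefetch distance to cover its longer access latency.
REGIONS = [
    (0x0000_0000, 0x4000_0000, PrefetchParams(4, 2, 16)),
    (0x4000_0000, 0x8000_0000, PrefetchParams(12, 3, 8)),
]


def params_for(addr):
    for start, end, params in REGIONS:
        if start <= addr < end:
            return params
    raise ValueError(f"address {addr:#x} is not in any region")


class StreamPrefetcher:
    def __init__(self):
        self.last_line = None
        self.confidence = 0
        self.outstanding = 0

    def on_access(self, addr):
        """Observe a cache access and return the addresses to prefetch."""
        line = addr // LINE
        p = params_for(addr)
        if self.last_line is not None and line == self.last_line + 1:
            self.confidence += 1
        else:
            self.confidence = 0
        self.last_line = line
        if self.confidence < p.train_threshold or self.outstanding >= p.throttle_threshold:
            return []
        self.outstanding += 1
        return [(line + p.distance) * LINE]

    def on_fill(self):
        """A prefetch completed; it no longer counts toward throttling."""
        self.outstanding = max(0, self.outstanding - 1)


near = StreamPrefetcher()
print([near.on_access(a) for a in range(0, 4 * LINE, LINE)])
# [[], [], [384], [448]]
```

The same detector issues prefetches farther ahead, and only after more training, when the stream lies in the far region.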
-
Publication No.: US20200174929A1
Publication Date: 2020-06-04
Application No.: US16203891
Filing Date: 2018-11-29
Applicant: Intel Corporation
Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
IPC Classification: G06F12/0804
Abstract: In one embodiment, an apparatus includes a memory access circuit to receive memory access instructions and provide at least some of the memory access instructions to a memory subsystem for execution. The memory access circuit may have a conversion circuit to convert a first memory access instruction to a first subline memory access instruction, e.g., based at least in part on an access history for the first memory access instruction. Other embodiments are described and claimed.
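As a hedged illustration of history-based subline conversion, the Python sketch below keeps a per-instruction (per-PC) history of whether past accesses stayed within one subline, and converts an access to a subline access once that history is long enough and uniformly narrow. The 64-byte line, 16-byte subline, the `MIN_SAMPLES` threshold, and the history format are hypothetical assumptions, not the patent's mechanism.

```python
from collections import defaultdict
from dataclasses import dataclass

LINE = 64        # bytes per cache line (assumed)
SUBLINE = 16     # bytes per subline (assumed)
MIN_SAMPLES = 4  # history length required before converting (assumed)


@dataclass
class MemAccess:
    pc: int      # address of the memory access instruction
    addr: int
    size: int


@dataclass
class History:
    narrow: int = 0  # past accesses confined to a single subline
    total: int = 0


class ConversionCircuit:
    def __init__(self):
        self.history = defaultdict(History)  # per-instruction access history

    @staticmethod
    def fits_subline(a):
        return a.addr // SUBLINE == (a.addr + a.size - 1) // SUBLINE

    def issue(self, a):
        """Return (kind, base_address, bytes) for the request sent to memory."""
        h = self.history[a.pc]
        convert = h.total >= MIN_SAMPLES and h.narrow == h.total
        h.total += 1
        h.narrow += self.fits_subline(a)
        if convert and self.fits_subline(a):
            return ("subline", a.addr - a.addr % SUBLINE, SUBLINE)
        return ("line", a.addr - a.addr % LINE, LINE)


conv = ConversionCircuit()
print([conv.issue(MemAccess(0x400, i * 256 + 8, 8))[0] for i in range(6)])
# ['line', 'line', 'line', 'line', 'subline', 'subline']
```

An instruction that has only ever used 8 bytes out of each strided line is converted after a short training period, which saves memory bandwidth; an access that spans sublines always falls back to a full-line request.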
-
Publication No.: US10684858B2
Publication Date: 2020-06-16
Application No.: US15996184
Filing Date: 2018-06-01
Applicant: Intel Corporation
Inventors: Stijn Eyerman, Wim Heirman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
Abstract: Disclosed embodiments relate to an indirect memory fetch (IMF) unit. In one example, an apparatus includes circuitry to fetch and decode an instruction specifying a sparse operand array including N operands, and an index array including N contiguously-addressed indices. The apparatus further includes a processing engine associated with an IMF unit to respond to the decoded instruction by initializing the IMF unit to fetch the N operands in order, probing the IMF unit to determine that a fetched operand is ready to retrieve, retrieving the fetched operand from the IMF unit, and repeating the probing and retrieving until all N operands have been retrieved. The IMF unit, independent of the processing engine, is to fetch the N contiguously-addressed indices from the index array, use the N fetched indices to calculate memory addresses for the N operands, and issue a plurality of read requests to fetch the N operands in order.
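The initialize/probe/retrieve protocol above can be sketched in Python as follows. This is a toy model under stated assumptions: memory is a word-addressed Python list, operand addresses are `operand_base + index`, and the IMF unit's independent progress is simulated by a hypothetical `tick()` call that performs a fixed number of index-then-operand reads per step.

```python
from collections import deque


class IMFUnit:
    """Toy indirect memory fetch unit operating on word-addressed memory."""

    def __init__(self, memory, reads_per_tick=1):
        self.memory = memory
        self.reads_per_tick = reads_per_tick
        self.index_addrs = deque()
        self.operand_base = 0
        self.ready = deque()        # fetched operands, kept in order

    def initialize(self, index_base, operand_base, n):
        self.index_addrs = deque(range(index_base, index_base + n))  # contiguous indices
        self.operand_base = operand_base
        self.ready.clear()

    def tick(self):
        """One step of the unit's own pipeline, independent of the engine."""
        for _ in range(self.reads_per_tick):
            if not self.index_addrs:
                return
            idx = self.memory[self.index_addrs.popleft()]    # fetch the index
            addr = self.operand_base + idx                   # compute operand address
            self.ready.append(self.memory[addr])             # read the operand

    def probe(self):
        return bool(self.ready)

    def retrieve(self):
        return self.ready.popleft()


def gather(memory, index_base, operand_base, n):
    """Processing-engine side: initialize, then probe and retrieve until done."""
    imf = IMFUnit(memory)
    imf.initialize(index_base, operand_base, n)
    out = []
    while len(out) < n:
        imf.tick()
        while imf.probe():
            out.append(imf.retrieve())
    return out


memory = [10, 20, 30, 40, 50] + [0] * 95 + [4, 0, 2]   # operands at 0, indices at 100
print(gather(memory, 100, 0, 3))  # [50, 10, 30]
```

In hardware the unit would keep many reads in flight; the single-threaded `tick()` only shows the division of labor between the engine and the IMF unit.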
-
Publication No.: US20190095333A1
Publication Date: 2019-03-28
Application No.: US15718845
Filing Date: 2017-09-28
Applicant: Intel Corporation
Inventors: Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, Ibrahim Hur
IPC Classification: G06F12/0862, G06F5/14
Abstract: Embodiments of apparatuses, methods, and systems for independent tuning of multiple hardware prefetchers are described. In an embodiment, an apparatus includes a processor core, a cache memory, a hardware prefetcher, and a prefetch tuner. The hardware prefetcher is to prefetch data for the processor core from a system memory to the cache memory. The prefetch tuner is to adjust a prefetch rate of the hardware prefetcher based on a fraction of late prefetches. The prefetch tuner includes a late prefetch counter to count a number of late prefetches for the hardware prefetcher, a prefetch counter to count a number of prefetches for the hardware prefetcher, and a late prefetch calculator to calculate the fraction of late prefetches based on the number of late prefetches and the number of prefetches.
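A minimal Python sketch of a late-prefetch-fraction tuner, with one tuner instance per prefetcher so that each is tuned independently. The measurement window, the thresholds, the meaning of "rate" as a small integer aggressiveness level, and the direction of each adjustment are hypothetical assumptions, not values from the patent.

```python
class PrefetchTuner:
    """Toy tuner: late/total counters plus a rate adjusted once per window."""

    def __init__(self, rate=4, min_rate=1, max_rate=16, high=0.25, low=0.05, window=64):
        self.rate = rate                    # assumed: prefetch aggressiveness level
        self.min_rate, self.max_rate = min_rate, max_rate
        self.high, self.low = high, low     # hypothetical thresholds
        self.window = window
        self.prefetches = 0                 # prefetch counter
        self.late = 0                       # late prefetch counter

    def record(self, was_late):
        self.prefetches += 1
        self.late += was_late
        if self.prefetches >= self.window:
            self._adjust()

    def late_fraction(self):
        """Late prefetch calculator."""
        return self.late / self.prefetches if self.prefetches else 0.0

    def _adjust(self):
        frac = self.late_fraction()
        if frac > self.high:        # data often arrives too late: prefetch more aggressively
            self.rate = min(self.rate * 2, self.max_rate)
        elif frac < self.low:       # rarely late: back off to reduce cache pollution
            self.rate = max(self.rate - 1, self.min_rate)
        self.prefetches = self.late = 0   # start a new measurement window


stream, stride = PrefetchTuner(), PrefetchTuner()   # one tuner per prefetcher
for i in range(64):
    stream.record(i % 2 == 0)   # half of this prefetcher's prefetches are late
    stride.record(False)        # this prefetcher is never late
print(stream.rate, stride.rate)  # 8 3
```

Keeping separate counters per prefetcher is what allows the two rates to move in opposite directions.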
-
Publication No.: US20180239705A1
Publication Date: 2018-08-23
Application No.: US15439551
Filing Date: 2017-02-22
Applicant: Intel Corporation
Inventors: Wim Heirman, Yves Vandriessche, Ibrahim Hur
IPC Classification: G06F12/0862, G06F9/30, G06F12/0875, G06F9/38
CPC Classification: G06F12/0862, G06F9/30047, G06F9/30181, G06F9/383, G06F12/0848, G06F12/0875, G06F2212/1021, G06F2212/1024, G06F2212/452, G06F2212/502, G06F2212/6028
Abstract: An example processor includes a register, a cache, a processor core, and a programmable logic circuit. The register may store a first prefetch value indicating a first amount of time to prefetch data from a memory prior to an execution of a subsequent instruction that uses the data. The processor core may be coupled to the cache and the register. The processor core may execute a prefetch instruction to access the data from the memory, store a copy of the data in the cache, and execute the subsequent instruction. The programmable logic circuit may be coupled to the processor core. The programmable logic circuit may determine whether the first amount of time is insufficient to prefetch the data for the execution of the subsequent instruction and change the first prefetch value to a second prefetch value when the first amount of time is insufficient.
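The Python sketch below illustrates the feedback loop in the abstract, under stated assumptions: the prefetch value is expressed in cycles, "insufficient" means the observed memory latency exceeds the current lead time, and the logic grows the value by a fixed, capped step. The register class, `tune`, and `lookahead_iterations` (how software might turn the value into a loop prefetch distance) are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class PrefetchDistanceRegister:
    cycles_ahead: int = 100   # first prefetch value (assumed unit: cycles)


def tune(reg, observed_latency, step=50, max_cycles=1000):
    """Model of the programmable logic: if the current lead time did not cover
    the memory latency, replace it with a larger (second) prefetch value."""
    if reg.cycles_ahead < observed_latency:
        reg.cycles_ahead = min(reg.cycles_ahead + step, max_cycles)
    return reg.cycles_ahead


def lookahead_iterations(reg, cycles_per_iteration):
    """How many loop iterations ahead software should prefetch (ceiling division)."""
    return -(-reg.cycles_ahead // cycles_per_iteration)


reg = PrefetchDistanceRegister()
print([tune(reg, 180) for _ in range(4)])   # [150, 200, 200, 200]
print(lookahead_iterations(reg, 30))        # 7
```

Once the lead time covers the observed latency the value stops changing, so the prefetch instruction's distance converges instead of growing without bound.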
-
Publication No.: US20230418612A1
Publication Date: 2023-12-28
Application No.: US17848284
Filing Date: 2022-06-23
Applicant: Intel Corporation
Inventors: Kristof Du Bois, Wim Heirman, Stijn Eyerman, Ibrahim Hur, Jason Agron
IPC Classification: G06F9/30
CPC Classification: G06F9/30181, G06F9/30036
Abstract: Techniques for automatic fusion of arithmetic in-flight instructions are described. An example apparatus comprises a buffer to store instructions to be issued to a functional unit for execution, and circuitry coupled to the buffer to combine two or more instructions from the buffer into a single combined instruction. Other examples are disclosed and claimed.
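For illustration, here is a Python sketch of one possible fusion policy. It assumes that the "combined instruction" is a multi-lane (SIMD-like) operation and that only adjacent, mutually independent instructions with the same arithmetic opcode are fused. The `Instr` representation, the `FUSIBLE` set, and the pairwise-adjacent rule are hypothetical, and the patent may combine instructions differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dsts: tuple   # destination register per lane
    srcs: tuple   # (src1, src2) per lane


FUSIBLE = {"add", "sub", "mul"}
OPS = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y, "mul": lambda x, y: x * y}


def independent(a, b):
    """No RAW, WAR, or WAW hazard between the two instructions (conservative)."""
    reads_a = {r for lane in a.srcs for r in lane}
    reads_b = {r for lane in b.srcs for r in lane}
    wa, wb = set(a.dsts), set(b.dsts)
    return not (wa & reads_b or wb & reads_a or wa & wb)


def fuse_adjacent(buffer):
    """Combine adjacent same-opcode, independent arithmetic instructions pairwise."""
    out, i = [], 0
    while i < len(buffer):
        a = buffer[i]
        b = buffer[i + 1] if i + 1 < len(buffer) else None
        if b is not None and a.op == b.op and a.op in FUSIBLE and independent(a, b):
            out.append(Instr(a.op, a.dsts + b.dsts, a.srcs + b.srcs))
            i += 2
        else:
            out.append(a)
            i += 1
    return out


def execute(instrs, regs):
    """Reference executor: every lane reads its sources before any lane writes."""
    regs = dict(regs)
    for ins in instrs:
        results = [OPS[ins.op](regs[s1], regs[s2]) for s1, s2 in ins.srcs]
        regs.update(zip(ins.dsts, results))
    return regs


buf = [Instr("add", ("r1",), (("r2", "r3"),)),
       Instr("add", ("r4",), (("r5", "r6"),)),
       Instr("mul", ("r7",), (("r1", "r4"),))]
fused = fuse_adjacent(buf)
regs = {"r2": 1, "r3": 2, "r5": 3, "r6": 4}
print(len(fused), execute(fused, regs) == execute(buf, regs))  # 2 True
```

The reference executor is included only to check that fusion preserves results; the dependency test is what prevents fusing an instruction with one that consumes its output.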
-
Publication No.: US20220197821A1
Publication Date: 2022-06-23
Application No.: US17133414
Filing Date: 2020-12-23
Applicant: Intel Corporation
Inventors: Wim Heirman, Ibrahim Hur
IPC Classification: G06F12/1027, G06F12/0862, G06F12/0891, G06F9/30
Abstract: Techniques and mechanisms are described for providing information used to determine whether a software prefetch instruction is to be executed. In an embodiment, one or more entries of a translation lookaside buffer (TLB) each include a respective value which indicates whether, according to one or more criteria, corresponding data has been sufficiently utilized. Insufficiently utilized data is indicated in a TLB entry with an identifier of an executed instruction to prefetch the corresponding data. An eviction of the TLB entry results in the creation of an entry in a registry of prefetch instructions. The entry in the registry includes the identifier of the executed prefetch instruction, and a value indicating a number of times that one or more future prefetch instructions are to be dropped. In another embodiment, execution of a subsequent prefetch instruction, which also corresponds to the identifier, is prevented based on the registry entry.
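A hedged Python sketch of the TLB-based prefetch filter described above. It assumes an LRU TLB with a small number of entries, demand loads as the utilization measure (`USE_THRESHOLD` uses per page counts as "sufficient"), and a fixed `DROP_COUNT` of future prefetches dropped per registry entry. All of these parameters and names are hypothetical illustrations, not the patent's criteria.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

PAGE = 4096
USE_THRESHOLD = 2  # demand uses needed to count as "sufficiently utilized" (assumed)
DROP_COUNT = 3     # future prefetches to drop per registry entry (assumed)


@dataclass
class TLBEntry:
    uses: int = 0
    prefetch_id: Optional[int] = None   # identifier of the prefetch that brought the page in


class PrefetchFilter:
    def __init__(self, tlb_entries=4):
        self.tlb = OrderedDict()   # page number -> TLBEntry, in LRU order
        self.capacity = tlb_entries
        self.registry = {}         # prefetch id -> remaining prefetches to drop

    def _lookup(self, page):
        if page in self.tlb:
            self.tlb.move_to_end(page)
        else:
            if len(self.tlb) >= self.capacity:
                _, victim = self.tlb.popitem(last=False)   # evict the LRU entry
                self._on_evict(victim)
            self.tlb[page] = TLBEntry()
        return self.tlb[page]

    def _on_evict(self, entry):
        # Insufficiently used prefetched data: create a registry entry.
        if entry.prefetch_id is not None and entry.uses < USE_THRESHOLD:
            self.registry[entry.prefetch_id] = DROP_COUNT

    def prefetch(self, pid, addr):
        """Return True if the prefetch executes, False if it is dropped."""
        remaining = self.registry.get(pid, 0)
        if remaining:
            self.registry[pid] = remaining - 1
            return False
        self._lookup(addr // PAGE).prefetch_id = pid
        return True

    def load(self, addr):
        self._lookup(addr // PAGE).uses += 1


f = PrefetchFilter(tlb_entries=2)
f.prefetch(7, 0)
f.prefetch(7, PAGE)
f.prefetch(7, 2 * PAGE)   # evicts the unused page 0, creating a registry entry for id 7
print([f.prefetch(7, (3 + i) * PAGE) for i in range(4)])  # [False, False, False, True]
```

Dropping a bounded number of prefetches, instead of disabling the instruction permanently, lets a prefetch that becomes useful again resume after the count runs out.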
-
Publication No.: US10942851B2
Publication Date: 2021-03-09
Application No.: US16203891
Filing Date: 2018-11-29
Applicant: Intel Corporation
Inventors: Wim Heirman, Stijn Eyerman, Kristof Du Bois, Ibrahim Hur, Joshua B. Fryman
IPC Classification: G06F12/08, G06F12/0804
Abstract: In one embodiment, an apparatus includes a memory access circuit to receive memory access instructions and provide at least some of the memory access instructions to a memory subsystem for execution. The memory access circuit may have a conversion circuit to convert a first memory access instruction to a first subline memory access instruction, e.g., based at least in part on an access history for the first memory access instruction. Other embodiments are described and claimed.