-
Publication No.: US20200004684A1
Publication date: 2020-01-02
Application No.: US16024527
Filing date: 2018-06-29
Applicant: Intel Corporation
Inventor: Wim Heirman, Ibrahim Hur, Ugonna Echeruo, Stijn Eyerman, Kristof Du Bois
IPC: G06F12/0862
Abstract: Apparatus, method, and system for enhancing data prefetching based on non-uniform memory access (NUMA) characteristics are described herein. An apparatus embodiment includes a system memory, a cache, and a prefetcher. The system memory includes multiple memory regions, at least some of which are associated with different NUMA characteristics (access latency, bandwidth, etc.) than others. Each region is associated with its own set of prefetch parameters, which are set in accordance with the region's NUMA characteristics. The prefetcher monitors data accesses to the cache and generates one or more prefetch requests to fetch data from the system memory to the cache based on the monitored data accesses and the set of prefetch parameters associated with the memory region from which data is to be fetched. The set of prefetch parameters may include prefetch distance, training-to-stable threshold, and throttle threshold.
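The per-region parameter lookup described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region boundaries, the parameter values, the 64-byte line size, and the names `PrefetchParams`, `MemoryRegion`, `params_for`, and `prefetch_target` are all hypothetical and do not come from the patent. For simplicity the sketch selects parameters by the region of the demand address.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache line size in bytes


@dataclass(frozen=True)
class PrefetchParams:
    prefetch_distance: int    # cache lines to run ahead of the demand access
    training_to_stable: int   # confirmations needed before prefetching starts
    throttle_threshold: int   # level above which prefetching is throttled


@dataclass(frozen=True)
class MemoryRegion:
    start: int                # inclusive start address
    end: int                  # exclusive end address
    params: PrefetchParams


# Hypothetical regions: a near, low-latency region and a far, high-latency one.
REGIONS = [
    MemoryRegion(0x0000_0000, 0x4000_0000, PrefetchParams(4, 2, 16)),
    MemoryRegion(0x4000_0000, 0x8000_0000, PrefetchParams(16, 3, 8)),
]


def params_for(addr: int) -> PrefetchParams:
    """Return the prefetch parameters of the region containing addr."""
    for region in REGIONS:
        if region.start <= addr < region.end:
            return region.params
    raise ValueError(f"address {addr:#x} is not in any known region")


def prefetch_target(addr: int) -> int:
    """Return the line-aligned address to prefetch for a demand access."""
    line_addr = (addr // LINE_SIZE) * LINE_SIZE
    return line_addr + params_for(addr).prefetch_distance * LINE_SIZE
```

With these illustrative values, an access in the far region is prefetched 16 lines ahead instead of 4, which reflects the idea that a higher-latency region needs a longer prefetch distance to hide the latency.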
-
Publication No.: US12050915B2
Publication date: 2024-07-30
Application No.: US17130592
Filing date: 2020-12-22
Applicant: Intel Corporation
Inventor: Wim Heirman, Stijn Eyerman, Ibrahim Hur
IPC: G06F9/30, G06F9/38, G06F11/34, G06F12/0811, G06F12/0862
CPC classification number: G06F9/3802, G06F9/30047, G06F9/3818, G06F11/3409, G06F12/0811, G06F12/0862, G06F9/383, G06F2212/452
Abstract: In an embodiment, a processor includes a fetch circuit to fetch instructions, the instructions including a code prefetch instruction; a decode circuit to decode the code prefetch instruction and provide the decoded code prefetch instruction to a memory circuit, the memory circuit to execute the decoded code prefetch instruction to prefetch a first set of code blocks into a first cache and to prefetch a second set of code blocks into a second cache. Other embodiments are described and claimed.
-
Publication No.: US20220283719A1
Publication date: 2022-09-08
Application No.: US17824413
Filing date: 2022-05-25
Applicant: Intel Corporation
Inventor: Stijn Eyerman, Wim Heirman, Ibrahim Hur
IPC: G06F3/06
Abstract: An apparatus to facilitate generating a memory bandwidth stack for visualizing memory bandwidth utilization is disclosed. The apparatus includes one or more processors to receive data corresponding to a memory cycle occurring during a total execution time of an application executed by the one or more processors; for the memory cycle, assign the memory cycle to a component of a bandwidth stack based on analysis of the data and in accordance with a prioritization scheme; for the component, determine a portion of the bandwidth stack to account to the component based at least in part on the assignment of the memory cycle to the component; and generate the bandwidth stack by at least representing the portion accounted to the component in the bandwidth stack.
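A minimal Python sketch of the accounting described in this abstract follows. The component names, their priority order, and the functions `assign_component` and `bandwidth_stack` are assumptions made for illustration; the abstract does not name specific components or a specific prioritization scheme.

```python
from collections import Counter

# Hypothetical components, listed from highest to lowest priority.
PRIORITY = ["read", "write", "refresh", "bank_conflict", "idle"]


def assign_component(cycle_events):
    """Assign one memory cycle to the highest-priority component observed."""
    for component in PRIORITY:
        if component in cycle_events:
            return component
    return "idle"  # nothing observed in this cycle


def bandwidth_stack(cycles):
    """Return each component's share of all memory cycles.

    `cycles` is a list with one set of observed events per memory cycle.
    """
    if not cycles:
        return {}
    counts = Counter(assign_component(events) for events in cycles)
    total = len(cycles)
    return {c: counts[c] / total for c in PRIORITY if counts[c]}
```

Because every cycle is assigned to exactly one component, the shares sum to 1 and can be drawn as a single stacked bar.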
-
Publication No.: US20220100511A1
Publication date: 2022-03-31
Application No.: US17033770
Filing date: 2020-09-26
Applicant: Intel Corporation
Inventor: Wim Heirman, Stijn Eyerman, Ibrahim Hur
IPC: G06F9/30, G06F12/0804, G06F12/12
Abstract: Methods and apparatus relating to one or more delayed cache writeback instructions for improved data sharing in manycore processors are described. In an embodiment, a delayed cache writeback instruction causes a cache block in a modified state in a Level 1 (L1) cache of a first core of a plurality of cores of a multi-core processor to transition to a Modified write back (M.wb) state. The M.wb state causes the cache block to be written back to the last-level cache (LLC) upon eviction of the cache block from the L1 cache. Other embodiments are also disclosed and claimed.
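The state change described in this abstract can be modeled as a small state machine. The sketch below is a toy model under stated assumptions: the class `L1Cache`, its methods, and the `llc_writebacks` list are hypothetical names, and what happens to an ordinary modified block on eviction is left out because the abstract does not specify it.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    MODIFIED = "M"
    MODIFIED_WB = "M.wb"


class L1Cache:
    """Toy model of one core's L1 cache, tracking only block states."""

    def __init__(self):
        self.blocks = {}           # block address -> State
        self.llc_writebacks = []   # addresses written back to the LLC

    def write(self, addr):
        """A store leaves the block in the modified state."""
        self.blocks[addr] = State.MODIFIED

    def delayed_writeback(self, addr):
        """Model the instruction: M -> M.wb, other states are unaffected."""
        if self.blocks.get(addr) is State.MODIFIED:
            self.blocks[addr] = State.MODIFIED_WB

    def evict(self, addr):
        """Evict a block; an M.wb block is written back to the LLC."""
        state = self.blocks.pop(addr, State.INVALID)
        if state is State.MODIFIED_WB:
            self.llc_writebacks.append(addr)
        return state
```

The instruction itself moves no data in this model; it only marks the block so that the later eviction sends it to the LLC.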
-
Publication No.: US20190303294A1
Publication date: 2019-10-03
Application No.: US15940712
Filing date: 2018-03-29
Applicant: Intel Corporation
Inventor: Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, Ibrahim Hur, Erik Hallnor
IPC: G06F12/0811, G06F12/0815, G06F12/0871, G06F12/128
Abstract: Embodiments of this disclosure provide a mechanism to store cache lines in the dedicated cache of an idle core. In one embodiment, a multi-core processor comprising a first core, a second core, a first cache, a second cache, a third cache, and a cache controller unit is provided. The cache controller is operatively coupled to at least the first cache, the second cache, and the third cache. The cache controller is to evict a first line from the first cache, wherein the first core is in an active state. Responsive to the evicting of the first line, the first line is stored in the third cache. Responsive to storing the first line, a second line is evicted from the third cache. Responsive to evicting the second line, the second line is stored in the second cache when the second core is in an idle state.
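The eviction chain in this abstract can be traced with a toy model. Everything below is an assumption made for illustration: the `Cache` class, its oldest-first replacement policy, and the function `handle_first_cache_eviction` are hypothetical, and the abstract does not say how the third cache chooses its victim.

```python
from collections import OrderedDict


class Cache:
    """Toy cache holding line tags, evicting the oldest line when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first

    def insert(self, line):
        """Store a line and return the evicted victim, or None."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[line] = True
        return victim


def handle_first_cache_eviction(line, third_cache, second_cache, second_core_idle):
    """Store a line evicted from the first cache in the third cache.

    If that displaces a line from the third cache and the second core is
    idle, the displaced line is kept in the second core's cache.
    """
    victim = third_cache.insert(line)
    if victim is not None and second_core_idle:
        second_cache.insert(victim)
    return victim
```

When the second core is active, the displaced line is simply dropped in this model; when it is idle, the line stays on chip in the idle core's cache.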
-
Publication No.: US10303609B2
Publication date: 2019-05-28
Application No.: US15718845
Filing date: 2017-09-28
Applicant: Intel Corporation
Inventor: Wim Heirman, Kristof Du Bois, Yves Vandriessche, Stijn Eyerman, Ibrahim Hur
IPC: G06F12/0862
Abstract: Embodiments of apparatuses, methods, and systems for independent tuning of multiple hardware prefetchers are described. In an embodiment, an apparatus includes a processor core, a cache memory, a hardware prefetcher, and a prefetch tuner. The hardware prefetcher is to prefetch data for the processor core from a system memory to the cache memory. The prefetch tuner is to adjust a prefetch rate of the hardware prefetcher based on a fraction of late prefetches. The prefetch tuner includes a late prefetch counter to count a number of late prefetches for the hardware prefetcher, a prefetch counter to count a number of prefetches for the hardware prefetcher, and a late prefetch calculator to calculate the fraction of late prefetches based on the number of late prefetches and the number of prefetches.
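The two counters and the late-prefetch fraction described in this abstract map onto a very small class. In the sketch below, the thresholds, the rate limits, the doubling and halving steps, and the direction of adjustment are all assumptions: the abstract only states that the rate is adjusted based on the fraction of late prefetches. The names `PrefetchTuner`, `record_prefetch`, and `adjust` are hypothetical.

```python
class PrefetchTuner:
    """Toy tuner for a single hardware prefetcher."""

    def __init__(self, rate=4, min_rate=1, max_rate=16, high=0.5, low=0.1):
        self.rate = rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.high = high            # assumed: above this, prefetch more
        self.low = low              # assumed: below this, prefetch less
        self.prefetches = 0         # prefetch counter
        self.late_prefetches = 0    # late prefetch counter

    def record_prefetch(self, late):
        """Count one prefetch and whether it arrived after the demand access."""
        self.prefetches += 1
        if late:
            self.late_prefetches += 1

    def late_fraction(self):
        """Fraction of late prefetches in the current interval."""
        if self.prefetches == 0:
            return 0.0
        return self.late_prefetches / self.prefetches

    def adjust(self):
        """Update the rate from the late fraction, then reset the counters."""
        if self.prefetches > 0:
            fraction = self.late_fraction()
            if fraction > self.high:
                self.rate = min(self.max_rate, self.rate * 2)
            elif fraction < self.low:
                self.rate = max(self.min_rate, self.rate // 2)
        self.prefetches = 0
        self.late_prefetches = 0
        return self.rate
```

Giving each hardware prefetcher its own `PrefetchTuner` instance would correspond to the independent tuning of multiple prefetchers mentioned in the abstract.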
-
Publication No.: US20190004920A1
Publication date: 2019-01-03
Application No.: US15638727
Filing date: 2017-06-30
Applicant: Intel Corporation
Inventor: Yves Vandriessche, Wim Heirman, Ibrahim Hur, Kristof du Bois, Stijn Eyerman
Abstract: Technologies for processor architecture simulation with machine learning include a computing device that simulates performance of a processor executing training programs with a simulation model. The computing device captures ground truth performance statistics of the processor executing the training programs, for example using a cycle-accurate simulator. The computing device collects training simulation statistics from the simulation model and trains an error model with the training simulation statistics as a feature vector and with the ground truth performance statistics. The computing device may simulate performance of the processor executing a test program, capture test simulation statistics from the simulation model, and predict an error of the simulation model using the error model with the test simulation statistics as a feature vector. The computing device may adjust output of the simulation model or adapt execution of the simulation model based on the predicted error. Other embodiments are described and claimed.
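The train-then-correct flow in this abstract can be illustrated with the simplest possible error model. The sketch below assumes a single simulation statistic as the only feature and an ordinary least-squares line as the error model; the abstract does not specify the model type, and the function names and all numbers are made up for illustration.

```python
def fit_error_model(features, errors):
    """Fit error = slope * feature + intercept by ordinary least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(errors) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    if sxx == 0:
        raise ValueError("features must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, errors))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_error(model, feature):
    """Predict the simulation model's error for one test statistic."""
    slope, intercept = model
    return slope * feature + intercept


# Training: statistics from the fast simulation model versus ground truth.
simulated = [1.0, 2.0, 3.0]       # illustrative simulated values
ground_truth = [1.1, 2.2, 3.3]    # illustrative cycle-accurate values
errors = [g - s for g, s in zip(ground_truth, simulated)]
model = fit_error_model(simulated, errors)

# Test: adjust the fast model's output by the predicted error.
test_value = 4.0
corrected = test_value + predict_error(model, test_value)
```

A real error model would take a vector of many simulation statistics; the one-feature line only shows where the training statistics, the ground truth, and the correction step fit together.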