-
公开(公告)号:US11656971B2
公开(公告)日:2023-05-23
申请号:US17582051
申请日:2022-01-24
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Jayesh Gaur , Franck Sala , Lihu Rappoport , Zeev Sperber , Adi Yoaz , Sreenivas Subramoney
CPC classification number: G06F11/3476 , G06F9/24 , G06F9/3836 , G06F11/3024 , G06F11/3055 , G06F15/7875
Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.
-
公开(公告)号:US20220206925A1
公开(公告)日:2022-06-30
申请号:US17582051
申请日:2022-01-24
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Jayesh Gaur , Franck Sala , Lihu Rappoport , Zeev Sperber , Adi Yoaz , Sreenivas Subramoney
Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.
-
13.
公开(公告)号:US20220197813A1
公开(公告)日:2022-06-23
申请号:US17133624
申请日:2020-12-23
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Adarsh Chauhan , Vinodh Gopal , Vedvyas Shanbhogue , Sreenivas Subramoney , Wajdi Feghali
IPC: G06F12/0875 , G06F12/0813 , G06F12/0811 , G06F12/1045
Abstract: Methods and apparatus relating to techniques for increasing per core memory bandwidth by using forget store operations are described. In an embodiment, a cache stores a buffer. Execution circuitry executes an instruction. The instruction causes one or more cachelines in the cache to be marked based on a start address for the buffer and a size of the buffer. A marked cacheline in the cache is to be prevented from being written back to memory. Other embodiments are also disclosed and claimed.
-
公开(公告)号:US10915421B1
公开(公告)日:2021-02-09
申请号:US16575535
申请日:2019-09-19
Applicant: Intel Corporation
Inventor: Adarsh Chauhan , Jayesh Gaur , Franck Sala , Lihu Rappoport , Zeev Sperber , Adi Yoaz , Sreenivas Subramoney
Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state. In response to the usefulness state denoting the confirmed bad state, the DTU circuitry automatically disables the microarchitectural feature for the selected address for execution windows after the second execution window. Other embodiments are described and claimed.
-
15.
公开(公告)号:US10268600B2
公开(公告)日:2019-04-23
申请号:US15701795
申请日:2017-09-12
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Sreenivas Subramoney , Sanjay Ganapathy
IPC: G06F12/0862 , G06F12/126 , G06F12/128 , G06F12/0811
Abstract: In one embodiment, a processor includes: a first cache controller to control a first cache memory. This cache controller may include a replacement circuit to: associate a first priority indicator with a first cache line based on storage of demand data in the first cache line and first learning information associated with a set of demand-based categories of cache lines; and associate a second priority indicator with a second cache line based on storage of prefetch data in the second cache line and second learning information associated with a set of prefetch-based categories of cache lines. Other embodiments are described and claimed.
-
公开(公告)号:US20180232235A1
公开(公告)日:2018-08-16
申请号:US15433674
申请日:2017-02-15
Applicant: INTEL CORPORATION
Inventor: Jayesh Gaur , Pooja Roy , Sreenivas Subramoney , Hong Wang , Ronak Singhal
CPC classification number: G06F9/3838 , G06F8/41
Abstract: A processor includes a memory to hold a buffer to store data dependencies comprising nodes and edges for each of a plurality of micro-operations. The nodes include a first node for dispatch, a second node for execution, and a third node for commit. A detector circuit is to queue, in the buffer, the nodes of a micro-operation; add, to determine a node weight for each of the nodes of the micro-operation, an edge weight to a previous node weight of a connected micro-operation that yields a maximum node weight for the node, wherein the node weight comprises a number of execution cycles of an OOO pipeline of the processor and the edge weight comprises a number of execution cycles to execute the connected micro-operation; and identify, as a critical path, a path through the data dependencies that yields the maximum node weight for the micro-operation.
-
公开(公告)号:US20180181329A1
公开(公告)日:2018-06-28
申请号:US15392638
申请日:2016-12-28
Applicant: Intel Corporation
Inventor: Ishwar S. Bhati , Udit Dhawan , Jayesh Gaur , Sreenivas Subramoney
IPC: G06F3/06 , G06F12/0897
CPC classification number: G06F12/0897 , G06F9/3824 , G06F12/0811 , G06F2212/1041 , G06F2212/1056 , G06F2212/302
Abstract: Processor, apparatus, and method for reordering a stream of memory access requests to establish locality are described herein. One embodiment of a method includes: storing in a request queue memory access requests generated by a plurality of execution units, the memory access requests comprising a first request to access a first memory page in a memory and a second request to access a second memory page in the memory; maintaining a list of unique memory pages, each unique memory page associated with one or more memory access requests stored the request queue and is to be accessed by the one or more memory access requests; selecting a current memory page from the list of unique memory pages; and dispatching from the request queue to the memory, all memory access requests associated with the current memory page before any other memory access request in the request queue is dispatched.
-
公开(公告)号:US20150089126A1
公开(公告)日:2015-03-26
申请号:US14036673
申请日:2013-09-25
Applicant: Intel Corporation
Inventor: Sreenivas Subramoney , Jayesh Gaur , Alaa R. Alameldeen
IPC: G06F12/08
CPC classification number: G06F12/126 , G06F12/0895
Abstract: In an embodiment, a processor includes a cache data array including a plurality of physical ways, each physical way to store a baseline way and a victim way; a cache tag array including a plurality of tag groups, each tag group associated with a particular physical way and including a first tag associated with the baseline way stored in the particular physical way, and a second tag associated with the victim way stored in the particular physical way; and cache control logic to: select a first baseline way based on a replacement policy, select a first victim way based on an available capacity of a first physical way including the first victim way, and move a first data element from the first baseline way to the first victim way. Other embodiments are described and claimed.
Abstract translation: 在一个实施例中,处理器包括高速缓存数据阵列,其包括多个物理方式,每种物理方式来存储基线方式和受害方式; 包括多个标签组的缓存标签阵列,与特定物理方式相关联的每个标签组,并且包括与以特定物理方式存储的基线方式相关联的第一标签,以及与存储在特定物理方式中的受害方式相关联的第二标签 物理方式 以及高速缓存控制逻辑,以:基于替换策略选择第一基线方式,基于包括所述第一受害者方式的第一物理方式的可用容量选择第一受害者方式,并将第一数据元素从所述第一基线方式移动到 第一个受害者的方式。 描述和要求保护其他实施例。
-
公开(公告)号:US20240103878A1
公开(公告)日:2024-03-28
申请号:US17953184
申请日:2022-09-26
Applicant: Intel Corporation
Inventor: Jayesh Gaur , Sufiyan Syed , Adithya Ranganathan , Sreenivas Subramoney
IPC: G06F9/38
CPC classification number: G06F9/3861 , G06F9/3867 , G06F9/3806
Abstract: An example of an integrated circuit may include a first execution cluster, a second execution cluster that is one or more of narrower and shallower as compared to the first execution cluster, and circuitry to selectively steer instructions to the first execution cluster and the second execution cluster based on branch misprediction information. Other embodiments are disclosed and claimed.
-
公开(公告)号:US20240103874A1
公开(公告)日:2024-03-28
申请号:US17951859
申请日:2022-09-23
Applicant: Intel Corporation
Inventor: Niranjan Kumar Soundararajan , Sreenivas Subramoney , Jayesh Gaur
CPC classification number: G06F9/381 , G06F9/30065 , G06F9/325
Abstract: Methods and apparatus for instruction elimination through hardware driven memoization of loop instances. A hardware-based loop memoization technique learns repeating sequences of loops and transparently removes instructions for the loop instructions from instruction sequences while making their output available to dependent instructions as if the loop instructions had been executed. A path-based predictor is implemented at the front-end to predict these loop instances and remove their instructions from instruction sequences. A novel memoization prediction micro-operation (Uop) is inserted into the instruction sequence for instances of loops that are predicted to be memoized. The memoization prediction Uop is used to compare the input signature (expected set of input values for the loop) with the actual signature to determine correct and incorrect predictions. The input signature learnt is based on all live-ins of a loop, both explicit register-based live-ins as well as loads to memory in the loop body that determine code path and outputs.
-
-
-
-
-
-
-
-
-