-
Publication No.: US20210406024A1
Publication Date: 2021-12-30
Application No.: US16913520
Application Date: 2020-06-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Ashok Tirupathy Venkatachar, Steven R. Havlir, Robert B. Cohen
IPC: G06F9/38, G06F9/30, G06F12/1027
Abstract: Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) caches address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternative level TLB caches address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.
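As a rough illustration of the scheme this abstract describes, the sketch below primes a small level 0 TLB with translations for the primary predicted path and routes alternate and lookahead path translations to a larger next-level TLB. The class names, LRU policy, capacities, page size, and the `translate` stub are all assumptions for the sketch, not details from the patent.

```python
# Sketch of priming TLBs for predicted fetch paths (illustrative only).
from collections import OrderedDict

PAGE_SHIFT = 12  # assume 4 KiB pages


class Tlb:
    """A tiny LRU-managed TLB model."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page -> physical page

    def cache(self, vpage, ppage):
        self.entries[vpage] = ppage
        self.entries.move_to_end(vpage)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def hit(self, vpage):
        return vpage in self.entries


def translate(vpage):
    # Stand-in for a page-table walk.
    return vpage + 0x1000


def prime_fetch_tlbs(primary_addrs, alternate_addrs, lookahead_addrs,
                     l0_tlb, l1_tlb):
    """Request that the L0 TLB cache translations for the primary
    predicted path, and that the next-level TLB cache translations
    for alternate and lookahead control flow paths."""
    for addr in primary_addrs:
        vp = addr >> PAGE_SHIFT
        l0_tlb.cache(vp, translate(vp))
    for addr in alternate_addrs + lookahead_addrs:
        vp = addr >> PAGE_SHIFT
        if not l0_tlb.hit(vp):
            l1_tlb.cache(vp, translate(vp))
```

Keeping only the primary path in the small L0 structure preserves its capacity for the addresses most likely to be fetched, while alternate-path translations still avoid a page-table walk on a misprediction.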
-
Publication No.: US20220027162A1
Publication Date: 2022-01-27
Application No.: US17497572
Application Date: 2021-10-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
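The compression idea the abstract describes can be sketched as follows. The specific conditions checked here (same operation kind, neither op may fault, and a two-op entry limit) are hypothetical stand-ins for the patent's actual conditions, chosen only to make the mechanism concrete.

```python
# Sketch of compressing micro-ops into shared retire queue entries.
from dataclasses import dataclass, field

MAX_OPS_PER_ENTRY = 2  # assumed compression width


@dataclass
class Op:
    kind: str            # e.g. "alu", "load", "branch"
    may_fault: bool = False


@dataclass
class RetireEntry:
    ops: list = field(default_factory=list)


def compressible(entry, op):
    """Hypothetical stand-in for the patent's conditions: the entry
    has room, and every op sharing it is the same kind and fault-free."""
    return (len(entry.ops) < MAX_OPS_PER_ENTRY
            and not op.may_fault
            and all(o.kind == op.kind and not o.may_fault
                    for o in entry.ops))


def dispatch(retire_queue, op):
    """Place a dispatched op into the retire queue, sharing the tail
    entry when the compression conditions are met."""
    if retire_queue and compressible(retire_queue[-1], op):
        retire_queue[-1].ops.append(op)
    else:
        retire_queue.append(RetireEntry(ops=[op]))
```

With two ops sharing an entry where the conditions allow, the same physical retire queue tracks a deeper speculative window.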
-
Publication No.: US11620224B2
Publication Date: 2023-04-04
Application No.: US16709831
Application Date: 2019-12-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Aparna Thyagarajan, Ashok Tirupathy Venkatachar, Marius Evers, Angelo Wong, William E. Jones
IPC: G06F12/0862, G06F12/0875
Abstract: Techniques for controlling prefetching of instructions into an instruction cache are provided. The techniques include tracking either or both of branch target buffer misses and instruction cache misses, modifying a throttle toggle based on the tracking, and adjusting prefetch activity based on the throttle toggle.
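A minimal sketch of the throttle mechanism follows. The observation window, the miss threshold, and the two prefetch degrees are assumed values for illustration; the patent does not specify them here.

```python
# Sketch of miss-driven prefetch throttling (parameters are assumed).
class PrefetchThrottle:
    WINDOW = 100          # events per observation window (assumed)
    MISS_THRESHOLD = 20   # misses that flip the toggle (assumed)

    def __init__(self):
        self.events = 0
        self.misses = 0
        self.throttled = False

    def record(self, btb_miss=False, icache_miss=False):
        """Track BTB and instruction cache misses; at the end of each
        window, set the throttle toggle from the observed miss count."""
        self.events += 1
        if btb_miss or icache_miss:
            self.misses += 1
        if self.events == self.WINDOW:
            self.throttled = self.misses >= self.MISS_THRESHOLD
            self.events = self.misses = 0

    def prefetch_degree(self):
        """Adjust prefetch activity based on the throttle toggle."""
        return 1 if self.throttled else 4
```

When the predictors are missing often, aggressive prefetching mostly pollutes the cache, so dialing the degree down in miss-heavy windows trades a little lookahead for accuracy.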
-
Publication No.: US11144324B2
Publication Date: 2021-10-12
Application No.: US16586642
Application Date: 2019-09-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
-
Publication No.: US12204911B2
Publication Date: 2025-01-21
Application No.: US17497572
Application Date: 2021-10-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
-
Publication No.: US20210173783A1
Publication Date: 2021-06-10
Application No.: US16709831
Application Date: 2019-12-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Aparna Thyagarajan, Ashok Tirupathy Venkatachar, Marius Evers, Angelo Wong, William E. Jones
IPC: G06F12/0862
Abstract: Techniques for controlling prefetching of instructions into an instruction cache are provided. The techniques include tracking either or both of branch target buffer misses and instruction cache misses, modifying a throttle toggle based on the tracking, and adjusting prefetch activity based on the throttle toggle.
-
Publication No.: US20190391813A1
Publication Date: 2019-12-26
Application No.: US16014715
Application Date: 2018-06-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
IPC: G06F9/38
Abstract: The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
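The interplay of the two paths and their decoupling buffers can be sketched roughly as below. This is a heavily simplified model: real hardware overlaps the paths cycle by cycle, whereas the sketch only shows work accumulating in per-path buffers and draining in original prediction order; the function names and the hit/decode callbacks are illustrative assumptions.

```python
# Sketch of two fetch paths with decoupling buffers (illustrative only).
from collections import deque


def drain_in_order(predictions, oc_hit, decode):
    """Let predicted blocks flow down both the op-cache path and the
    fetch/decode path, holding each path's completed work in a
    decoupling buffer and emitting it only in prediction order, so
    one path waits only when it needs output from the other."""
    oc_buf, ic_buf = deque(), deque()
    for block in predictions:
        if oc_hit(block):
            oc_buf.append(block)          # decoded ops already cached
        else:
            ic_buf.append(decode(block))  # instruction cache + decode path

    # Drain: each buffer holds its work until everything older than it
    # has been emitted by the opposite path.
    out = []
    for block in predictions:
        if oc_buf and oc_buf[0] == block:
            out.append(("oc", oc_buf.popleft()))
        else:
            out.append(("ic", ic_buf.popleft()))
    return out
```

Because each path writes into its own buffer, neither stalls the moment the prediction stream switches paths; the stall happens only at the in-order drain point, which is the low-latency switching the abstract describes.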
-
Publication No.: US11579884B2
Publication Date: 2023-02-14
Application No.: US16913520
Application Date: 2020-06-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Ashok Tirupathy Venkatachar, Steven R. Havlir, Robert B. Cohen
IPC: G06F9/30, G06F9/38, G06F12/1027
Abstract: Techniques for performing instruction fetch operations are provided. The techniques include determining instruction addresses for a primary branch prediction path; requesting that a level 0 translation lookaside buffer (“TLB”) caches address translations for the primary branch prediction path; determining either or both of alternate control flow path instruction addresses and lookahead control flow path instruction addresses; and requesting that either the level 0 TLB or an alternative level TLB caches address translations for either or both of the alternate control flow path instruction addresses and the lookahead control flow path instruction addresses.
-
Publication No.: US20210096874A1
Publication Date: 2021-04-01
Application No.: US16586642
Application Date: 2019-09-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Matthew T. Sobel, Joshua James Lindner, Neil N. Marketkar, Kai Troester, Emil Talpes, Ashok Tirupathy Venkatachar
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for compressing multiple instruction operations together into a single retire queue entry are disclosed. A processor includes at least a scheduler, a retire queue, one or more execution units, and control logic. When the control logic detects a given instruction operation being dispatched by the scheduler to an execution unit, the control logic determines if the given instruction operation meets one or more conditions for being compressed with one or more other instruction operations into a single retire queue entry. If the one or more conditions are met, two or more instruction operations are stored together in a single retire queue entry. By compressing multiple instruction operations together into an individual retire queue entry, the retire queue is able to be used more efficiently, and the processor can speculatively execute more instructions without the retire queue exhausting its supply of available entries.
-
Publication No.: US10896044B2
Publication Date: 2021-01-19
Application No.: US16014715
Application Date: 2018-06-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
IPC: G06F9/38
Abstract: The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
-