1.
Publication No.: US11256505B2
Publication Date: 2022-02-22
Application No.: US17169053
Filing Date: 2021-02-05
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
IPC: G06F9/30, G06F9/38, G06F1/3296
Abstract: A processor predicts the number of loop iterations associated with a set of loop instructions. If the predicted number of loop iterations exceeds a first loop iteration threshold, the set of loop instructions is executed in a loop mode, in which at least one component of the processor's instruction pipeline is placed in a low-power mode or state and the loop instructions are executed from a loop buffer. If the predicted number of loop iterations is less than or equal to a second loop iteration threshold, the set of loop instructions is executed in a non-loop mode, in which at least one component of the instruction pipeline is kept powered up and the loop instructions are executed from an instruction fetch unit of the instruction pipeline.
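The two-threshold mode decision described in the abstract can be sketched in software. This is a minimal illustrative model only: the threshold values, function name, and mode labels below are assumptions for the sketch, not values taken from the patent.

```python
# Hypothetical sketch of the loop-mode selection logic. In hardware this
# decision is made by the front end; here it is modeled as a function of
# the predicted iteration count. Threshold values are assumed.
LOOP_MODE_THRESHOLD = 32      # first threshold (assumed value)
NON_LOOP_THRESHOLD = 32       # second threshold (assumed value)

def select_execution_mode(predicted_iterations: int) -> str:
    """Choose between loop mode (replay from the loop buffer, with front-end
    components in a low-power state) and non-loop mode (normal fetch)."""
    if predicted_iterations > LOOP_MODE_THRESHOLD:
        # Loop mode: execute from the loop buffer and power down at least
        # one pipeline component (e.g. fetch/decode stages).
        return "loop_mode"
    # Non-loop mode: keep the pipeline powered and fetch instructions
    # from the instruction fetch unit as usual.
    return "non_loop_mode"
```

With both thresholds equal, every predicted count falls into exactly one mode; the patent's two-threshold formulation also permits hysteresis if the thresholds differ.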
2.
Publication No.: US20190391813A1
Publication Date: 2019-12-26
Application No.: US16014715
Filing Date: 2018-06-21
Applicant: Advanced Micro Devices, Inc.
Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
IPC: G06F9/38
Abstract: The techniques described herein provide an instruction fetch and decode unit with an operation cache that switches with low latency between fetching decoded operations from the operation cache and fetching and decoding instructions with a decode unit. This low latency is achieved through a synchronization mechanism that lets work flow through both the operation cache path and the instruction cache path until that work must wait on output from the opposite path. Decoupling buffers in the operation cache path and the instruction cache path hold work until it is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that detects multiple hits in a single cycle, further reduce latency, for example by speeding the rate at which entries are consumed from a prediction queue that stores predicted address blocks.
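The role of the decoupling buffers can be illustrated with a rough software model: each predicted address block is steered to the operation cache path (on a hit) or the decode path (on a miss), and each path's buffer holds completed work until all older work from the opposite path has been released, preserving program order. Everything below, including the block representation and names, is an illustrative assumption, not the patent's implementation.

```python
from collections import deque

def merge_in_order(blocks, op_cache):
    """Hypothetical model of the two-path front end.
    blocks: iterable of (seq, addr) predicted address blocks, in program order.
    op_cache: set of addresses that hit in the operation cache.
    Returns the merged output stream, in program order."""
    oc_buffer, ic_buffer = deque(), deque()   # the two decoupling buffers
    for seq, addr in blocks:
        if addr in op_cache:
            # Hit: decoded operations come straight from the operation cache.
            oc_buffer.append((seq, f"ops@{addr:#x}"))
        else:
            # Miss: the block goes through the instruction cache + decode unit.
            ic_buffer.append((seq, f"decoded@{addr:#x}"))
    # Drain both buffers, always releasing the oldest block next; a block
    # "waits on the opposite path" whenever older work sits in the other buffer.
    out = []
    while oc_buffer or ic_buffer:
        if not ic_buffer or (oc_buffer and oc_buffer[0][0] < ic_buffer[0][0]):
            out.append(oc_buffer.popleft()[1])
        else:
            out.append(ic_buffer.popleft()[1])
    return out
```

In hardware both paths proceed concurrently; the sequential model above only captures the ordering discipline that the decoupling buffers enforce at the merge point.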
3.
Publication No.: US10915322B2
Publication Date: 2021-02-09
Application No.: US16134440
Filing Date: 2018-09-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
IPC: G06F9/30, G06F9/38, G06F1/3296
Abstract: A processor predicts the number of loop iterations associated with a set of loop instructions. If the predicted number of loop iterations exceeds a first loop iteration threshold, the set of loop instructions is executed in a loop mode, in which at least one component of the processor's instruction pipeline is placed in a low-power mode or state and the loop instructions are executed from a loop buffer. If the predicted number of loop iterations is less than or equal to a second loop iteration threshold, the set of loop instructions is executed in a non-loop mode, in which at least one component of the instruction pipeline is kept powered up and the loop instructions are executed from an instruction fetch unit of the instruction pipeline.
4.
Publication No.: US11055098B2
Publication Date: 2021-07-06
Application No.: US16043293
Filing Date: 2018-07-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Aparna Thyagarajan, Marius Evers, Arunachalam Annamalai
Abstract: A processor includes a branch target buffer (BTB) having a plurality of entries, where each entry corresponds to an associated instruction pointer value that is predicted to be a branch instruction. Each BTB entry stores a predicted branch target address for the branch instruction and further stores information indicating whether the next branch in the block of instructions at the predicted branch target address is predicted to be a return instruction. In response to the BTB indicating that the next branch is predicted to be a return instruction, the processor initiates an access to a return stack that stores the return address for the predicted return instruction. By initiating the return stack access in response to the return prediction stored at the BTB, the processor reduces the delay in identifying the return address, thereby improving processing efficiency.
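The early return-stack access can be sketched as follows. This is a minimal model under stated assumptions: the BTB representation as a dict, the 4-byte sequential fetch step, and all names are illustrative, not taken from the patent.

```python
# Hypothetical sketch: each BTB entry stores, besides the predicted target,
# a flag saying whether the NEXT branch in the target block is predicted to
# be a return. Seeing that flag lets the front end start reading the return
# address stack immediately, instead of waiting until the return itself is
# predicted one block later.

class ReturnStack:
    """Minimal return address stack (RAS) model."""
    def __init__(self):
        self._stack = []
    def push(self, return_addr):
        self._stack.append(return_addr)   # pushed on a predicted call
    def peek(self):
        return self._stack[-1] if self._stack else None

def predict_next_fetch(btb, return_stack, pc):
    """btb maps branch pc -> (predicted_target, next_branch_is_return).
    Returns (next_fetch_addr, early_return_addr_or_None)."""
    entry = btb.get(pc)
    if entry is None:
        return pc + 4, None   # no BTB hit: sequential fetch (4-byte step assumed)
    target, next_is_return = entry
    if next_is_return:
        # Early return-stack access: the return address becomes available
        # as soon as this BTB entry is read, hiding the RAS access latency.
        return target, return_stack.peek()
    return target, None
```

The point of the flag is timing: without it, the RAS read could only begin after the return instruction's own prediction, adding cycles to the fetch redirect.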
5.
Publication No.: US10896044B2
Publication Date: 2021-01-19
Application No.: US16014715
Filing Date: 2018-06-21
Applicant: Advanced Micro Devices, Inc.
Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
IPC: G06F9/38
Abstract: The techniques described herein provide an instruction fetch and decode unit with an operation cache that switches with low latency between fetching decoded operations from the operation cache and fetching and decoding instructions with a decode unit. This low latency is achieved through a synchronization mechanism that lets work flow through both the operation cache path and the instruction cache path until that work must wait on output from the opposite path. Decoupling buffers in the operation cache path and the instruction cache path hold work until it is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that detects multiple hits in a single cycle, further reduce latency, for example by speeding the rate at which entries are consumed from a prediction queue that stores predicted address blocks.