1.
Publication No.: US11256505B2
Publication Date: 2022-02-22
Application No.: US17169053
Filing Date: 2021-02-05
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
IPC: G06F9/30, G06F9/38, G06F1/3296
Abstract: A processor predicts the number of loop iterations associated with a set of loop instructions. If the predicted number of loop iterations exceeds a first loop iteration threshold, the set of loop instructions is executed in a loop mode, in which at least one component of the processor's instruction pipeline is placed in a low-power mode or state and the loop instructions are executed from a loop buffer. If the predicted number of loop iterations is less than or equal to a second loop iteration threshold, the set of loop instructions is executed in a non-loop mode, in which at least one component of the instruction pipeline is kept powered up and the loop instructions are executed from an instruction fetch unit of the instruction pipeline.
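The two-threshold mode decision described in the abstract can be sketched in software. This is a minimal illustrative model only: the threshold values, function name, and mode labels below are assumptions for the sketch, not values taken from the patent.

```python
# Hypothetical sketch of the loop-mode selection logic. In hardware this
# decision is made by the front end; here it is modeled as a function of
# the predicted iteration count. Threshold values are assumed.
LOOP_MODE_THRESHOLD = 32      # first threshold (assumed value)
NON_LOOP_THRESHOLD = 32       # second threshold (assumed value)

def select_execution_mode(predicted_iterations: int) -> str:
    """Choose between loop mode (replay from the loop buffer, with front-end
    components in a low-power state) and non-loop mode (normal fetch)."""
    if predicted_iterations > LOOP_MODE_THRESHOLD:
        # Loop mode: execute from the loop buffer and power down at least
        # one pipeline component (e.g. fetch/decode stages).
        return "loop_mode"
    # Non-loop mode: keep the pipeline powered and fetch instructions
    # from the instruction fetch unit as usual.
    return "non_loop_mode"
```

With both thresholds equal, every predicted count falls into exactly one mode; the patent's two-threshold formulation also permits hysteresis if the thresholds differ.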
2.
Publication No.: US20190391813A1
Publication Date: 2019-12-26
Application No.: US16014715
Filing Date: 2018-06-21
Applicant: Advanced Micro Devices, Inc.
Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
IPC: G06F9/38
Abstract: The techniques described herein provide an instruction fetch and decode unit with an operation cache that switches with low latency between fetching decoded operations from the operation cache and fetching and decoding instructions with a decode unit. This low latency is achieved through a synchronization mechanism that lets work flow through both the operation cache path and the instruction cache path until that work must wait on output from the opposite path. Decoupling buffers in the operation cache path and the instruction cache path hold work until it is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that detects multiple hits in a single cycle, further reduce latency, for example by speeding the rate at which entries are consumed from a prediction queue that stores predicted address blocks.
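The role of the decoupling buffers can be illustrated with a rough software model: each predicted address block is steered to the operation cache path (on a hit) or the decode path (on a miss), and each path's buffer holds completed work until all older work from the opposite path has been released, preserving program order. Everything below, including the block representation and names, is an illustrative assumption, not the patent's implementation.

```python
from collections import deque

def merge_in_order(blocks, op_cache):
    """Hypothetical model of the two-path front end.
    blocks: iterable of (seq, addr) predicted address blocks, in program order.
    op_cache: set of addresses that hit in the operation cache.
    Returns the merged output stream, in program order."""
    oc_buffer, ic_buffer = deque(), deque()   # the two decoupling buffers
    for seq, addr in blocks:
        if addr in op_cache:
            # Hit: decoded operations come straight from the operation cache.
            oc_buffer.append((seq, f"ops@{addr:#x}"))
        else:
            # Miss: the block goes through the instruction cache + decode unit.
            ic_buffer.append((seq, f"decoded@{addr:#x}"))
    # Drain both buffers, always releasing the oldest block next; a block
    # "waits on the opposite path" whenever older work sits in the other buffer.
    out = []
    while oc_buffer or ic_buffer:
        if not ic_buffer or (oc_buffer and oc_buffer[0][0] < ic_buffer[0][0]):
            out.append(oc_buffer.popleft()[1])
        else:
            out.append(ic_buffer.popleft()[1])
    return out
```

In hardware both paths proceed concurrently; the sequential model above only captures the ordering discipline that the decoupling buffers enforce at the merge point.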
3.
Publication No.: US10915322B2
Publication Date: 2021-02-09
Application No.: US16134440
Filing Date: 2018-09-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
IPC: G06F9/30, G06F9/38, G06F1/3296
Abstract: A processor predicts the number of loop iterations associated with a set of loop instructions. If the predicted number of loop iterations exceeds a first loop iteration threshold, the set of loop instructions is executed in a loop mode, in which at least one component of the processor's instruction pipeline is placed in a low-power mode or state and the loop instructions are executed from a loop buffer. If the predicted number of loop iterations is less than or equal to a second loop iteration threshold, the set of loop instructions is executed in a non-loop mode, in which at least one component of the instruction pipeline is kept powered up and the loop instructions are executed from an instruction fetch unit of the instruction pipeline.
4.
Publication No.: US11055098B2
Publication Date: 2021-07-06
Application No.: US16043293
Filing Date: 2018-07-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Aparna Thyagarajan, Marius Evers, Arunachalam Annamalai
Abstract: A processor includes a branch target buffer (BTB) having a plurality of entries, where each entry corresponds to an associated instruction pointer value that is predicted to be a branch instruction. Each BTB entry stores a predicted branch target address for the branch instruction and further stores information indicating whether the next branch in the block of instructions at the predicted branch target address is predicted to be a return instruction. In response to the BTB indicating that the next branch is predicted to be a return instruction, the processor initiates an access to a return stack that stores the return address for the predicted return instruction. By initiating the return stack access in response to the return prediction stored at the BTB, the processor reduces the delay in identifying the return address, thereby improving processing efficiency.
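The early return-stack access can be sketched as follows. This is a minimal model under stated assumptions: the BTB representation as a dict, the 4-byte sequential fetch step, and all names are illustrative, not taken from the patent.

```python
# Hypothetical sketch: each BTB entry stores, besides the predicted target,
# a flag saying whether the NEXT branch in the target block is predicted to
# be a return. Seeing that flag lets the front end start reading the return
# address stack immediately, instead of waiting until the return itself is
# predicted one block later.

class ReturnStack:
    """Minimal return address stack (RAS) model."""
    def __init__(self):
        self._stack = []
    def push(self, return_addr):
        self._stack.append(return_addr)   # pushed on a predicted call
    def peek(self):
        return self._stack[-1] if self._stack else None

def predict_next_fetch(btb, return_stack, pc):
    """btb maps branch pc -> (predicted_target, next_branch_is_return).
    Returns (next_fetch_addr, early_return_addr_or_None)."""
    entry = btb.get(pc)
    if entry is None:
        return pc + 4, None   # no BTB hit: sequential fetch (4-byte step assumed)
    target, next_is_return = entry
    if next_is_return:
        # Early return-stack access: the return address becomes available
        # as soon as this BTB entry is read, hiding the RAS access latency.
        return target, return_stack.peek()
    return target, None
```

The point of the flag is timing: without it, the RAS read could only begin after the return instruction's own prediction, adding cycles to the fetch redirect.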
5.
Publication No.: US10896044B2
Publication Date: 2021-01-19
Application No.: US16014715
Filing Date: 2018-06-21
Applicant: Advanced Micro Devices, Inc.
Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
IPC: G06F9/38
Abstract: The techniques described herein provide an instruction fetch and decode unit with an operation cache that switches with low latency between fetching decoded operations from the operation cache and fetching and decoding instructions with a decode unit. This low latency is achieved through a synchronization mechanism that lets work flow through both the operation cache path and the instruction cache path until that work must wait on output from the opposite path. Decoupling buffers in the operation cache path and the instruction cache path hold work until it is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that detects multiple hits in a single cycle, further reduce latency, for example by speeding the rate at which entries are consumed from a prediction queue that stores predicted address blocks.