-
Publication No.: US20240211402A1
Publication date: 2024-06-27
Application No.: US18146904
Application date: 2022-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Travis Henry Boraten, Varun Agrawal, Michael Warren Boyer
IPC: G06F12/0817
CPC classification number: G06F12/0817, G06F2212/1016
Abstract: In accordance with the described techniques for condensed coherence directory entries for processing in memory, a computing device includes a core that includes a cache, a memory that includes multiple banks, a coherence directory that includes a condensed entry indicating that data associated with a memory address and the multiple banks is not stored in the cache, and a cache coherence controller. The cache coherence controller receives a processing-in-memory command to the memory address and performs a single lookup in the coherence directory for the processing-in-memory command based on inclusion of the condensed entry in the coherence directory.
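The single-lookup behavior described in this abstract can be illustrated with a minimal sketch. It assumes an address splits into a bank index and an in-bank offset, and that a condensed entry is keyed by the offset alone; the class and field names (`CoherenceDirectory`, `per_bank`, `condensed_uncached`) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class CoherenceDirectory:
    num_banks: int
    # Regular entries (assumed layout): (bank, offset) -> line may be cached.
    per_bank: Dict[Tuple[int, int], bool] = field(default_factory=dict)
    # Condensed entries (assumed layout): offsets not cached for any bank.
    condensed_uncached: Set[int] = field(default_factory=set)

    def lookup_for_pim(self, offset: int) -> Tuple[int, bool]:
        """Return (directory lookups performed, whether any copy may be cached)."""
        if offset in self.condensed_uncached:
            # One condensed entry answers for all banks at once.
            return 1, False
        lookups, cached = 0, False
        for bank in range(self.num_banks):
            lookups += 1
            hit = self.per_bank.get((bank, offset), False)
            cached = cached or hit
        return lookups, cached


directory = CoherenceDirectory(num_banks=4)
directory.per_bank[(2, 0x40)] = True
directory.condensed_uncached.add(0x80)
print(directory.lookup_for_pim(0x80))
print(directory.lookup_for_pim(0x40))
```

Without a condensed entry, this model falls back to one lookup per bank, which is the cost the abstract's technique avoids.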
-
Publication No.: US20220206685A1
Publication date: 2022-06-30
Application No.: US17139496
Application date: 2020-12-31
Applicant: Advanced Micro Devices, Inc.
Inventor: John Kalamatianos, Varun Agrawal, Niti Madan
IPC: G06F3/06
Abstract: Systems, apparatuses, and methods for reusing remote registers in processing in memory (PIM) are disclosed. A system includes at least a host processor, a memory controller, and a PIM device. When the memory controller receives, from the host processor, an operation targeting the PIM device, the memory controller determines whether an optimization can be applied to the operation. The memory controller converts the operation into N PIM commands if the optimization is not applicable. Otherwise, the memory controller converts the operation into N−1 PIM commands if the optimization is applicable. For example, if the operation involves reusing a constant value, a copy command can be omitted, resulting in memory bandwidth reduction and power consumption savings. In one scenario, the memory controller includes a constant-value cache, and the memory controller performs a lookup of the constant-value cache to determine if the optimization is applicable for a given operation.
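The N versus N−1 command count can be sketched as follows. This is an illustration only: it assumes a single remote PIM register (`r0`), a one-entry constant-value cache, and a "multiply by constant" operation. The command names `PIM_COPY` and `PIM_MUL` are hypothetical.

```python
class MemoryController:
    def __init__(self):
        # One-entry constant-value cache (assumption): the constant
        # currently held in the remote PIM register r0, if any.
        self.cached_constant = None

    def lower_scale(self, constant):
        """Convert 'multiply by constant' into a list of PIM commands."""
        commands = []
        if self.cached_constant != constant:
            # Optimization not applicable: the constant must be copied first.
            commands.append(("PIM_COPY", "r0", constant))
            self.cached_constant = constant
        # Optimization applicable: the copy is omitted, leaving N-1 commands.
        commands.append(("PIM_MUL", "dst", "src", "r0"))
        return commands


mc = MemoryController()
print(len(mc.lower_scale(3)))  # constant not yet in the register
print(len(mc.lower_scale(3)))  # constant reused, copy omitted
```

Here N is 2, so a reused constant lowers to a single command.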
-
Publication No.: US10853075B2
Publication date: 2020-12-01
Application No.: US16725203
Application date: 2019-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, John Kalamatianos, Adithya Yalavarti, Jingjie Qian
Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.
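A rough model of the skipping behavior in this abstract is shown below. It assumes the record is a simple mapping from a fetch group to the number of following fetch groups whose predictor accesses can be prevented, and that records are not consulted while a skip is in progress. Both are simplifications, and the function and parameter names are hypothetical.

```python
def count_predictor_accesses(fetch_groups, mpu_record):
    """Count branch predictor accesses for a sequence of fetch groups.

    mpu_record maps a fetch group to the number of subsequent groups that
    were previously seen to contain no CTIs or only not-taken conditional CTIs.
    """
    accesses = 0
    skip = 0
    for group in fetch_groups:
        if skip > 0:
            skip -= 1  # predictor access prevented for this group
        else:
            accesses += 1
            skip = mpu_record.get(group, 0)
    return accesses


groups = ["A", "B", "C", "D"]
print(count_predictor_accesses(groups, {"A": 2}))
print(count_predictor_accesses(groups, {}))
```

With a record of 2 for group `A`, groups `B` and `C` are fetched without accessing the predictor.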
-
Publication No.: US12008378B2
Publication date: 2024-06-11
Application No.: US18132879
Application date: 2023-04-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, Yasuko Eckert
IPC: G06F9/38, G06F9/30, G06F12/0815, G06F13/16
CPC classification number: G06F9/3895, G06F9/30036, G06F9/30105, G06F12/0815, G06F13/1668
Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
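One way to picture a directory keyed by address signatures is sketched below. It assumes, purely for illustration, that a signature is the address with its bank-select bits masked out, so all addresses touched by one broadcast PIM command share a signature. The bit layout (`BANK_MASK`) and all names are hypothetical; the abstract does not specify how signatures are formed.

```python
BANK_MASK = 0b1100  # hypothetical: address bits 2-3 select the bank


def signature(addr):
    """Collapse an address to its bank-independent signature (assumed form)."""
    return addr & ~BANK_MASK


class PimProbeFilter:
    def __init__(self):
        self._cached = {}  # signature -> number of cached lines matching it

    def on_cache_fill(self, addr):
        sig = signature(addr)
        self._cached[sig] = self._cached.get(sig, 0) + 1

    def on_cache_evict(self, addr):
        sig = signature(addr)
        if self._cached.get(sig, 0) > 0:
            self._cached[sig] -= 1

    def needs_probe(self, pim_addr):
        """True if a broadcast PIM command at pim_addr may hit cached data."""
        return self._cached.get(signature(pim_addr), 0) > 0


pf = PimProbeFilter()
pf.on_cache_fill(0x104)       # a line in bank 1 gets cached
print(pf.needs_probe(0x100))  # broadcast covering the same offset in all banks
print(pf.needs_probe(0x200))  # unrelated offset
```

A single signature check stands in for per-bank checks in the system-level directories, which is the acceleration the abstract describes.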
-
Publication No.: US20210382718A1
Publication date: 2021-12-09
Application No.: US16895825
Application date: 2020-06-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, John Kalamatianos
Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
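The record-keeping described in this abstract might be modeled as below. The policy shown (disable a prediction functional block whose share of past selections falls below a threshold) is an assumption made for illustration; the abstract does not say how the record is used. The block names and `min_share` parameter are hypothetical.

```python
from collections import Counter


class PredictorController:
    def __init__(self, blocks, min_share=0.1):
        self.blocks = list(blocks)
        self.min_share = min_share  # assumed threshold, not from the source
        self.selected = Counter()   # record: block -> times its prediction was selected

    def record_selection(self, block):
        self.selected[block] += 1

    def enabled_blocks(self):
        """Blocks the branch predictor should keep using, per the record."""
        total = sum(self.selected.values())
        if total == 0:
            return list(self.blocks)  # no history yet: use every block
        return [b for b in self.blocks
                if self.selected[b] / total >= self.min_share]


pc = PredictorController(["block_a", "block_b", "block_c"])
for _ in range(8):
    pc.record_selection("block_a")
for _ in range(2):
    pc.record_selection("block_b")
print(pc.enabled_blocks())
```

In this run `block_c` was never selected, so the controller stops using it for later CTIs.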
-
Publication No.: US20240004584A1
Publication date: 2024-01-04
Application No.: US17855109
Application date: 2022-06-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Niti Madan, Yasuko Eckert, Varun Agrawal, John Kalamatianos
IPC: G06F3/06
CPC classification number: G06F3/0659, G06F3/0653, G06F3/0679, G06F3/0604
Abstract: In accordance with described techniques for DRAM row management for processing in memory, a plurality of instructions are obtained for execution by a processing in memory component embedded in a dynamic random access memory. An instruction is identified that last accesses a row of the dynamic random access memory, and a subsequent instruction is identified that first accesses an additional row of the dynamic random access memory. A first command is issued to close the row and a second command is issued to open the additional row after the row is last accessed by the instruction.
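The close-then-open sequencing in this abstract can be sketched over a simple instruction list. The sketch assumes a single bank with one open row at a time, and treats each change of row as the point where the previous row was last accessed. The command names `OPEN` and `CLOSE` and the instruction tuples are hypothetical.

```python
def insert_row_commands(instructions):
    """instructions: list of (name, row). Returns the stream with row commands added."""
    out = []
    open_row = None
    for name, row in instructions:
        if row != open_row:
            if open_row is not None:
                # The previous instruction was the last access to open_row.
                out.append(("CLOSE", open_row))
            # This instruction is the first access to the new row.
            out.append(("OPEN", row))
            open_row = row
        out.append((name, row))
    return out


program = [("pim_add", 0), ("pim_mul", 0), ("pim_add", 1)]
for command in insert_row_commands(program):
    print(command)
```

The `CLOSE` and `OPEN` commands land directly after the last instruction that touches row 0, so the next row is ready before its first access.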
-
Publication No.: US20230244496A1
Publication date: 2023-08-03
Application No.: US18132879
Application date: 2023-04-10
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, Yasuko Eckert
IPC: G06F9/38, G06F9/30, G06F12/0815, G06F13/16
CPC classification number: G06F9/3895, G06F9/30105, G06F12/0815, G06F13/1668, G06F9/30036
Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
-
Publication No.: US11625251B1
Publication date: 2023-04-11
Application No.: US17561112
Application date: 2021-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, Yasuko Eckert
IPC: G06F9/38, G06F9/30, G06F12/0815, G06F13/16
Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.
-
Publication No.: US11442727B2
Publication date: 2022-09-13
Application No.: US16895825
Application date: 2020-06-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, John Kalamatianos
Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
-
Publication No.: US20200150966A1
Publication date: 2020-05-14
Application No.: US16725203
Application date: 2019-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Varun Agrawal, John Kalamatianos, Adithya Yalavarti, Jingjie Qian
IPC: G06F9/38
Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.