Condensed Coherence Directory Entries for Processing-in-Memory

    Publication number: US20240211402A1

    Publication date: 2024-06-27

    Application number: US18146904

    Application date: 2022-12-27

    CPC classification number: G06F12/0817 G06F2212/1016

    Abstract: In accordance with the described techniques for condensed coherence directory entries for processing in memory, a computing device includes a core that includes a cache, a memory that includes multiple banks, a coherence directory that includes a condensed entry indicating that data associated with a memory address and the multiple banks is not stored in the cache, and a cache coherence controller. The cache coherence controller receives a processing-in-memory command to the memory address and performs a single lookup in the coherence directory for the processing-in-memory command based on inclusion of the condensed entry in the coherence directory.
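The single-lookup behavior described in this abstract can be illustrated with a minimal sketch. The class `CoherenceDirectory`, its fields, and the per-bank fallback path are assumptions made for illustration and are not taken from the patent; the sketch only models the idea that one condensed entry covering all banks lets the controller answer a multi-bank PIM command with one directory lookup.

```python
class CoherenceDirectory:
    """Toy directory model (hypothetical structure, not the patented design)."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.cached = set()     # (bank, offset) pairs held in some core's cache
        self.condensed = set()  # offsets known to be uncached in every bank
        self.lookups = 0        # number of directory lookups performed

    def _lookup(self, container, key):
        self.lookups += 1
        return key in container

    def pim_needs_invalidation(self, offset):
        """Return True if a PIM command to `offset` in all banks hits cached data."""
        # One lookup suffices when a condensed entry covers the offset.
        if self._lookup(self.condensed, offset):
            return False
        # Assumed fallback: check each bank's entry separately.
        return any(self._lookup(self.cached, (bank, offset))
                   for bank in range(self.num_banks))


directory = CoherenceDirectory(num_banks=4)
directory.condensed.add(0x40)
needs_inval = directory.pim_needs_invalidation(0x40)
lookups_with_condensed = directory.lookups
```

In this model, an offset without a condensed entry costs one lookup per bank on top of the initial check, which is the overhead the condensed entry is meant to avoid.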

    Reusing Remote Registers in Processing in Memory

    Publication number: US20220206685A1

    Publication date: 2022-06-30

    Application number: US17139496

    Application date: 2020-12-31

    Abstract: Systems, apparatuses, and methods for reusing remote registers in processing in memory (PIM) are disclosed. A system includes at least a host processor, a memory controller, and a PIM device. When the memory controller receives, from the host processor, an operation targeting the PIM device, the memory controller determines whether an optimization can be applied to the operation. If the optimization is not applicable, the memory controller converts the operation into N PIM commands. If the optimization is applicable, the memory controller converts the operation into N−1 PIM commands. For example, if the operation involves reusing a constant value, a copy command can be omitted, resulting in memory bandwidth reduction and power consumption savings. In one scenario, the memory controller includes a constant-value cache, and the memory controller performs a lookup of the constant-value cache to determine if the optimization is applicable for a given operation.
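The N versus N−1 command conversion can be sketched as follows. This is a simplified illustration under stated assumptions: the command names (`COPY_CONST`, `MUL`, `STORE`), the LRU replacement policy, and the register numbering are hypothetical and do not come from the patent. The sketch only shows a constant-value cache lookup deciding whether the copy command can be omitted.

```python
from collections import OrderedDict


class ConstantValueCache:
    """Tracks constants assumed to already reside in PIM registers (LRU, hypothetical)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._entries = OrderedDict()  # constant value -> PIM register index

    def lookup(self, value):
        if value in self._entries:
            self._entries.move_to_end(value)  # mark as most recently used
            return self._entries[value]
        return None

    def insert(self, value):
        if len(self._entries) >= self.capacity:
            # Evict the least recently used constant and reuse its register.
            _, reg = self._entries.popitem(last=False)
        else:
            reg = len(self._entries)
        self._entries[value] = reg
        return reg


def convert_operation(cache, constant, dst_row):
    """Convert one host operation into a list of PIM commands."""
    commands = []
    reg = cache.lookup(constant)
    if reg is None:
        # Optimization not applicable: the constant must first be copied.
        reg = cache.insert(constant)
        commands.append(("COPY_CONST", reg, constant))
    commands.append(("MUL", reg, dst_row))
    commands.append(("STORE", dst_row))
    return commands


cache = ConstantValueCache()
first = convert_operation(cache, constant=3, dst_row=0)   # N commands
second = convert_operation(cache, constant=3, dst_row=1)  # N-1 commands
```

Here the first operation emits three commands and the second, which reuses the same constant, emits two because the copy is skipped.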

    Controlling accesses to a branch prediction unit for sequences of fetch groups

    Publication number: US10853075B2

    Publication date: 2020-12-01

    Application number: US16725203

    Application date: 2019-12-23

    Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.
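A minimal sketch of the access-suppression idea follows. The class name, the dictionary-based record, and the fetch-group addresses are hypothetical; the sketch assumes the record simply maps a fetch group to the number of following fetch groups whose branch predictor accesses can be skipped.

```python
class MinimumPredictorUse:
    """Toy MPU model: skips predictor accesses for a recorded run of fetch groups."""

    def __init__(self, records):
        # records: fetch-group address -> number of subsequent fetch groups
        # previously found to contain no CTIs or only not-taken conditional CTIs.
        self.records = records
        self.remaining = 0

    def should_access_predictor(self, fetch_group_addr):
        if self.remaining > 0:
            # Still inside a run of fetch groups that needs no prediction.
            self.remaining -= 1
            return False
        # Normal access; load the skip count recorded for this fetch group.
        self.remaining = self.records.get(fetch_group_addr, 0)
        return True


mpu = MinimumPredictorUse({0x100: 2})
accesses = [mpu.should_access_predictor(addr)
            for addr in (0x100, 0x140, 0x180, 0x1C0)]
```

With a recorded count of 2 for the group at `0x100`, the next two fetch groups skip the predictor and the fourth accesses it again.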

    Mechanism for reducing coherence directory controller overhead for near-memory compute elements

    Publication number: US12008378B2

    Publication date: 2024-06-11

    Application number: US18132879

    Application date: 2023-04-10

    Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.

    Controlling Prediction Functional Blocks Used by a Branch Predictor in a Processor

    Publication number: US20210382718A1

    Publication date: 2021-12-09

    Application number: US16895825

    Application date: 2020-06-08

    Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.
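The record-keeping described in this abstract might be modeled as below. This is only a sketch: the block names, the `min_uses` threshold, and the rule of falling back to all blocks when the record is empty are assumptions for illustration, not details from the patent.

```python
from collections import Counter


class PredictorController:
    """Toy controller: enables only blocks whose predictions were selected before."""

    def __init__(self, block_names, min_uses=1):
        self.block_names = list(block_names)
        self.min_uses = min_uses   # hypothetical threshold for keeping a block enabled
        self.record = Counter()    # block name -> times its prediction was selected

    def note_selection(self, block_name):
        self.record[block_name] += 1

    def enabled_blocks(self):
        active = [name for name in self.block_names
                  if self.record[name] >= self.min_uses]
        # Assumed fallback: with no usable history, keep every block enabled.
        return active or list(self.block_names)


controller = PredictorController(["bimodal", "tage", "loop"])
before = controller.enabled_blocks()
controller.note_selection("tage")
controller.note_selection("tage")
after = controller.enabled_blocks()
```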

    DRAM Row Management for Processing in Memory

    Publication number: US20240004584A1

    Publication date: 2024-01-04

    Application number: US17855109

    Application date: 2022-06-30

    CPC classification number: G06F3/0659 G06F3/0653 G06F3/0679 G06F3/0604

    Abstract: In accordance with described techniques for DRAM row management for processing in memory, a plurality of instructions are obtained for execution by a processing in memory component embedded in a dynamic random access memory. An instruction is identified that last accesses a row of the dynamic random access memory, and a subsequent instruction is identified that first accesses an additional row of the dynamic random access memory. A first command is issued to close the row and a second command is issued to open the additional row after the row is last accessed by the instruction.
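The row open/close scheduling can be sketched for a single bank as follows. The command names `ACTIVATE`, `PRECHARGE`, and `EXEC`, the instruction tuples, and the single-bank restriction are assumptions for illustration. The sketch treats the last instruction before a row change as the last access to that row, then closes it and opens the next row immediately afterwards.

```python
def schedule_row_commands(instructions):
    """instructions: list of (name, row) in program order for one bank."""
    schedule = []
    for i, (name, row) in enumerate(instructions):
        if i == 0:
            schedule.append(("ACTIVATE", row))  # open the first row
        schedule.append(("EXEC", name, row))
        if i + 1 < len(instructions):
            next_row = instructions[i + 1][1]
            if next_row != row:
                # This instruction is the last access to `row` in the current
                # run: close it, then open the row the next instruction needs.
                schedule.append(("PRECHARGE", row))
                schedule.append(("ACTIVATE", next_row))
    return schedule


program = [("load", 5), ("add", 5), ("store", 9)]
schedule = schedule_row_commands(program)
```

For this three-instruction program, row 5 is closed and row 9 is opened right after `add`, before `store` executes.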

    Mechanism for reducing coherence directory controller overhead for near-memory compute elements

    Publication number: US20230244496A1

    Publication date: 2023-08-03

    Application number: US18132879

    Application date: 2023-04-10

    Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.

    Mechanism for reducing coherence directory controller overhead for near-memory compute elements

    Publication number: US11625251B1

    Publication date: 2023-04-11

    Application number: US17561112

    Application date: 2021-12-23

    Abstract: A parallel processing (PP) level coherence directory, also referred to as a Processing In-Memory Probe Filter (PimPF), is added to a coherence directory controller. When the coherence directory controller receives a broadcast PIM command from a host, or a PIM command that is directed to multiple memory banks in parallel, the PimPF accelerates processing of the PIM command by maintaining a directory for cache coherence that is separate from existing system level directories in the coherence directory controller. The PimPF maintains a directory according to address signatures that define the memory addresses affected by a broadcast PIM command. Two implementations are described: a lightweight implementation that accelerates PIM loads into registers, and a heavyweight implementation that accelerates both PIM loads into registers and PIM stores into memory.

    Controlling prediction functional blocks used by a branch predictor in a processor

    Publication number: US11442727B2

    Publication date: 2022-09-13

    Application number: US16895825

    Application date: 2020-06-08

    Abstract: An electronic device includes a processor, a branch predictor in the processor, and a predictor controller in the processor. The branch predictor includes multiple prediction functional blocks, each prediction functional block configured for generating predictions for control transfer instructions (CTIs) in program code based on respective prediction information, the branch predictor configured to select, from among predictions generated by the prediction functional blocks for each CTI, a selected prediction to be used for that CTI. The predictor controller keeps a record of prediction functional blocks from which the branch predictor previously selected predictions for CTIs. The predictor controller uses information from the record for controlling which prediction functional blocks are used by the branch predictor for generating predictions for CTIs.

    Controlling Accesses to a Branch Prediction Unit for Sequences of Fetch Groups

    Publication number: US20200150966A1

    Publication date: 2020-05-14

    Application number: US16725203

    Application date: 2019-12-23

    Abstract: An electronic device handles accesses of a branch prediction functional block when executing instructions in program code. The electronic device includes a processor having the branch prediction functional block that provides branch prediction information for control transfer instructions (CTIs) in the program code and a minimum predictor use (MPU) functional block. The MPU functional block determines, based on a record associated with a given fetch group of instructions, that a specified number of subsequent fetch groups of instructions that were previously determined to include no CTIs or conditional CTIs that were not taken are to be fetched for execution in sequence following the given fetch group. The MPU functional block then, when each of the specified number of the subsequent fetch groups is fetched and prepared for execution, prevents corresponding accesses of the branch prediction functional block for acquiring branch prediction information for instructions in that subsequent fetch group.
