Device and method for accelerating matrix multiply operations

    公开(公告)号:US12124531B2

    公开(公告)日:2024-10-22

    申请号:US18297230

    申请日:2023-04-07

    CPC classification number: G06F17/16 G06F7/5324 G06F15/8007

    Abstract: A processing device including a plurality of clusters of processor cores and a method for use in the processing device is disclosed. Each processor core in a cluster of processor cores is in communication with the other processor cores in the cluster and at least one processor core of each cluster is in communication with at least a processor core of a different cluster of processor cores. Each processor core is configured to store a product of a portion of a first matrix and a first portion of a second matrix in the memory, and store a product of the portion of the first matrix and a second portion of the second matrix in the memory, where the second portion of the second matrix is received from a processor core in the cluster of processor cores.

    Device and method for accelerating matrix multiply operations

    公开(公告)号:US11640444B2

    公开(公告)日:2023-05-02

    申请号:US17208526

    申请日:2021-03-22

    Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.

    HARDWARE-SOFTWARE COLLABORATIVE ADDRESS MAPPING SCHEME FOR EFFICIENT PROCESSING-IN-MEMORY SYSTEMS

    公开(公告)号:US20220276795A1

    公开(公告)日:2022-09-01

    申请号:US17745278

    申请日:2022-05-16

    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that enable mapping data elements which are accessed together in the same row of one bank or over the same rows of different banks to achieve higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.

    Detecting execution hazards in offloaded operations

    公开(公告)号:US11188406B1

    公开(公告)日:2021-11-30

    申请号:US17218506

    申请日:2021-03-31

    Abstract: Detecting execution hazards in offloaded operations is disclosed. A second offload operation is compared to a first offload operation that precedes the second offload operation. It is determined whether the second offload operation creates an execution hazard on an offload target device based on the comparison of the second offload operation to the first offload operation. If the execution hazard is detected, an error handling operation may be performed. In some examples, the offload operations are processing-in-memory operations.

    Near-memory data reduction
    6.
    发明授权

    公开(公告)号:US11099788B2

    公开(公告)日:2021-08-24

    申请号:US16658733

    申请日:2019-10-21

    Abstract: An approach is provided for implementing near-memory data reduction during store operations to off-chip or off-die memory. A Near-Memory Reduction (NMR) unit provides near-memory data reduction during write operations to a specified address range. The NMR unit is configured with a range of addresses to be reduced and when a store operation specifies an address within the range of addresses, the NRM unit performs data reduction by adding the data value specified by the store operation to an accumulated reduction result. According to an embodiment, the NRM unit maintains a count of the number of updates to the accumulated reduction result that are used to determine when data reduction has been completed.

    Approach for enabling concurrent execution of host memory commands and near-memory processing commands

    公开(公告)号:US11977782B2

    公开(公告)日:2024-05-07

    申请号:US17855442

    申请日:2022-06-30

    CPC classification number: G06F3/0659 G06F3/0613 G06F3/0673

    Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.

    APPROACH FOR ENABLING CONCURRENT EXECUTION OF HOST MEMORY COMMANDS AND NEAR-MEMORY PROCESSING COMMANDS

    公开(公告)号:US20240004585A1

    公开(公告)日:2024-01-04

    申请号:US17855442

    申请日:2022-06-30

    CPC classification number: G06F3/0659 G06F3/0613 G06F3/0673

    Abstract: An approach allows concurrent execution of near-memory processing commands, referred to herein as “PIM commands,” and host memory commands. A memory controller determines and issues a plurality of register-only PIM commands that do not reference memory with host memory commands to allow concurrent execution of the register-only PIM commands and the host memory commands. The approach allows concurrent execution of register-only PIM commands and host memory commands without interference, even when the register-only PIM commands and the host memory commands are interleaved, and even for the same memory module, which improves resource utilization and performance. Further improvement of resource utilization and performance is achieved by extending a register-only phase by reordering register-only PIM commands before non-register-only PIM commands, subject to dependency constraints, and using shadow row buffers to provide local working copies of data from memory to near-memory compute elements.

    APPROACH FOR PROCESSING NEAR-MEMORY PROCESSING COMMANDS USING NEAR-MEMORY REGISTER DEFINITION DATA

    公开(公告)号:US20230409238A1

    公开(公告)日:2023-12-21

    申请号:US17845263

    申请日:2022-06-21

    CPC classification number: G06F3/0659 G06F3/0673 G06F3/0604

    Abstract: An approach is provided for processing near-memory processing commands, e.g., PIM commands, using PIM register definition data that defines multiple combinations of source and/or destination registers to be used to process PIM commands. A particular combination of source and/or destination registers to be used to process a PIM command is specified by the PIM command or determined by a near-memory processing element processing the PIM command. According to another implementation, the PIM register definition data specifies an initial combination of source and/or destination registers and one or more update functions for each PIM command. A near-memory processing element processes a PIM command using the initial combination of source and/or destination registers and uses the one or more update functions to update the combination of source and/or destination registers to be used the next time the PIM command is processed.

Patent Agency Ranking