PREFETCHING FUNCTIONALITY ON A LOGIC DIE STACKED WITH MEMORY
    Invention Application - Pending (Published)

    Publication No.: US20140181415A1

    Publication Date: 2014-06-26

    Application No.: US13723285

    Filing Date: 2012-12-21

    CPC classification number: G06F12/0862

    Abstract: Prefetching functionality on a logic die stacked with memory is described herein. A device includes a logic chip stacked with a memory chip. The logic chip includes a control block, an in-stack prefetch request handler and a memory controller. The control block receives memory requests from an external source and determines availability of the requested data in the in-stack prefetch request handler. If the data is available, the control block sends the requested data to the external source. If the data is not available, the control block obtains the requested data via the memory controller. The in-stack prefetch request handler includes a prefetch controller, a prefetcher and a prefetch buffer. The prefetcher monitors the memory requests and, based on observed patterns, issues additional prefetch requests to the memory controller.

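    The flow described above lends itself to a short simulation. The Python sketch below models the control-block check against a prefetch buffer, with a simple stride detector standing in for the unspecified pattern-observation logic; the names (PrefetchBuffer, StridePrefetcher, backing_memory) and the eviction and prefetch-depth choices are illustrative assumptions, not details from the application.

class PrefetchBuffer:
    """Small buffer holding prefetched lines; names and policy are illustrative."""
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.data = {}                      # address -> value

    def lookup(self, addr):
        return self.data.get(addr)

    def fill(self, addr, value):
        if len(self.data) >= self.capacity:
            self.data.pop(next(iter(self.data)))   # evict oldest entry
        self.data[addr] = value


class StridePrefetcher:
    """Watches the demand stream and guesses the next few addresses."""
    def __init__(self, depth=4):
        self.last_addr = None
        self.stride = None
        self.depth = depth

    def observe(self, addr):
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr

    def predictions(self):
        if not self.stride:
            return []
        return [self.last_addr + self.stride * i for i in range(1, self.depth + 1)]


def handle_request(addr, buffer, prefetcher, backing_memory):
    """Control-block behaviour: serve from the prefetch buffer if possible,
    otherwise go through the memory controller (modelled by backing_memory),
    then issue new prefetches based on the observed pattern."""
    prefetcher.observe(addr)
    value = buffer.lookup(addr)
    if value is None:                       # miss: fetch via the memory controller
        value = backing_memory[addr]
    for p in prefetcher.predictions():      # prefetch ahead into the buffer
        if p in backing_memory:
            buffer.fill(p, backing_memory[p])
    return value


memory = {a: a * 10 for a in range(0, 1024, 8)}
buf, pf = PrefetchBuffer(), StridePrefetcher()
for a in range(0, 128, 8):                  # a strided demand stream
    handle_request(a, buf, pf, memory)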

    Device and method for accelerating matrix multiply operations

    Publication No.: US12124531B2

    Publication Date: 2024-10-22

    Application No.: US18297230

    Filing Date: 2023-04-07

    CPC classification number: G06F17/16 G06F7/5324 G06F15/8007

    Abstract: A processing device including a plurality of clusters of processor cores, and a method for use in the processing device, are disclosed. Each processor core in a cluster of processor cores is in communication with the other processor cores in the cluster, and at least one processor core of each cluster is in communication with at least a processor core of a different cluster of processor cores. Each processor core is configured to store a product of a portion of a first matrix and a first portion of a second matrix in the memory, and to store a product of the portion of the first matrix and a second portion of the second matrix in the memory, where the second portion of the second matrix is received from a processor core in the cluster of processor cores.
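    As a rough illustration of the data movement the abstract describes, the Python/NumPy sketch below lets each "core" keep a fixed row block of the first matrix while column blocks of the second matrix rotate around the cluster, so every core computes a product with one block and then with another block received from a neighbour. The block sizes and ring order are arbitrary choices for illustration, not details from the patent.

import numpy as np

def cluster_matmul(A, B, num_cores):
    # Each core owns one row block of A and one column block of B; C blocks
    # are filled in as the B blocks rotate around the cluster.
    n = A.shape[0]
    rb = n // num_cores                       # row-block height per core (assumed even split)
    cb = B.shape[1] // num_cores              # column-block width per core
    a_blocks = [A[i*rb:(i+1)*rb, :] for i in range(num_cores)]
    b_blocks = [B[:, i*cb:(i+1)*cb] for i in range(num_cores)]
    C = np.zeros((n, B.shape[1]))

    for step in range(num_cores):
        for core in range(num_cores):
            # which B block this core holds after `step` rotations; the pass of
            # each B block to the next core is modelled by this index
            j = (core + step) % num_cores
            C[core*rb:(core+1)*rb, j*cb:(j+1)*cb] = a_blocks[core] @ b_blocks[j]
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(cluster_matmul(A, B, num_cores=4), A @ B)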

    Device and method for accelerating matrix multiply operations

    Publication No.: US11640444B2

    Publication Date: 2023-05-02

    Application No.: US17208526

    Filing Date: 2021-03-22

    Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
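    One plausible reading of the two-level organization is sketched below in Python/NumPy: groups of cores own disjoint row bands of the first matrix (work distributed over the second-level links), and within a group the column sub-portions of the second matrix rotate over the first-level links, so each core computes a product with its own sub-portion and then with a sub-portion received from a peer. The partitioning and group sizes are assumptions for illustration only.

import numpy as np

def hierarchical_matmul(A, B, groups, cores_per_group):
    # Two-level blocking: groups own row bands (second-level links), cores in a
    # group own sub-bands and rotate B column blocks (first-level links).
    n, m = A.shape[0], B.shape[1]
    band = n // groups                        # rows of A per group
    sub = band // cores_per_group             # rows of A per core
    cwidth = m // cores_per_group             # columns of B per core
    C = np.zeros((n, m))

    for g in range(groups):
        b_blocks = [B[:, c*cwidth:(c+1)*cwidth] for c in range(cores_per_group)]
        for step in range(cores_per_group):   # intra-group rotation of B blocks
            for core in range(cores_per_group):
                r0 = g*band + core*sub
                j = (core + step) % cores_per_group
                C[r0:r0+sub, j*cwidth:(j+1)*cwidth] = A[r0:r0+sub, :] @ b_blocks[j]
    return C

A = np.random.rand(8, 6)
B = np.random.rand(6, 8)
assert np.allclose(hierarchical_matmul(A, B, groups=2, cores_per_group=2), A @ B)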

    HARDWARE-SOFTWARE COLLABORATIVE ADDRESS MAPPING SCHEME FOR EFFICIENT PROCESSING-IN-MEMORY SYSTEMS

    Publication No.: US20220276795A1

    Publication Date: 2022-09-01

    Application No.: US17745278

    Filing Date: 2022-05-16

    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that map data elements which are accessed together into the same row of one bank, or across the same rows of different banks, achieving higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
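    To make the IBFS idea concrete, the Python sketch below maps element i of several frames (arrays) that a PIM kernel reads together into the same bank and row, so a single row activation serves all operands. The DRAM geometry constants and the bit layout are assumptions for illustration and are not the mapping defined in the application.

ROW_BYTES   = 1024        # bytes per DRAM row (assumed)
NUM_BANKS   = 4           # banks per channel (assumed)
NUM_FRAMES  = 2           # frames interleaved together (e.g. operands A and B)
ELEM_BYTES  = 4
SLOT_BYTES  = ROW_BYTES // NUM_FRAMES       # share of a row given to each frame

def ibfs_map(frame_id, elem_index):
    """Map (frame, element) to (bank, row, column byte offset); illustrative only."""
    byte_off = elem_index * ELEM_BYTES
    chunk    = byte_off // SLOT_BYTES        # which row-sized stripe of the frame
    within   = byte_off %  SLOT_BYTES
    bank     = chunk % NUM_BANKS
    row      = chunk // NUM_BANKS
    column   = frame_id * SLOT_BYTES + within   # frames share the same row
    return bank, row, column

# Element i of frame 0 and frame 1 land in the same bank and row, so a PIM
# operation reading both operands touches only one open row.
assert ibfs_map(0, 10)[:2] == ibfs_map(1, 10)[:2]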

    Near-memory data reduction
    Granted Invention Patent

    Publication No.: US11099788B2

    Publication Date: 2021-08-24

    Application No.: US16658733

    Filing Date: 2019-10-21

    Abstract: An approach is provided for implementing near-memory data reduction during store operations to off-chip or off-die memory. A Near-Memory Reduction (NMR) unit provides near-memory data reduction during write operations to a specified address range. The NMR unit is configured with a range of addresses to be reduced, and when a store operation specifies an address within that range, the NMR unit performs data reduction by adding the data value specified by the store operation to an accumulated reduction result. According to an embodiment, the NMR unit maintains a count of the number of updates to the accumulated reduction result that is used to determine when data reduction has been completed.
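    A minimal Python sketch of that behaviour is shown below, assuming summation as the reduction operator and illustrative names throughout: stores whose address falls inside the configured range are folded into an accumulated result, and an update counter indicates when the reduction is complete.

class NMRUnit:
    """Toy model of a near-memory reduction unit; names and fields are illustrative."""
    def __init__(self, base_addr, length, expected_updates):
        self.base = base_addr
        self.limit = base_addr + length
        self.expected = expected_updates
        self.accumulated = 0
        self.count = 0

    def in_range(self, addr):
        return self.base <= addr < self.limit

    def store(self, addr, value):
        """Called by the memory path for every store operation."""
        if not self.in_range(addr):
            return False                     # normal store, not reduced
        self.accumulated += value            # fold the value into the result
        self.count += 1
        return True

    def done(self):
        return self.count >= self.expected


nmr = NMRUnit(base_addr=0x1000, length=0x100, expected_updates=8)
for i in range(8):
    nmr.store(0x1000 + 4 * i, i)
assert nmr.done() and nmr.accumulated == sum(range(8))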
