HARDWARE-SOFTWARE COLLABORATIVE ADDRESS MAPPING SCHEME FOR EFFICIENT PROCESSING-IN-MEMORY SYSTEMS

    Publication No.: US20220066662A1

    Publication Date: 2022-03-03

    Application No.: US17006646

    Application Date: 2020-08-28

    Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that map data elements accessed together into the same row of one bank, or across the same rows of different banks, improving performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller uses ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
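
    As a rough illustration of the IBFS idea, the C sketch below interleaves two data elements so that matching offsets land in the same bank row; the row size, element count, and the ibfs_map helper are illustrative assumptions, not the patent's actual mapping.

        #include <stdio.h>
        #include <stdint.h>

        #define ROW_BYTES    1024  /* assumed DRAM row size */
        #define NUM_ELEMENTS 2     /* e.g., two operand arrays accessed together */
        #define STRIPE_BYTES (ROW_BYTES / NUM_ELEMENTS)

        /* Hypothetical intra-bank frame striping (IBFS): byte `offset` of
         * element `elem` is placed in the same row as the matching offset of
         * every other element, so an operation touching all elements at one
         * index hits a single open row instead of causing a row conflict. */
        static uint64_t ibfs_map(unsigned elem, uint64_t offset)
        {
            uint64_t stripe = offset / STRIPE_BYTES;  /* which shared row    */
            uint64_t within = offset % STRIPE_BYTES;  /* position inside it  */
            return stripe * ROW_BYTES                 /* start of shared row */
                 + (uint64_t)elem * STRIPE_BYTES      /* this element's slot */
                 + within;
        }

        int main(void)
        {
            /* Offsets 0..3 of elements A (0) and B (1) map to the same row. */
            for (unsigned e = 0; e < NUM_ELEMENTS; e++)
                for (uint64_t off = 0; off < 4; off++)
                    printf("elem %u offset %llu -> bank addr %4llu (row %llu)\n",
                           e, (unsigned long long)off,
                           (unsigned long long)ibfs_map(e, off),
                           (unsigned long long)(ibfs_map(e, off) / ROW_BYTES));
            return 0;
        }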

    COMMAND THROUGHPUT IN PIM-ENABLED MEMORY USING AVAILABLE DATA BUS BANDWIDTH

    Publication No.: US20210373805A1

    Publication Date: 2021-12-02

    Application No.: US16885677

    Application Date: 2020-05-28

    Abstract: An approach is provided for reducing command bus traffic between memory controllers and PIM-enabled memory modules using special PIM commands. The term "special PIM command" refers to a PIM command whose corresponding module-specific command information is provided to memory modules via a non-command-bus data path. A memory controller generates and issues a special PIM command to multiple PIM-enabled memory modules via a command bus, and provides module-specific command information (e.g., address information) for the special PIM command via the non-command-bus data path shared by the PIM-enabled memory modules and the memory controller.
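
    A minimal sketch of the bandwidth-splitting idea, assuming invented structures and opcodes (pim_command_t, pim_data_word_t, and opcode 0x2A are all hypothetical): one broadcast command occupies the command bus while the per-module addresses ride the shared data bus.

        #include <stdio.h>
        #include <stdint.h>

        #define NUM_MODULES 4

        /* One opcode goes out on the command bus... */
        typedef struct {
            uint8_t opcode;       /* broadcast "special PIM command" */
        } pim_command_t;

        /* ...while module-specific info rides the shared data bus instead. */
        typedef struct {
            uint8_t  module_id;   /* which PIM module this word targets */
            uint32_t row_address; /* per-module address information     */
        } pim_data_word_t;

        static void issue_special_pim_command(uint8_t opcode,
                                              const uint32_t rows[NUM_MODULES])
        {
            pim_command_t cmd = { .opcode = opcode };
            printf("command bus: opcode 0x%02x (one broadcast entry)\n", cmd.opcode);

            /* Per-module addresses consume data-bus bandwidth instead of
             * command-bus slots, reducing command bus traffic. */
            for (uint8_t m = 0; m < NUM_MODULES; m++) {
                pim_data_word_t w = { .module_id = m, .row_address = rows[m] };
                printf("data bus   : module %u <- row 0x%06x\n",
                       w.module_id, (unsigned)w.row_address);
            }
        }

        int main(void)
        {
            uint32_t rows[NUM_MODULES] = { 0x0100, 0x0200, 0x0300, 0x0400 };
            issue_special_pim_command(0x2A, rows);
            return 0;
        }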

    Locality-aware and sharing-aware cache coherence for collections of processors

    Publication No.: US11119923B2

    Publication Date: 2021-09-14

    Application No.: US15440979

    Application Date: 2017-02-23

    Abstract: A cache coherence technique for operating a multi-processor system with shared memory allocates a cache line of a processor's cache memory to a memory address in the shared memory in response to execution of an instruction of a program running on that processor. The technique encodes a shared-information state of the cache line to indicate whether the memory address is shared by the processor and a second processor or private to the processor, depending on whether the instruction is included in a critical section of the program, i.e., a portion of the program that confines access to shared, writeable data.
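
    The allocation decision can be pictured with the short C sketch below; the share_state_t encoding, the 64-byte line size, and the in_critical_section flag are assumptions for illustration, not the patented state machine.

        #include <stdio.h>
        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical per-line sharing state kept beside the usual
         * coherence bits. */
        typedef enum { LINE_PRIVATE, LINE_SHARED } share_state_t;

        typedef struct {
            uint64_t      tag;
            share_state_t share;
        } cache_line_t;

        /* An access issued from inside a critical section is presumed to
         * touch shared, writeable data, so the line is marked SHARED and
         * participates in coherence traffic; otherwise it is marked PRIVATE
         * and coherence messages for it can be skipped. */
        static void allocate_line(cache_line_t *line, uint64_t addr,
                                  bool in_critical_section)
        {
            line->tag   = addr >> 6;  /* assume 64-byte cache lines */
            line->share = in_critical_section ? LINE_SHARED : LINE_PRIVATE;
        }

        int main(void)
        {
            cache_line_t line;

            allocate_line(&line, 0x1000, true);   /* inside a lock-protected region */
            printf("0x1000 -> %s\n", line.share == LINE_SHARED ? "shared" : "private");

            allocate_line(&line, 0x2000, false);  /* thread-local access */
            printf("0x2000 -> %s\n", line.share == LINE_SHARED ? "shared" : "private");
            return 0;
        }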

    Protecting host memory from access by untrusted accelerators

    Publication No.: US11030117B2

    Publication Date: 2021-06-08

    Application No.: US15650252

    Application Date: 2017-07-14

    Abstract: A host processor receives an address translation request from an accelerator, which may be trusted or untrusted. The address translation request includes a virtual address in a virtual address space shared by the host processor and the accelerator. If the accelerator is permitted to access the physical address in host memory indicated by the virtual address, the host processor encrypts that physical address and provides the encrypted result to the accelerator. The accelerator then issues memory access requests containing the encrypted physical address to the host processor, which decrypts the physical address and selectively accesses the indicated location in host memory, depending on whether the accelerator is permitted to access that location.
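
    The request/response flow might look like the toy C sketch below. The XOR "cipher", the fixed page-walk offset, and the permission check are stand-ins (a real design would use a keyed block cipher); the point is only that the accelerator holds an opaque handle it cannot interpret or forge.

        #include <stdio.h>
        #include <stdint.h>
        #include <stdbool.h>

        /* Toy stand-in for the host's cipher; the key never leaves the host. */
        static const uint64_t HOST_KEY = 0x5DEECE66DULL;

        static uint64_t encrypt_pa(uint64_t pa) { return pa ^ HOST_KEY; }
        static uint64_t decrypt_pa(uint64_t ct) { return ct ^ HOST_KEY; }

        /* Hypothetical host-side permission check. */
        static bool accelerator_may_access(uint64_t pa) { return pa < 0x10000000; }

        /* Host handles a translation request from the accelerator. */
        static bool host_translate(uint64_t va, uint64_t *handle_out)
        {
            uint64_t pa = va + 0x1000;  /* stand-in for a real page walk */
            if (!accelerator_may_access(pa))
                return false;
            *handle_out = encrypt_pa(pa);
            return true;
        }

        /* Host handles a later memory access carrying the opaque handle. */
        static void host_access(uint64_t handle)
        {
            uint64_t pa = decrypt_pa(handle);
            if (accelerator_may_access(pa))
                printf("host: access granted at PA 0x%llx\n", (unsigned long long)pa);
            else
                printf("host: access denied\n");
        }

        int main(void)
        {
            uint64_t handle;
            if (host_translate(0x4000, &handle)) {  /* accelerator requests */
                printf("accelerator holds opaque handle 0x%llx\n",
                       (unsigned long long)handle);
                host_access(handle);                /* accelerator uses it  */
            }
            return 0;
        }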

    Data processing system with decoupled data operations

    Publication No.: US10817422B2

    Publication Date: 2020-10-27

    Application No.: US16104567

    Application Date: 2018-08-17

    Abstract: In one form, a data processing system includes a host integrated circuit having a memory controller, a memory bus coupled to the memory controller, and a memory module. The memory module includes a bulk memory and a memory module scratchpad coupled to the bulk memory, where the scratchpad has a lower access overhead than the bulk memory. In response to a data movement decision, the memory controller selectively issues predetermined commands over the memory bus that cause the memory module to copy data between the bulk memory and the scratchpad without transferring the data over the memory bus.
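
    A compact sketch of the command flow, assuming invented opcodes and fields (copy_cmd_t and its members are hypothetical): the controller sends only commands over the memory bus, and the copy itself stays inside the module.

        #include <stdio.h>
        #include <stdint.h>

        /* Hypothetical opcodes the controller might send so the copy
         * happens inside the module, with no data crossing the memory bus. */
        typedef enum {
            CMD_COPY_BULK_TO_SCRATCHPAD,
            CMD_COPY_SCRATCHPAD_TO_BULK
        } module_cmd_t;

        typedef struct {
            module_cmd_t op;
            uint64_t     bulk_addr;       /* location in bulk memory         */
            uint32_t     scratchpad_slot; /* location in the fast scratchpad */
            uint32_t     length;          /* bytes to move inside the module */
        } copy_cmd_t;

        /* The controller only emits the command; no data is driven on the bus. */
        static void issue(copy_cmd_t c)
        {
            printf("memory bus: %s bulk=0x%08llx pad_slot=%u len=%u (command only)\n",
                   c.op == CMD_COPY_BULK_TO_SCRATCHPAD ? "BULK->PAD" : "PAD->BULK",
                   (unsigned long long)c.bulk_addr,
                   (unsigned)c.scratchpad_slot, (unsigned)c.length);
        }

        int main(void)
        {
            /* Stage a hot buffer into the low-overhead scratchpad... */
            issue((copy_cmd_t){ CMD_COPY_BULK_TO_SCRATCHPAD, 0x2000, 0, 256 });
            /* ...and write it back once the computation is done. */
            issue((copy_cmd_t){ CMD_COPY_SCRATCHPAD_TO_BULK, 0x2000, 0, 256 });
            return 0;
        }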

    NEAR-MEMORY HARDENED COMPUTE BLOCKS FOR CONFIGURABLE COMPUTING SUBSTRATES

    Publication No.: US20190220426A1

    Publication Date: 2019-07-18

    Application No.: US15872943

    Application Date: 2018-01-16

    CPC classification number: G06F13/1694 G06F3/0631 G06F3/065 G06F3/068

    Abstract: A configurable computing system that uses near-memory and in-memory hardened logic blocks is described herein. The hardened logic blocks are incorporated into memory modules, which include interface or communication logic for communicating between the configurable computing substrate and the memory module. Depending on the implementation, a memory module can include an on-die memory or other non-configurable logic, a portion of the configurable computing substrate's logic fabric, or both, to enable more efficient processing for a variety of operations.
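
    One way to picture the division of labor is the small dispatch sketch below; the operation set and the hardened_supports check are invented for illustration.

        #include <stdio.h>

        /* Hypothetical split: fixed-function (hardened) near-memory blocks
         * handle common operations; the configurable fabric handles the rest. */
        typedef enum { OP_REDUCE_SUM, OP_DOT_PRODUCT, OP_CUSTOM } op_t;

        static int hardened_supports(op_t op)
        {
            return op == OP_REDUCE_SUM || op == OP_DOT_PRODUCT;
        }

        static void dispatch(op_t op)
        {
            if (hardened_supports(op))
                printf("op %d -> hardened near-memory block (fixed logic)\n", op);
            else
                printf("op %d -> configurable substrate fabric (programmable)\n", op);
        }

        int main(void)
        {
            dispatch(OP_REDUCE_SUM);  /* efficient hardened path   */
            dispatch(OP_CUSTOM);      /* falls back to the fabric  */
            return 0;
        }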

    DYNAMIC HARDWARE SELECTION FOR EXPERTS IN MIXTURE-OF-EXPERTS MODEL

    Publication No.: US20190188577A1

    Publication Date: 2019-06-20

    Application No.: US15849633

    Application Date: 2017-12-20

    CPC classification number: G06N5/022 G06F7/02 G06N20/00

    Abstract: A system assigns experts of a mixture-of-experts artificial intelligence model to processing devices in an automated manner. The system includes an orchestrator component that maintains priority data storing, for each expert and for each execution parameter, ranking information that ranks the different processing devices for that parameter. For example, for the execution parameter of execution speed, the priority data may indicate that a central processing unit ("CPU") executes a first expert faster than a graphics processing unit ("GPU"), while for the execution parameter of power consumption it may indicate that a GPU uses less power than a CPU for the same expert. The priority data stores such information for one or more processing devices, one or more experts, and one or more execution parameters.
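
    The priority data can be modeled as a small ranking table, as in the C sketch below; the two experts, two devices, two parameters, and the rank values are assumptions chosen to reproduce the abstract's CPU/GPU example.

        #include <stdio.h>

        #define NUM_EXPERTS 2
        #define NUM_PARAMS  2
        #define NUM_DEVICES 2

        typedef enum { DEV_CPU, DEV_GPU } device_t;
        typedef enum { PARAM_SPEED, PARAM_POWER } param_t;

        static const char *dev_name[NUM_DEVICES] = { "CPU", "GPU" };

        /* Orchestrator's priority data: rank[expert][parameter][device],
         * lower rank = better for that execution parameter. */
        static const int rank[NUM_EXPERTS][NUM_PARAMS][NUM_DEVICES] = {
            /* expert 0: CPU is faster, but the GPU uses less power */
            { { 0, 1 },    /* PARAM_SPEED: CPU best */
              { 1, 0 } },  /* PARAM_POWER: GPU best */
            /* expert 1: GPU best on both parameters */
            { { 1, 0 },
              { 1, 0 } },
        };

        /* Pick the highest-ranked device for an expert under one parameter. */
        static device_t assign(int expert, param_t p)
        {
            device_t best = DEV_CPU;
            for (int d = 0; d < NUM_DEVICES; d++)
                if (rank[expert][p][d] < rank[expert][p][best])
                    best = (device_t)d;
            return best;
        }

        int main(void)
        {
            printf("expert 0, optimize speed -> %s\n", dev_name[assign(0, PARAM_SPEED)]);
            printf("expert 0, optimize power -> %s\n", dev_name[assign(0, PARAM_POWER)]);
            printf("expert 1, optimize speed -> %s\n", dev_name[assign(1, PARAM_SPEED)]);
            return 0;
        }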

    Dynamic memory remapping to reduce row-buffer conflicts

    Publication No.: US10198369B2

    Publication Date: 2019-02-05

    Application No.: US15469071

    Application Date: 2017-03-24

    Abstract: A data processing system includes a memory with a first memory bank and a second memory bank. The system also includes a conflict detector connected to the memory and adapted to receive memory access information. The conflict detector tracks memory access statistics of the first memory bank, determines whether the first memory bank experiences frequent row conflicts, and remaps a frequently conflicting row in the first memory bank to the second memory bank. An indirection table connected to the conflict detector receives memory access requests and redirects each address to a dynamically selected physical memory address in response to the remapping of the frequent row conflict to the second memory bank.
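
    The detector-plus-indirection pipeline can be sketched in a few lines of C; the conflict threshold, the single spare bank, and the table layout are illustrative assumptions rather than the patented design.

        #include <stdio.h>

        #define NUM_ROWS       8
        #define CONFLICT_LIMIT 3  /* assumed threshold for "frequent" */

        /* Conflict detector state: per-row conflict counters for bank 0. */
        static unsigned conflict_count[NUM_ROWS];

        /* Indirection table: 0 = not remapped, 1 = row migrated to bank 1. */
        static int remapped_to_bank1[NUM_ROWS];

        /* Called when an access to `row` evicts another open row
         * (a row-buffer conflict) in bank 0. */
        static void record_conflict(unsigned row)
        {
            if (!remapped_to_bank1[row] &&
                ++conflict_count[row] >= CONFLICT_LIMIT) {
                remapped_to_bank1[row] = 1;
                printf("detector: row %u is hot, remapping to bank 1\n", row);
            }
        }

        /* Redirect an incoming request through the indirection table. */
        static void access_row(unsigned row)
        {
            unsigned bank = remapped_to_bank1[row] ? 1 : 0;
            printf("access: row %u served from bank %u\n", row, bank);
        }

        int main(void)
        {
            for (int i = 0; i < 4; i++)
                record_conflict(2);  /* row 2 keeps losing the row buffer */
            record_conflict(5);      /* row 5 conflicts only once         */

            access_row(2);           /* redirected to bank 1       */
            access_row(5);           /* still served from bank 0   */
            return 0;
        }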
