Distributed coherence directory subsystem with exclusive data regions

    Publication number: US11726915B2

    Publication date: 2023-08-15

    Application number: US16821632

    Application date: 2020-03-17

    CPC classification number: G06F12/0824 G06F12/084

    Abstract: A processing system includes a first set of one or more processing units including a first processing unit, a second set of one or more processing units including a second processing unit, and a memory having an address space shared by the first and second sets. The processing system further includes a distributed coherence directory subsystem having a first coherence directory to support a first subset of one or more address regions of the address space and a second coherence directory to support a second subset of one or more address regions of the address space. In some implementations, the first coherence directory is implemented in the system so as to have a lower access latency for the first set, whereas the second coherence directory is implemented in the system so as to have a lower access latency for the second set.
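
The region-based split described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the patented design: the class names `CoherenceDirectory` and `DirectorySubsystem`, the half-open address ranges, and the sharer-set bookkeeping are all hypothetical, and the latency difference between directories is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CoherenceDirectory:
    """Hypothetical directory that tracks sharers for its own address regions."""
    name: str
    regions: list  # list of (start, end) half-open address ranges
    entries: dict = field(default_factory=dict)  # address -> set of sharer ids

    def covers(self, addr):
        # True when the address falls in one of this directory's regions.
        return any(start <= addr < end for start, end in self.regions)


class DirectorySubsystem:
    """Routes each address to the one directory that supports its region."""

    def __init__(self, directories):
        self.directories = directories

    def lookup(self, addr):
        for directory in self.directories:
            if directory.covers(addr):
                return directory
        raise ValueError(f"no directory supports address {addr:#x}")

    def record_sharer(self, addr, unit_id):
        # Record that a processing unit holds a copy; return the directory used.
        directory = self.lookup(addr)
        directory.entries.setdefault(addr, set()).add(unit_id)
        return directory.name


subsystem = DirectorySubsystem([
    CoherenceDirectory("dir0", [(0x0000, 0x8000)]),   # assumed close to set 1
    CoherenceDirectory("dir1", [(0x8000, 0x10000)]),  # assumed close to set 2
])
print(subsystem.record_sharer(0x1000, "unit_a"))
print(subsystem.record_sharer(0x9000, "unit_b"))
```

Because each address belongs to exactly one region, a lookup never has to consult more than one directory.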

    Processing unit with small footprint arithmetic logic unit

    Publication number: US11720328B2

    Publication date: 2023-08-08

    Application number: US17029836

    Application date: 2020-09-23

    CPC classification number: G06F7/57 G06F17/16 G06N3/08

    Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
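
As a rough illustration of the staged execution in this abstract, the sketch below runs one operation at an assumed 32-bit floating-point precision and narrows the result to 16 bits before a second stage continues. The choice of a multiply followed by an add, the two precisions, and the function names are assumptions made for the example; the abstract does not specify them.

```python
import struct


def to_fp32(value):
    # Round a Python float to IEEE 754 single precision.
    return struct.unpack("f", struct.pack("f", value))[0]


def to_fp16(value):
    # Round a Python float to IEEE 754 half precision.
    # struct raises OverflowError if the value is too large for fp16.
    return struct.unpack("e", struct.pack("e", value))[0]


def staged_multiply_add(a, b, c):
    """Hypothetical two-stage operation: multiply, narrow, then add."""
    product = to_fp32(a * b)      # stage 1: instruction precision (assumed fp32)
    narrowed = to_fp16(product)   # reduce the result before the next stage
    return to_fp16(narrowed + c)  # stage 2: works on the smaller data size


print(staged_multiply_add(1.5, 2.0, 0.5))
```

Narrowing between stages means the later stage only needs datapaths for the smaller format, which is the area and power saving the abstract points to.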

    Apparatus and methods for managing packet transfer across a memory fabric physical layer interface

    Publication number: US11720279B2

    Publication date: 2023-08-08

    Application number: US16701794

    Application date: 2019-12-03

    Abstract: An apparatus and method manage packet transfer between a memory fabric and another device, where the physical layer interface of the memory fabric has a higher data rate than the physical layer interface of the other device. The apparatus and method receive incoming packets from the memory fabric physical layer interface, wherein at least some of the packets include different instruction types. The apparatus and method determine a packet type of each incoming packet and, when the determined packet type contains an atomic request, prioritize transfer of that packet over incoming packets of other types to memory access logic that accesses local memory within an apparatus.
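
The prioritization in this abstract can be sketched with a small priority queue. This is a minimal illustration, assuming a software queue stands in for the hardware buffering; the `PacketQueue` class, the string packet types, and the first-in-first-out tie-break among packets of equal priority are hypothetical choices, not details from the patent.

```python
import heapq
from itertools import count

ATOMIC = "atomic"  # hypothetical label for packets carrying an atomic request


class PacketQueue:
    """Buffers incoming packets and hands atomic requests out first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # preserves arrival order within a priority

    def receive(self, packet_type, payload):
        # Determine the packet type and assign priority 0 to atomic requests.
        priority = 0 if packet_type == ATOMIC else 1
        heapq.heappush(
            self._heap, (priority, next(self._arrival), packet_type, payload)
        )

    def next_for_memory_access(self):
        # Return the next packet to forward to memory access logic, or None.
        if not self._heap:
            return None
        _, _, packet_type, payload = heapq.heappop(self._heap)
        return packet_type, payload


queue = PacketQueue()
queue.receive("read", "A")
queue.receive("write", "B")
queue.receive(ATOMIC, "C")
while (packet := queue.next_for_memory_access()) is not None:
    print(packet)
```

The arrival counter keeps non-atomic packets in their original order, so only atomic requests jump ahead.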

    DEVICE AND METHOD FOR ACCELERATING MATRIX MULTIPLY OPERATIONS

    Publication number: US20230244751A1

    Publication date: 2023-08-03

    Application number: US18297230

    Application date: 2023-04-07

    CPC classification number: G06F17/16 G06F7/5324 G06F15/8007

    Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
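
One way to picture the exchange of matrix sub-portions in this abstract is a ring of cores, each keeping its block of the first matrix and passing blocks of the second matrix to a neighbor. The sketch below simulates that in plain Python. The ring topology, the row and column blocking, and the function names are assumptions for illustration; the abstract only says that a core receives another sub-portion from another core.

```python
def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def ring_matmul(a_blocks, b_blocks):
    """Simulate cores that each keep one row block of A and circulate
    column blocks of B. results[i][j] is a_blocks[i] times b_blocks[j]."""
    n = len(a_blocks)
    results = [[None] * n for _ in range(n)]
    held = list(range(n))  # index of the B block each core currently holds
    for _ in range(n):
        for core in range(n):
            j = held[core]
            # Multiply the local A sub-portion by the B sub-portion on hand.
            results[core][j] = matmul(a_blocks[core], b_blocks[j])
        # Each core receives the B block previously held by its neighbor.
        held = [held[(core - 1) % n] for core in range(n)]
    return results


# A = [[1, 2], [3, 4]] split by rows; B = [[5, 6], [7, 8]] split by columns.
a_blocks = [[[1, 2]], [[3, 4]]]
b_blocks = [[[5], [7]], [[6], [8]]]
print(ring_matmul(a_blocks, b_blocks))
```

After as many steps as there are cores, every core has multiplied its block of the first matrix against every block of the second, so the block results tile the full product.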

    PRESERVING MEMORY ORDERING BETWEEN OFFLOADED INSTRUCTIONS AND NON-OFFLOADED INSTRUCTIONS

    Publication number: US20230244492A1

    Publication date: 2023-08-03

    Application number: US18298723

    Application date: 2023-04-11

    CPC classification number: G06F9/3836 G06F9/3001 G06F9/522 G06F9/3877

    Abstract: Preserving memory ordering between offloaded instructions and non-offloaded instructions is disclosed. An offload instruction for an operation to be offloaded is processed and a lock is placed on a memory address associated with the offload instruction. In response to completing a cache operation targeting the memory address, the lock on the memory address is removed. For multithreaded applications, upon determining that a plurality of processor cores have each begun executing a sequence of offload instructions, the execution of non-offload instructions that are younger than any of the offload instructions is restricted. In response to determining that each processor core has completed executing its sequence of offload instructions, the restriction is removed. The remote device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
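
The two mechanisms in this abstract, a per-address lock and a multi-core restriction, can be sketched as simple bookkeeping. The classes `AddressLocks` and `OffloadBarrier` and their method names are hypothetical, and the sketch tracks state only; it does not model real instruction scheduling or caches.

```python
class AddressLocks:
    """Tracks memory addresses locked on behalf of offload instructions."""

    def __init__(self):
        self._locked = set()

    def process_offload(self, addr):
        # Processing an offload instruction places a lock on its address.
        self._locked.add(addr)

    def cache_operation_complete(self, addr):
        # Completing the cache operation on that address removes the lock.
        self._locked.discard(addr)

    def is_locked(self, addr):
        return addr in self._locked


class OffloadBarrier:
    """Restricts younger non-offload instructions while every core is
    inside its sequence of offload instructions."""

    def __init__(self, core_ids):
        self._cores = set(core_ids)
        self._started = set()
        self._completed = set()

    def begin_sequence(self, core):
        self._started.add(core)

    def complete_sequence(self, core):
        self._completed.add(core)

    def younger_non_offload_restricted(self):
        # Restricted once all cores have begun, until all have completed.
        return self._started == self._cores and self._completed != self._cores


locks = AddressLocks()
locks.process_offload(0x40)
print(locks.is_locked(0x40))
locks.cache_operation_complete(0x40)
print(locks.is_locked(0x40))
```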
