Schedule Instructions of a Program of Data Flows for Execution in Tiles of a Coarse Grained Reconfigurable Array

    公开(公告)号:US20240362024A1

    公开(公告)日:2024-10-31

    申请号:US18770560

    申请日:2024-07-11

    IPC分类号: G06F9/30

    CPC分类号: G06F9/30181 G06F9/30043

    摘要: Schedule instructions of a program for execution on a coarse grained reconfigurable array having a plurality of tiles operable in parallel. The program identifies data flows through memory locations represented by memory variables and identifies instructions configured to transform data in the data flows. Based on a hardware profile identifying features of the coarse grained reconfigurable array, a scheduler is configured to generate a memory map. The memory map identifies, for each respective memory variable in the program, one of the tiles that contains a memory location represented by the respective memory variable. Based on the memory map reducing possible choices for a brute force search, the scheduler assigns the instructions to the tiles for execution, and determines timing of execution of the instructions in the tiles.

    Mixed scalar and vector operations in multi-threaded computing

    公开(公告)号:US12131157B2

    公开(公告)日:2024-10-29

    申请号:US17984336

    申请日:2022-11-10

    IPC分类号: G06F9/30 G06F9/32

    摘要: Processors, systems and methods are provided for thread level parallel processing. A processor may include a sequencer configured to: decode instructions that include scalar instructions and vector instructions, execute decoded scalar instructions, and package decoded vector instructions as configurations. The processor may further include a plurality of columns of vector processing units coupled to the sequencer. The plurality of columns of vector processing units may include a plurality of processing elements (PEs) and each of the PEs may include a plurality of Arithmetic Logic Units (ALUs). The sequencer may be configured to send the configurations to the plurality of columns of vector processing units.

    Cache coherence validation using delayed fulfillment of L2 requests

    公开(公告)号:US12118355B2

    公开(公告)日:2024-10-15

    申请号:US17506122

    申请日:2021-10-20

    IPC分类号: G06F9/30 G06F9/38 G06F12/0811

    摘要: Methods and systems for validating cache coherence in a data processing system are described. A processing element may detect a load instruction requesting the processing element to transfer data from a global memory location to a local memory location. The processing element may apply, in response to detecting the load instruction requesting the processing element to transfer data from the global memory location to the local memory location, a delay to the transfer of the data from the global memory location to the local memory location. The processing element may execute the load instruction and transferring the data from the global memory location to the local memory location with the applied delay. The processing element may validate, in response to executing the load instruction and transferring the data with the applied delay, a cache coherence of the data processing system.

    Distributed graphics processor unit architecture

    公开(公告)号:US12111789B2

    公开(公告)日:2024-10-08

    申请号:US16855879

    申请日:2020-04-22

    发明人: Dmitri Yudanov

    摘要: The present disclosure is directed to a distributed graphics processor unit (GPU) architecture that includes an array of processing nodes. Each processing node may include a GPU node that is coupled to its own fast memory unit and its own storage unit. The fast memory unit and storage unit may be integrated into a single unit or may be separately coupled to the GPU node. The processing node may have its fast memory unit coupled to both the GPU node and the storage node. The various architectures provide a GPU-based system that may be treated as a storage unit, such as solid state drive (SSD) that performs onboard processing to perform memory-oriented operations. In this respect, the system may be viewed as a “smart drive” for big-data near-storage processing.

    System call management in a user-mode, multi-threaded, self-scheduling processor

    公开(公告)号:US12106142B2

    公开(公告)日:2024-10-01

    申请号:US17337788

    申请日:2021-06-03

    发明人: Tony M. Brewer

    摘要: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute a received instruction; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In another embodiment, the core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, to reserve a predetermined amount of memory space in a thread control memory to store return arguments, and to generate one or more work descriptor data packets to another processor or hybrid threading fabric circuit for execution of a corresponding plurality of execution threads. Event processing, data path management, system calls, memory requests, and other new instructions are also disclosed.