BINDING CONSTANTS AT RUNTIME FOR IMPROVED RESOURCE UTILIZATION

    公开(公告)号:US20190146817A1

    公开(公告)日:2019-05-16

    申请号:US15897090

    申请日:2018-02-14

    Abstract: A just-in-time (JIT) compiler binds constants to specific memory locations at runtime. The JIT compiler parses program code derived from a multithreaded application and identifies an instruction that references a uniform constant. The JIT compiler then determines a chain of pointers that originates within a root table specified in the multithreaded application and terminates at the uniform constant. The JIT compiler generates additional instructions for traversing the chain of pointers and inserts these instructions into the program code. A parallel processor executes this compiled code and, in doing so, causes a thread to traverse the chain of pointers and bind the uniform constant to a uniform register at runtime. Each thread in a group of threads executing on the parallel processor may then access the uniform constant.

    UNIFORM REGISTER FILE FOR IMPROVED RESOURCE UTILIZATION

    公开(公告)号:US20190146796A1

    公开(公告)日:2019-05-16

    申请号:US15897092

    申请日:2018-02-14

    Abstract: A compiler parses a multithreaded application into cohesive blocks of instructions. Cohesive blocks include instructions that do not diverge or converge. Each cohesive block is associated with one or more uniform registers. When a set of threads executes the instructions in a given cohesive block, each thread in the set may access the uniform register independently of the other threads in the set. Accordingly, the uniform register may store a single copy of data on behalf of all threads in the set of threads, thereby conserving resources.

    SELF-SYNCHRONIZING REMOTE MEMORY OPERATIONS IN A MULTIPROCESSOR SYSTEM

    公开(公告)号:US20240393951A1

    公开(公告)日:2024-11-28

    申请号:US18768983

    申请日:2024-07-10

    Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.

    Efficient Matrix Multiply and Add with a Group of Warps

    公开(公告)号:US20230289398A1

    公开(公告)日:2023-09-14

    申请号:US17691406

    申请日:2022-03-10

    CPC classification number: G06F17/16 G06F9/3001 G06F7/5443

    Abstract: This specification describes techniques for implementing matrix multiply and add (MMA) operations in graphics processing units (GPU)s and other processors. The implementations provide for a plurality of warps of threads to collaborate in generating the result matrix by enabling each thread to share its respective register files to be accessed by the datapaths associated with other threads in the group of warps. A state machine circuit controls a MMA execution among the warps executing on asynchronous computation units. A group MMA (GMMA) instruction provides for a descriptor to be provided as parameter where the descriptor may include information regarding size and formats of input data to be loaded into shared memory and/or the datapath.

Patent Agency Ranking