Cooperative thread array reduction and scan operations

    Publication No.: US09417875B2

    Publication date: 2016-08-16

    Application No.: US14025482

    Filing date: 2013-09-12

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction: in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and does not execute any further instructions until all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the thread executes the barrier aggregation instruction.
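    The following is a minimal Python sketch of the barrier aggregation behaviour described above. It models the behaviour in software, not hardware. `BarrierAggregator`, `arrive` and `run_demo` are hypothetical names. The combining operator is assumed to be addition, and the scan is assumed to be exclusive and ordered by arrival. Each thread receives its scan value as soon as it arrives, but it receives the full reduction only after every thread has reached the barrier. CUDA exposes a narrower, real form of barrier-plus-reduction through `__syncthreads_count()`, `__syncthreads_and()` and `__syncthreads_or()`.

```python
import operator
import threading
from typing import Callable, List, Tuple


class BarrierAggregator:
    """Hypothetical software model of a barrier aggregation instruction.

    Each of num_threads threads calls arrive(value) once per round:
      * its exclusive scan value (aggregate of values from threads that
        arrived earlier in this round) is computed on arrival, and
      * it blocks at the barrier until every thread has arrived, then
        receives the reduction over all values contributed in the round.
    """

    def __init__(self, num_threads: int, op: Callable[[int, int], int], identity: int):
        self._op = op
        self._identity = identity
        self._lock = threading.Lock()
        self._running = identity    # aggregate of values seen so far this round
        self._reduction = identity  # published reduction for the last round
        self._barrier = threading.Barrier(num_threads, action=self._publish)

    def _publish(self) -> None:
        # Called by one thread after all have arrived, before any is released.
        self._reduction = self._running
        self._running = self._identity  # reset so the object can be reused

    def arrive(self, value: int) -> Tuple[int, int]:
        with self._lock:
            scan = self._running                           # excludes this thread's value
            self._running = self._op(self._running, value)
        self._barrier.wait()                               # barrier synchronization
        return scan, self._reduction


def run_demo(values: List[int]) -> List[Tuple[int, int]]:
    """Run one thread per value and collect each thread's (scan, reduction)."""
    agg = BarrierAggregator(len(values), operator.add, 0)
    results: List[Tuple[int, int]] = [(0, 0)] * len(values)

    def worker(tid: int) -> None:
        results[tid] = agg.arrive(values[tid])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for tid, (scan, total) in enumerate(run_demo([3, 1, 4, 1, 5])):
        print(f"thread {tid}: scan={scan} reduction={total}")
```

    The barrier's `action` callback runs after the last thread arrives and before any thread is released. It publishes the reduction and resets the running value, so the same object can serve another round. The scan values depend on the order in which threads happen to arrive, which mirrors the abstract's statement that the scan result is delivered as each thread executes the instruction.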

    INDIRECT FUNCTION CALL INSTRUCTIONS IN A SYNCHRONOUS PARALLEL THREAD PROCESSOR

    Invention application (legal status: in force)

    Publication No.: US20130138926A1

    Publication date: 2013-05-30

    Application No.: US13674890

    Filing date: 2012-11-12

    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
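    The following is a minimal Python sketch of the contrast this abstract draws. It assumes a simplified warp model. Per-thread integer targets index a hypothetical `FUNCTION_TABLE`, which stands in for an address register pointing into a vtable or a switch jump table. `chained_tests` is the sequential test-and-branch baseline. The sketch also assumes a particular way of handling threads whose registers hold different targets: it serializes over the distinct targets, one common divergence strategy. That strategy is not necessarily the mechanism this application claims.

```python
from typing import Callable, List, Tuple


# Hypothetical function table: the index stands in for a code address held in
# a per-thread register (as in a vtable or a switch jump table).
def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


def negate(x: int) -> int:
    return -x


FUNCTION_TABLE: List[Callable[[int], int]] = [add_one, double, negate]


def chained_tests(target: int, x: int) -> int:
    """Baseline: a sequential chain of tests and branches, one compare per case."""
    if target == 0:
        return add_one(x)
    elif target == 1:
        return double(x)
    elif target == 2:
        return negate(x)
    raise ValueError(f"unknown target {target}")


def simt_indirect_call(targets: List[int], inputs: List[int]) -> Tuple[List[int], int]:
    """Model one warp executing an indirect call with per-thread target registers.

    Threads whose registers hold the same target execute that callee together
    under an active mask; distinct targets are executed one after another.
    Returns the per-thread results and the number of serialized passes.
    """
    results = [0] * len(targets)
    pending = set(range(len(targets)))
    passes = 0
    while pending:
        leader = min(pending)              # lowest pending thread picks the target
        target = targets[leader]
        active = [t for t in sorted(pending) if targets[t] == target]
        callee = FUNCTION_TABLE[target]    # one indirect lookup, no compare chain
        for t in active:
            results[t] = callee(inputs[t])
        pending.difference_update(active)
        passes += 1
    return results, passes


if __name__ == "__main__":
    print(simt_indirect_call([0, 1, 0, 2, 1], [10, 20, 30, 40, 50]))
    # ([11, 40, 31, -40, 100], 3)
```

    The number of passes depends only on how many distinct targets the warp holds, not on how many cases a compare chain would have to test. That difference is the efficiency argument the abstract makes.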


    Cooperative thread array reduction and scan operations

    Publication No.: US09830197B2

    Publication date: 2017-11-28

    Application No.: US15238428

    Filing date: 2016-08-16

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction: in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and does not execute any further instructions until all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction, and a scan result is communicated to each thread as the thread executes the barrier aggregation instruction.
