Cooperative thread array reduction and scan operations

    Publication number: US09417875B2

    Publication date: 2016-08-16

    Application number: US14025482

    Filing date: 2013-09-12

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
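Below is a minimal Python model of the ordering described in the abstract, written only for illustration. Each thread contributes a value and gets a scan result as soon as it executes the aggregation step. It gets the reduction only after every thread has reached the barrier. `BarrierAggregator` and `run_thread_array` are hypothetical names. Three choices are assumptions that the abstract does not state: addition as the operator, an exclusive scan taken in arrival order, and single-use barrier semantics. The model captures the ordering of results, not GPU hardware.

```python
import threading


class BarrierAggregator:
    """Hypothetical single-use model of a barrier-with-aggregation instruction.

    Assumptions (not from the abstract): the operator is integer addition,
    the scan is an exclusive prefix sum in arrival order, and each instance
    serves exactly one barrier episode.
    """

    def __init__(self, num_threads: int) -> None:
        self._lock = threading.Lock()
        self._barrier = threading.Barrier(num_threads)
        self._running_total = 0  # sum of every value contributed so far

    def arrive_and_aggregate(self, value: int) -> tuple[int, int]:
        # Contribute this thread's value. The scan result (the sum of all
        # earlier arrivals) is known immediately, before the barrier releases.
        with self._lock:
            scan_result = self._running_total
            self._running_total += value
        # Barrier synchronization: block until every thread has arrived.
        self._barrier.wait()
        # All contributions are now in, so the running total is the reduction.
        with self._lock:
            reduction_result = self._running_total
        return scan_result, reduction_result


def run_thread_array(values: list[int]) -> list[tuple[int, int]]:
    """Start one Python thread per value and collect (scan, reduction) pairs."""
    agg = BarrierAggregator(len(values))
    results: list[tuple[int, int]] = [(0, 0)] * len(values)

    def worker(tid: int) -> None:
        results[tid] = agg.arrive_and_aggregate(values[tid])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(values))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Suppose four threads each contribute 5. Every thread receives a reduction of 20, and the scan results, once sorted, are 0, 5, 10 and 15. Which thread receives which scan value depends on the order in which the threads reach the barrier.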

COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS
    Invention application (status: in force)

    Publication number: US20160357560A1

    Publication date: 2016-12-08

    Application number: US15238428

    Filing date: 2016-08-16

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
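The abstract does not fix the aggregation operator. A sequential reference is one way to pin down the expected results for any associative operator that has an identity element. The sketch below uses the hypothetical function `reference_aggregate` for this purpose. Taking the exclusive scan in thread-index order, rather than arrival order, is an assumption. For comparison, CUDA C++ provides `__syncthreads_count()`, `__syncthreads_and()` and `__syncthreads_or()`, which combine a block-wide barrier with a count, AND or OR over a per-thread predicate. No claim is made here about how those intrinsics relate to this application.

```python
import operator
from functools import reduce
from itertools import accumulate
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def reference_aggregate(
    values: List[T], op: Callable[[T, T], T], identity: T
) -> Tuple[List[T], T]:
    """Hypothetical sequential reference for a thread-array aggregation.

    Returns (exclusive scan in thread-index order, reduction). `op` must be
    associative and `identity` must be its identity element.
    """
    # Exclusive scan: thread i sees op folded over values[0..i-1].
    scan = list(accumulate(values[:-1], op, initial=identity)) if values else []
    # Reduction: op folded over every thread's value.
    total = reduce(op, values, identity)
    return scan, total


if __name__ == "__main__":
    # Sum: per-thread exclusive prefix sums plus the array-wide total.
    print(reference_aggregate([3, 1, 4, 1], operator.add, 0))  # ([0, 3, 4, 8], 9)
    # AND over per-thread predicates: the reduction is True only if all agree.
    print(reference_aggregate([True, True, False], operator.and_, True))
```

Passing `operator.add` with per-thread 0/1 predicates gives a count. `operator.and_` and `operator.or_` give "all" and "any" votes.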

INDIRECT FUNCTION CALL INSTRUCTIONS IN A SYNCHRONOUS PARALLEL THREAD PROCESSOR
    Invention application (status: in force)

    Publication number: US20130138926A1

    Publication date: 2013-05-30

    Application number: US13674890

    Filing date: 2012-11-12

    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
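The abstract does not say how a SIMT processor handles threads whose address registers hold different targets. A common approach is to serialize over the distinct targets, with only the matching threads active in each pass. That approach is assumed in the Python sketch below. `simt_indirect_call` is a hypothetical name, and the `code` dictionary, which maps an address to a function, stands in for instruction memory.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def simt_indirect_call(
    addr_regs: Sequence[int],
    args: Sequence[int],
    code: Dict[int, Callable[[int], int]],
) -> Tuple[List[int], int]:
    """Hypothetical model of one thread group executing an indirect call.

    addr_regs[i] is the target address held in thread i's address register.
    Assumption: the group makes one pass per distinct target, with only the
    threads holding that address active. Returns (per-thread results, passes).
    """
    results = [0] * len(addr_regs)
    # Group thread ids by the address in their register (the active masks).
    groups: Dict[int, List[int]] = defaultdict(list)
    for tid, addr in enumerate(addr_regs):
        groups[addr].append(tid)
    # One pass per distinct target: a single lookup through the register
    # replaces a chain of compare-and-branch tests.
    for addr, active in groups.items():
        target = code[addr]
        for tid in active:
            results[tid] = target(args[tid])
    return results, len(groups)
```

Consider four threads whose address registers alternate between two targets. The call makes two passes, one per distinct target. The sketch illustrates the trade-off the abstract argues for: a lookup through the address register replaces a per-thread chain of tests and branches, and for a switch with k cases that chain can take up to k tests.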

    Cooperative thread array reduction and scan operations

    Publication number: US09830197B2

    Publication date: 2017-11-28

    Application number: US15238428

    Filing date: 2016-08-16

    Abstract: One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.
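The abstract names two forms of the instruction: barrier synchronization and barrier arrival. One plausible reading, assumed here and not taken from the abstract, is that an arriving thread contributes its value and continues immediately with only its scan result. A synchronizing thread, by contrast, contributes and then waits for the full reduction. `ArriveOrSyncBarrier` is a hypothetical single-use Python sketch of that split. It again assumes addition as the operator.

```python
import threading


class ArriveOrSyncBarrier:
    """Hypothetical single-use barrier in which each participant either only
    arrives (contributes and continues) or synchronizes (contributes and waits).

    Assumptions: addition is the operator, the scan is an exclusive prefix in
    arrival order, and `expected` counts every contribution, arrive or sync.
    """

    def __init__(self, expected: int) -> None:
        self._cond = threading.Condition()
        self._expected = expected  # contributions needed before sync() returns
        self._arrived = 0
        self._total = 0

    def _contribute(self, value: int) -> int:
        # Caller must hold self._cond. Returns the exclusive scan result.
        prefix = self._total
        self._total += value
        self._arrived += 1
        if self._arrived == self._expected:
            self._cond.notify_all()  # release every thread blocked in sync()
        return prefix

    def arrive(self, value: int) -> int:
        """Contribute without waiting; only the scan result is available."""
        with self._cond:
            return self._contribute(value)

    def sync(self, value: int) -> int:
        """Contribute, block until all contributions are in, return the reduction."""
        with self._cond:
            self._contribute(value)
            # Blocks forever if fewer than `expected` participants ever contribute.
            self._cond.wait_for(lambda: self._arrived == self._expected)
            return self._total
```

On a two-participant instance, `arrive(7)` returns 0 at once. A following `sync(1)` returns 8, because both contributions are then in.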
