System, method, and computer program product for managing divergences and synchronization points during thread block execution by using a double sided queue for token storage
    1.
    Type: Granted patent
    Status: In force

    Publication number: US09459876B2

    Publication date: 2016-10-04

    Application number: US13945842

    Filing date: 2013-07-18

    CPC classification number: G06F9/38 G06F9/30087 G06F9/3009 G06F9/3851 G06F9/524

    Abstract: A system, method, and computer program product for ensuring forward progress of threads that implement divergent operations in a single-instruction, multiple data (SIMD) architecture is disclosed. The method includes the steps of allocating a queue data structure to a thread block including a plurality of threads, determining that a current instruction specifies a yield operation, pushing a token onto the second side of the queue data structure, disabling any active threads in the thread block, popping a next pending token from the first side of the queue data structure, and activating one or more threads in the thread block according to a mask included in the next pending token.
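    The yield step in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the names `Token`, `ThreadBlock`, and `yield_op`, the `pc` field, and the choice of the right end of a `collections.deque` as the "second side" are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Token:
    mask: int  # bit i set => thread i is activated when this token is popped
    pc: int    # hypothetical resume address (not named in the abstract)


class ThreadBlock:
    """Toy model of a thread block with a double-sided token queue."""

    def __init__(self, num_threads, start_pc=0):
        self.queue = deque()                       # queue allocated to the block
        self.active_mask = (1 << num_threads) - 1  # all threads start active
        self.pc = start_pc

    def yield_op(self, resume_pc):
        # Push a token for the currently active threads onto the second side
        # (assumed here to be the right end of the deque).
        self.queue.append(Token(self.active_mask, resume_pc))
        # Disable any active threads in the block.
        self.active_mask = 0
        # Pop the next pending token from the first side (the left end).
        token = self.queue.popleft()
        # Activate threads according to the mask in that token.
        self.active_mask = token.mask
        self.pc = token.pc
        return token
```

    Because the yielding threads re-enter at the opposite end from where tokens are popped, every pending token is served before they run again, which is one way to read the forward-progress guarantee.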

    SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR COOPERATIVE MULTI-THREADING FOR VECTOR THREADS
    2.
    Type: Patent application
    Status: In force

    Publication number: US20150026438A1

    Publication date: 2015-01-22

    Application number: US13945842

    Filing date: 2013-07-18

    CPC classification number: G06F9/38 G06F9/30087 G06F9/3009 G06F9/3851 G06F9/524

    Abstract: A system, method, and computer program product for ensuring forward progress of threads that implement divergent operations in a single-instruction, multiple data (SIMD) architecture is disclosed. The method includes the steps of allocating a queue data structure to a thread block including a plurality of threads, determining that a current instruction specifies a yield operation, pushing a token onto the second side of the queue data structure, disabling any active threads in the thread block, popping a next pending token from the first side of the queue data structure, and activating one or more threads in the thread block according to a mask included in the next pending token.

    System, method, and computer program product for bulk synchronous binary program translation and optimization
    3.
    Type: Granted patent
    Status: In force

    Publication number: US09207919B2

    Publication date: 2015-12-08

    Application number: US14158749

    Filing date: 2014-01-17

    CPC classification number: G06F8/41 G06F9/30087 G06F9/30181 G06F9/45516

    Abstract: A system, method, and computer program product are provided for bulk synchronous binary program translation and optimization. The method includes the steps of executing a block of translated binary instructions by multiple threads and gathering profiling data during execution of the block of translated binary instructions. The multiple threads are then synchronized at a barrier instruction associated with the block of translated binary instructions, and the block of translated binary instructions is replaced with optimized binary instructions, where the optimized binary instructions are produced based on the profiling data.
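    The profile, synchronize, and replace sequence in this abstract can be mimicked on a CPU with Python's `threading.Barrier`, whose `action` callback runs in exactly one thread once all threads have arrived. This is a loose analogy only: `translated_block`, `optimize`, `code_table`, and the toy arithmetic standing in for "binary instructions" are hypothetical.

```python
import threading
from collections import Counter

NUM_THREADS = 4
profile = Counter()              # profiling data gathered during execution
profile_lock = threading.Lock()
results = [None] * NUM_THREADS


def translated_block(x):
    # Stand-in for a block of translated instructions; records which
    # branch each call takes.
    with profile_lock:
        profile["even" if x % 2 == 0 else "odd"] += 1
    return x // 2 if x % 2 == 0 else 3 * x + 1


code_table = {"block": translated_block}


def optimize():
    # Runs once, while every thread is parked at the barrier, so no thread
    # can be executing the block that is being replaced.
    even_hot = profile["even"] >= profile["odd"]

    def optimized_block(x):
        # Same result as translated_block, with the hotter branch tested
        # first and no profiling overhead.
        if even_hot:
            return x // 2 if x % 2 == 0 else 3 * x + 1
        return 3 * x + 1 if x % 2 else x // 2

    code_table["block"] = optimized_block


barrier = threading.Barrier(NUM_THREADS, action=optimize)


def worker(tid):
    first = code_table["block"](tid + 2)   # translated version
    barrier.wait()                         # synchronize; block is swapped here
    second = code_table["block"](tid + 2)  # optimized replacement
    results[tid] = (first, second)


def run():
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

    Doing the replacement inside the barrier action is the design point of the sketch: the swap happens at a moment when all threads are known to be outside the block.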

    SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR BULK SYNCHRONOUS BINARY PROGRAM TRANSLATION AND OPTIMIZATION
    4.
    Type: Patent application
    Status: In force

    Publication number: US20150205586A1

    Publication date: 2015-07-23

    Application number: US14158749

    Filing date: 2014-01-17

    CPC classification number: G06F8/41 G06F9/30087 G06F9/30181 G06F9/45516

    Abstract: A system, method, and computer program product are provided for bulk synchronous binary program translation and optimization. The method includes the steps of executing a block of translated binary instructions by multiple threads and gathering profiling data during execution of the block of translated binary instructions. The multiple threads are then synchronized at a barrier instruction associated with the block of translated binary instructions, and the block of translated binary instructions is replaced with optimized binary instructions, where the optimized binary instructions are produced based on the profiling data.

    EXECUTION OF DIVERGENT THREADS USING A CONVERGENCE BARRIER
    6.
    Type: Patent application
    Status: Under examination (published)

    Publication number: US20160019066A1

    Publication date: 2016-01-21

    Application number: US14798265

    Filing date: 2015-07-13

    Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
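    The bookkeeping described in this abstract can be sketched with two bitmasks. This is an illustrative model under assumptions, not the disclosed hardware: the class `ConvergenceBarrier` and its methods `add`, `wait`, and `try_clear` are hypothetical names for the first instruction, the second instruction, and the scheduler's check.

```python
class ConvergenceBarrier:
    """Toy model: one bit per thread in each mask."""

    def __init__(self):
        self.participants = 0  # threads that registered at the barrier
        self.blocked = 0       # participating threads now waiting at it

    def add(self, thread_mask):
        # "First instruction": the threads declare that they participate.
        self.participants |= thread_mask

    def wait(self, thread_mask):
        # "Second instruction": arriving participants become blocked.
        self.blocked |= thread_mask & self.participants

    def try_clear(self):
        # Scheduler check: once every participant is blocked, clear the
        # barrier and return the mask of threads to release together.
        if self.participants and self.blocked == self.participants:
            released = self.participants
            self.participants = 0
            self.blocked = 0
            return released
        return 0
```

    With four threads registered via `add(0b1111)`, a first divergent portion calling `wait(0b0011)` stays blocked until the second portion calls `wait(0b1100)`, after which `try_clear()` releases all four.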

