Variance analysis for translating CUDA code for execution by a general purpose processor
    3.
    Granted patent
    Legal status: In force

    Publication number: US08984498B2

    Publication date: 2015-03-17

    Application number: US12415090

    Filing date: 2009-03-31

    IPC classification: G06F9/45

    Abstract: One embodiment of the present invention sets forth a technique for translating application programs that are written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
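
The thread-loop idea in this abstract can be illustrated with a small sketch. It is not the patented translator: the kernel, the function name `kernel_serialized`, and the variable names are hypothetical, and the barrier placement is assumed. The code shows a kernel body split at one barrier into two regions, each wrapped in a loop over thread IDs, with a per-thread (divergent) value that is live across the barrier replicated into an array.

```python
def kernel_serialized(n_threads, data):
    """Serial CPU version of a hypothetical kernel with one barrier.

    Original per-thread logic (assumed for illustration):
        x = data[tid] * 2
        shared[tid] = x
        barrier()
        out[tid] = x + shared[(tid + 1) % n_threads]
    """
    shared = [0] * n_threads
    # 'x' differs per thread and is used after the barrier, so it is
    # replicated: one slot per thread instead of one scalar.
    x = [0] * n_threads
    out = [0] * n_threads

    # Region 1: everything before the barrier, run for every thread.
    for tid in range(n_threads):
        x[tid] = data[tid] * 2
        shared[tid] = x[tid]

    # The barrier is implied: region 1 finished for all threads
    # before region 2 starts for any thread.

    # Region 2: everything after the barrier, run for every thread.
    for tid in range(n_threads):
        out[tid] = x[tid] + shared[(tid + 1) % n_threads]

    return out
```

Splitting at the barrier is what keeps the result correct: if both statements ran in a single loop, thread 0 would read `shared[1]` before thread 1 had written it.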

    Method for Transforming a Multithreaded Program for General Execution
    4.
    Patent application
    Legal status: In force

    Publication number: US20120254875A1

    Publication date: 2012-10-04

    Application number: US13076258

    Filing date: 2011-03-30

    IPC classification: G06F9/46

    CPC classification: G06F8/72 G06F9/522

    Abstract: A technique is disclosed for executing a program designed for multi-threaded operation on a general purpose processor. Original source code for the program is transformed from a multi-threaded structure into a computationally equivalent single-threaded structure. A transform operation modifies the original source code to insert code constructs for serial thread execution. The transform operation also replaces synchronization barrier constructs in the original source code with synchronization barrier code that is configured to facilitate serialization. The transformed source code may then be conventionally compiled and advantageously executed on the general purpose processor.
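
One way to picture a barrier that "facilitates serialization" is to let each logical thread pause at the barrier and hand control to the next one. The sketch below uses Python generators for this; it is a swapped-in illustration rather than the source-to-source transform the application describes, and `worker` and `run_serial` are hypothetical names.

```python
def worker(tid, n, shared, out):
    """One logical thread; 'yield' stands in for the barrier."""
    shared[tid] = tid * tid            # phase before the barrier
    yield                              # barrier: pause until all threads arrive
    out[tid] = shared[(tid + 1) % n]   # phase after the barrier


def run_serial(n):
    """Run n logical threads on a single real thread, phase by phase."""
    shared = [None] * n
    out = [None] * n
    threads = [worker(t, n, shared, out) for t in range(n)]
    while threads:
        still_running = []
        for th in threads:
            try:
                next(th)               # run this thread up to its next barrier
                still_running.append(th)
            except StopIteration:
                pass                   # thread finished its last phase
        threads = still_running
    return out
```

Each pass of the `while` loop advances every logical thread by one phase, so all writes before the barrier are complete before any read after it.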

    Efficient construction of pruned SSA form
    6.
    Patent application
    Legal status: In force

    Publication number: US20050273777A1

    Publication date: 2005-12-08

    Application number: US10863000

    Filing date: 2004-06-07

    IPC classification: G06F9/45

    CPC classification: G06F8/443

    Abstract: Intermediate representations of computer code are efficiently generated. More particularly, methods described herein may be used to construct a static single assignment representation of computer code without unnecessary phi-function nodes. Potentially necessary phi-function node assignments may be analyzed to determine whether they directly reach a non-phi use or a necessary phi-use of a corresponding variable. Those that ultimately reach such a use may be determined to be necessary and a pruned static single assignment may be constructed by including those potentially necessary phi-functions determined to be in fact necessary. Also, some phi-function nodes may be determined to be necessary based on their dependency relationship to other phi-functions previously determined to be necessary (e.g., because they directly reach a non-phi use). A phi-function dependency graph may be used to record dependency relationships between phi-function nodes. The analysis can proceed during a forward walk of a control flow representation of the program.
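
The necessity rule in this abstract can be sketched as a propagation over a phi-function dependency graph. The sketch below uses a plain worklist instead of the forward walk the application describes, and the function name and input shapes are assumptions made for illustration only.

```python
from collections import deque


def necessary_phis(direct_use, operands):
    """Return the set of phi nodes that must be kept.

    direct_use: phi names whose value directly reaches a non-phi use.
    operands:   dict mapping a phi name to the phi names it reads,
                i.e. the edges of a phi-function dependency graph.
    """
    needed = set(direct_use)
    work = deque(direct_use)
    while work:
        phi = work.popleft()
        # A phi read by a necessary phi is itself necessary.
        for dep in operands.get(phi, ()):
            if dep not in needed:
                needed.add(dep)
                work.append(dep)
    return needed
```

For example, with `operands = {"p3": ["p1"], "p4": ["p2"]}` and only `p3` reaching a real use, `p1` is kept because `p3` reads it, while `p2` and `p4` can be pruned.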

    Performing mode switching in an unbounded transactional memory (UTM) system
    7.
    Granted patent
    Legal status: In force

    Publication number: US08365016B2

    Publication date: 2013-01-29

    Application number: US13307492

    Filing date: 2011-11-30

    IPC classification: G06F11/00

    Abstract: In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be the highest-performing of the hardware modes if no pending transaction is executing in the STM mode; otherwise a lower-performing mode can be selected. Other embodiments are described and claimed.
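
The selection rule in the last sentence of the abstract reduces to a single check. In the sketch below the mode names are hypothetical, and falling back to the next hardware mode is an assumption: the abstract says only that "a lower" mode is chosen when an STM transaction is pending.

```python
# Hypothetical mode names, fastest first; the abstract names the
# categories of modes but gives no concrete identifiers.
HARDWARE_MODES = ["hardware_fast", "hardware_slow"]


def select_initial_mode(pending_stm_transactions):
    """Pick the mode in which a new transaction begins."""
    if pending_stm_transactions == 0:
        # No transaction is running in STM mode: use the fastest hardware mode.
        return HARDWARE_MODES[0]
    # Assumed fallback: the next, slower hardware mode.
    return HARDWARE_MODES[1]
```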

    EXECUTION OF RETARGETTED GRAPHICS PROCESSOR ACCELERATED CODE BY A GENERAL PURPOSE PROCESSOR
    8.
    Patent application
    Legal status: In force

    Publication number: US20090259828A1

    Publication date: 2009-10-15

    Application number: US12408559

    Filing date: 2009-03-20

    IPC classification: G06F9/45 G06F9/30

    Abstract: One embodiment of the present invention sets forth a technique for translating application programs that are written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

    Dynamic Compiler Parallelism Techniques
    10.
    Patent application
    Legal status: Under examination (published)

    Publication number: US20160011857A1

    Publication date: 2016-01-14

    Application number: US14602258

    Filing date: 2015-01-21

    IPC classification: G06F9/45

    Abstract: Compiler techniques for inline parallelism and re-targetable parallel runtime execution of logic iterators enable selection thereof from the source code or dynamically during the object code execution.
