1. Method and apparatus for inserting prefetch instructions in an optimizing compiler
    Granted invention patent
    Legal status: in force

    Publication number: US07257810B2

    Publication date: 2007-08-14

    Application number: US10052999

    Filing date: 2001-11-02

    IPC classification: G06F9/44 G06F9/45 G06F9/30

    CPC classification: G06F8/4442

    Abstract: One embodiment of the present invention provides a system that generates code to perform anticipatory prefetching for data references. During operation, the system receives code to be executed on a computer system. Next, the system analyzes the code to identify data references to be prefetched. This analysis can involve: using a two-phase marking process in which blocks that are certain to execute are considered before other blocks; and analyzing complex array subscripts. Next, the system inserts prefetch instructions into the code in advance of the identified data references. This insertion can involve: dealing with non-constant or unknown stride values; moving prefetch instructions into preceding basic blocks; and issuing multiple prefetches for the same data reference.

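The insertion step can be pictured on a toy loop representation. The sketch below is an illustration under stated assumptions, not the patented method: `Load`, `Prefetch`, `insert_prefetches`, the prefetch distance `ahead` (in iterations) and the `copies` parameter are all hypothetical. A load with a compile-time stride gets a constant prefetch offset; a load with an unknown stride gets an offset expression to be evaluated at run time. Reading "multiple prefetches for the same data reference" as prefetches at successive distances is only one possible interpretation. In real C code, GCC's `__builtin_prefetch` is one way to emit the prefetch itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Load:
    array: str
    stride: Optional[int]  # bytes advanced per iteration; None if unknown at compile time


@dataclass
class Prefetch:
    array: str
    offset: str  # byte offset from the current access: a constant or a run-time expression


Instr = Union[Load, Prefetch]


def insert_prefetches(body: List[Instr], ahead: int = 8, copies: int = 1) -> List[Instr]:
    """Return a new loop body with prefetches placed before each load (hypothetical IR)."""
    out: List[Instr] = []
    for instr in body:
        if isinstance(instr, Load):
            for k in range(copies):
                distance = ahead + k  # assumed: each extra copy reaches one iteration further
                if instr.stride is not None:
                    # Known stride: the prefetch offset folds into a constant.
                    offset = str(instr.stride * distance)
                else:
                    # Unknown stride: leave an expression using a run-time stride value.
                    offset = f"{distance} * stride_{instr.array}"
                out.append(Prefetch(instr.array, offset))
        out.append(instr)
    return out
```

For example, `insert_prefetches([Load("a", 8)], ahead=8)` places `Prefetch("a", "64")` ahead of the load, i.e. 8 iterations times 8 bytes.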

2. Method and apparatus for selecting references for prefetching in an optimizing compiler
    Granted invention patent
    Legal status: in force

    Publication number: US07234136B2

    Publication date: 2007-06-19

    Application number: US10052997

    Filing date: 2001-11-02

    IPC classification: G06F9/45 G06F15/00

    Abstract: One embodiment of the present invention provides a system that generates code to perform anticipatory prefetching for data references. During operation, the system receives code to be executed on a computer system. Next, the system analyzes the code to identify data references to be prefetched. This analysis can involve: using a two-phase marking process in which blocks that are certain to execute are considered before other blocks; and analyzing complex array subscripts. Next, the system inserts prefetch instructions into the code in advance of the identified data references. This insertion can involve: dealing with non-constant or unknown stride values; moving prefetch instructions into preceding basic blocks; and issuing multiple prefetches for the same data reference.

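The two-phase marking idea can be sketched as follows, assuming (hypothetically) that references are grouped by array and 64-byte cache line and that only one reference per group needs a prefetch. `Block`, `select_prefetch_refs` and `line_size` are illustrative names, not taken from the patent. Because blocks certain to execute are marked first, a reference in a conditional block is skipped when an unconditional reference already covers the same line, rather than the other way round.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Block:
    name: str
    certain: bool  # True if the block executes on every loop iteration
    refs: List[Tuple[str, int]] = field(default_factory=list)  # (array, byte offset)


def select_prefetch_refs(blocks: List[Block], line_size: int = 64) -> List[Tuple[str, str, int]]:
    """Pick one reference per (array, cache line), preferring blocks certain to execute."""
    selected: List[Tuple[str, str, int]] = []
    covered: Set[Tuple[str, int]] = set()
    certain = [b for b in blocks if b.certain]
    uncertain = [b for b in blocks if not b.certain]
    # Phase 1 marks references in always-executed blocks; phase 2 handles the rest.
    for phase in (certain, uncertain):
        for block in phase:
            for array, offset in block.refs:
                key = (array, offset // line_size)
                if key not in covered:
                    covered.add(key)
                    selected.append((block.name, array, offset))
    return selected
```

With a conditional block `then` referencing `a+8` and an unconditional block `body` referencing `a+0`, the selector keeps the `body` reference even though `then` appears first in the list.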

3. Method and apparatus for optimizing computer program performance using steered execution
    Granted invention patent
    Legal status: in force

    Publication number: US07458067B1

    Publication date: 2008-11-25

    Application number: US11084656

    Filing date: 2005-03-18

    IPC classification: G06F9/44

    CPC classification: G06F8/443

    Abstract: One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. The remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.

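Steered execution can be illustrated by two versions of the same function plus a rule that picks one per call. This is a sketch, not the patented mechanism: the two Python functions merely stand in for the same source portion compiled with two different optimization sets, and `small_input` is a hypothetical rule based on input length.

```python
from typing import Callable, Sequence

Version = Callable[[Sequence[int]], int]


def sum_squares_loop(xs: Sequence[int]) -> int:
    # Stand-in for the portion compiled with the first set of optimizations.
    total = 0
    for x in xs:
        total += x * x
    return total


def sum_squares_builtin(xs: Sequence[int]) -> int:
    # Stand-in for the same portion compiled with the second set of optimizations.
    return sum(x * x for x in xs)


def make_steered(first: Version, second: Version,
                 rule: Callable[[Sequence[int]], bool]) -> Version:
    """Combine two versions of one portion with a rule that chooses between them per call."""
    def steered(xs: Sequence[int]) -> int:
        return first(xs) if rule(xs) else second(xs)
    return steered


def small_input(xs: Sequence[int]) -> bool:
    # Hypothetical rule: short inputs take the first version.
    return len(xs) < 1000


sum_squares = make_steered(sum_squares_loop, sum_squares_builtin, small_input)
```

Both versions must compute the same result; only the rule decides which one runs, which is why the sketch can swap them freely.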

4. Anticipatory helper thread based code execution
    Invention patent application
    Legal status: in force

    Publication number: US20070271565A1

    Publication date: 2007-11-22

    Application number: US11436948

    Filing date: 2006-05-18

    IPC classification: G06F9/46

    CPC classification: G06F9/4843 G06F9/52

    Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. Responsive to detecting a launch point for a function, the first thread is configured to indicate to the second thread that the second thread may begin executing the function. The launch point of the function precedes the actual call point of the function in the execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function has one or more inputs, and the second thread uses anticipated values for each of them. When the first thread reaches the call point for the function, it is configured to use the results of the second thread's execution if it determines that the anticipated values used by the second thread were correct.

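A minimal sketch of the launch-point/call-point protocol using Python's `threading` module, assuming the function is pure so that re-running it with the actual inputs is always safe. `AnticipatedCall`, `launch` and `call` are hypothetical names. The helper thread starts early with anticipated arguments; at the call point the first thread reuses the result only if the actual arguments match.

```python
import threading
from typing import Any, Callable, Tuple


class AnticipatedCall:
    """Run func on a helper thread with anticipated inputs; validate at the call point."""

    def __init__(self, func: Callable[..., Any], anticipated_args: Tuple[Any, ...]):
        self.func = func
        self.anticipated_args = anticipated_args
        self._result: Any = None
        self._thread = threading.Thread(target=self._run)

    def _run(self) -> None:
        self._result = self.func(*self.anticipated_args)

    def launch(self) -> None:
        # Launch point: the first thread signals the helper to start early.
        self._thread.start()

    def call(self, *actual_args: Any) -> Tuple[Any, bool]:
        # Call point: wait for the helper, then check its anticipated inputs.
        self._thread.join()
        if actual_args == self.anticipated_args:
            return self._result, True          # anticipation was correct: reuse the result
        return self.func(*actual_args), False  # misprediction: execute the call normally
```

The boolean in the returned tuple reports whether the helper's work was used, which makes mispredictions easy to count.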

5. Compiler implementation of lock/unlock using hardware transactional memory
    Granted invention patent
    Legal status: in force

    Publication number: US08612929B2

    Publication date: 2013-12-17

    Application number: US12331950

    Filing date: 2008-12-10

    IPC classification: G06F9/44 G06F9/45

    Abstract: A system and method for automatic efficient parallelization of code combined with hardware transactional memory support. A software application may contain a transaction synchronization region (TSR) utilizing lock and unlock transaction synchronization function calls for a shared region of memory within a shared memory. The TSR is replaced with two portions of code. The first portion comprises hardware transactional memory primitives in place of the lock and unlock function calls. Also, the first portion ensures that no other transaction is accessing the shared region without disabling existing hardware transactional memory support. The second portion performs a fail routine, which utilizes lock and unlock transaction synchronization primitives in response to an indication that a failure occurred within the first portion.

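The two-portion rewrite of a lock/unlock region can be simulated as below. Python has no hardware transactional memory, so the transaction is modeled by a caller-supplied `try_transaction` function that either runs the body or raises `TransactionAborted`; all names here are hypothetical. On x86 with RTM, the transactional attempt would instead be bracketed by `_xbegin()`/`_xend()` from `<immintrin.h>`. The simulation does not provide real isolation; it only shows the control flow of "try transactionally, then fall back to the lock".

```python
import threading
from typing import Any, Callable, Tuple


class TransactionAborted(Exception):
    """Raised by the (simulated) hardware transaction when it aborts."""


def run_critical_section(lock: threading.Lock, body: Callable[[], Any],
                         try_transaction: Callable[[Callable[[], Any]], Any],
                         max_retries: int = 3) -> Tuple[Any, str]:
    """Replace lock()/unlock() with a transactional attempt plus a locking fail routine."""

    def transactional_body() -> Any:
        # First portion: if another thread holds the lock, it may be inside the
        # critical section, so this attempt must abort. (With real HTM, reading
        # the lock also makes a later acquisition abort the transaction.)
        if lock.locked():
            raise TransactionAborted
        return body()

    for _ in range(max_retries):
        try:
            return try_transaction(transactional_body), "transactional"
        except TransactionAborted:
            continue
    # Second portion (fail routine): fall back to the original lock/unlock.
    with lock:
        return body(), "locked"


def always_commit(txn: Callable[[], Any]) -> Any:
    # Simulated HTM that never aborts.
    return txn()


def always_abort(txn: Callable[[], Any]) -> Any:
    # Simulated HTM that aborts every attempt.
    raise TransactionAborted
```

Bounding the retries with `max_retries` keeps a persistently aborting region from spinning forever before it takes the lock.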

6. Runtime profitability control for speculative automatic parallelization
    Granted invention patent
    Legal status: in force

    Publication number: US08359587B2

    Publication date: 2013-01-22

    Application number: US12113706

    Filing date: 2008-05-01

    IPC classification: G06F9/45

    CPC classification: G06F8/456

    Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code in response to determining that the profitability of doing so meets a predetermined criterion, and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version and a parallelized version of the candidate code. During execution of the object code, a dynamic selection is made between executing the non-parallelized version and the parallelized version of the candidate code. Execution may be changed from the parallelized version to the non-parallelized version in response to determining that a transaction failure count meets a predetermined threshold. Additionally, changing execution from one version to the other may be in further response to determining that the execution time of the parallelized version is greater than that of the non-parallelized version.

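The runtime selection can be sketched as a small state machine fed with measurements. `VersionSelector`, `record_parallel`, `record_serial` and the default `failure_threshold` are hypothetical. In this sketch either condition (too many transaction failures, or the parallel version being slower) switches execution to the non-parallelized version; the abstract leaves the exact combination of conditions open, so this is one possible reading.

```python
from typing import Optional


class VersionSelector:
    """Runtime choice between a parallelized and a non-parallelized loop version."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.parallel_time: Optional[float] = None  # latest timings, in seconds
        self.serial_time: Optional[float] = None
        self.use_parallel = True  # start with the speculative parallel version

    def record_parallel(self, elapsed: float, transaction_failed: bool) -> None:
        self.parallel_time = elapsed
        if transaction_failed:
            self.failures += 1
        self._update()

    def record_serial(self, elapsed: float) -> None:
        self.serial_time = elapsed
        self._update()

    def _update(self) -> None:
        too_many_failures = self.failures >= self.failure_threshold
        slower = (self.parallel_time is not None and self.serial_time is not None
                  and self.parallel_time > self.serial_time)
        # Either condition switches execution to the non-parallelized version.
        if too_many_failures or slower:
            self.use_parallel = False
```

Feeding timings in from the caller, rather than measuring inside the class, keeps the decision logic deterministic and easy to test.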

7. Method and apparatus for generating efficient code for scout thread to prefetch data values for a main thread
    Invention patent application
    Legal status: pending (published)

    Publication number: US20120226892A1

    Publication date: 2012-09-06

    Application number: US11081984

    Filing date: 2005-03-16

    IPC classification: G06F9/38 G06F12/08

    CPC classification: G06F9/3851 G06F9/383

    Abstract: One embodiment of the present invention provides a system that generates code for a scout thread to prefetch data values for a main thread. During operation, the system compiles source code for a program to produce executable code for the program. This compilation process involves performing reuse analysis to identify prefetch candidates that are likely to be touched during execution of the program. Additionally, this compilation process produces executable code for the scout thread, which contains prefetch instructions to prefetch the identified candidates for the main thread. In this way, the scout thread can subsequently be executed in parallel with the main thread, ahead of the point where the main thread is executing, to prefetch data items for the main thread.

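A toy version of the reuse-analysis and code-generation steps, assuming a unit-stride loop over `i`, 8-byte elements, 64-byte cache lines, and ignoring array alignment. `reuse_filter`, `scout_loop` and the emitted `prefetch(...)` call are hypothetical. References to the same array that fall in the same cache line share one prefetch (group reuse), and the scout loop steps one cache line at a time (self reuse), so it issues far fewer prefetches than the main loop has loads.

```python
from typing import Dict, List, Tuple

Ref = Tuple[str, int]  # (array name, constant element offset from the loop index i)


def reuse_filter(refs: List[Ref], elem_size: int = 8, line_size: int = 64) -> List[Ref]:
    """Keep one leading reference per (array, cache line) group; alignment is ignored."""
    leaders: Dict[Tuple[str, int], Ref] = {}
    for array, offset in refs:
        key = (array, (offset * elem_size) // line_size)
        leaders.setdefault(key, (array, offset))  # later refs reuse the leader's line
    return list(leaders.values())


def scout_loop(refs: List[Ref], bound: str = "n", elem_size: int = 8,
               line_size: int = 64) -> str:
    """Emit C-like text for a scout loop that contains only prefetches."""
    step = line_size // elem_size  # a unit-stride ref reaches a new line every `step` iterations
    lines = [f"for (i = 0; i < {bound}; i += {step}) {{"]
    for array, offset in reuse_filter(refs, elem_size, line_size):
        lines.append(f"  prefetch(&{array}[i + {offset}]);")
    lines.append("}")
    return "\n".join(lines)
```

For references `a[i]`, `a[i+1]`, `b[i]` and `a[i+9]`, the filter drops `a[i+1]` (same line as `a[i]`) and the generated loop prefetches the remaining three once per cache line.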

8. Facilitating communication and synchronization between main and scout threads
    Granted invention patent
    Legal status: in force

    Publication number: US07950012B2

    Publication date: 2011-05-24

    Application number: US11272178

    Filing date: 2005-11-09

    IPC classification: G06F9/46 G06F9/38

    Abstract: One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in a main thread. Upon encountering a loop that has associated helper-thread code, the system starts executing that code in the helper-thread, separately from and in parallel with the main thread. While the helper-thread executes the code, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code it is executing is no longer performing useful work. Hence, the helper-thread executes ahead of the main thread to prefetch data items for it, without unnecessarily consuming processor resources or hampering the execution of the main thread.

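The periodic progress check can be sketched with a shared counter that the main thread publishes and the helper thread reads. `Progress`, `helper_loop` and `check_every` are hypothetical names, and "no longer useful" is assumed here to mean that the main thread has already overtaken the helper, so further prefetches would arrive too late.

```python
import threading
from typing import Callable


class Progress:
    """Iteration counter the main thread publishes for the helper thread to read."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def set(self, value: int) -> None:
        with self._lock:
            self._value = value

    def get(self) -> int:
        with self._lock:
            return self._value


def helper_loop(n: int, main_progress: Progress, prefetch: Callable[[int], None],
                check_every: int = 4) -> int:
    """Prefetch for iterations 0..n-1, deactivating once the main thread overtakes us.

    Returns the iteration at which the helper stopped (n if it finished).
    """
    for i in range(n):
        # Check only every `check_every` iterations to keep synchronization cheap.
        if i % check_every == 0 and main_progress.get() > i:
            return i  # main thread is already past i: prefetching is no longer useful
        prefetch(i)
    return n
```

Checking only every few iterations trades a little wasted helper work for less contention on the shared counter.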

9. FAULT TOLERANT COMPILATION WITH AUTOMATIC OPTIMIZATION ADJUSTMENT
    Invention patent application
    Legal status: in force

    Publication number: US20100325619A1

    Publication date: 2010-12-23

    Application number: US12488905

    Filing date: 2009-06-22

    IPC classification: G06F9/45

    Abstract: A compilation method is provided for correcting compiler errors, including compiler internal errors and errors produced by running a validation suite. The method includes running a compiler on a computer and storing a set of optimization levels in memory accessible by the compiler. The method includes the compiler receiving a source file that includes a user-defined optimization level to be used in compiling the source file. The method includes identifying a set of functions within the source file and using compiler components to compile these functions at the original optimization level. When compilation results in an internal error being reported for one or more of the functions, the method includes using an optimization adjustment module to process the internal error, assign an adjusted (lower) optimization level to those functions, and recompile them at the lower optimization level.

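The per-function fallback can be sketched as a retry loop, assuming integer optimization levels where 0 is the lowest and the compiler back end is a caller-supplied function. `InternalCompilerError`, `compile_with_fallback` and `compile_fn` are hypothetical. Only the function that triggered the internal error is recompiled at a lower level; the others keep the user's requested level.

```python
from typing import Callable, Dict, List


class InternalCompilerError(Exception):
    """Stand-in for an internal error reported while compiling one function."""


def compile_with_fallback(functions: List[str],
                          compile_fn: Callable[[str, int], None],
                          requested_level: int) -> Dict[str, int]:
    """Compile each function, lowering only that function's level after an internal error.

    Returns the optimization level actually used for each function.
    """
    used: Dict[str, int] = {}
    for name in functions:
        level = requested_level
        while True:
            try:
                compile_fn(name, level)
                used[name] = level
                break
            except InternalCompilerError:
                if level == 0:
                    raise  # no lower level left to try
                level -= 1  # adjusted (lower) level for this function only
    return used
```

Returning the levels actually used lets a build report which functions were quietly compiled below the requested level.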

10. Method and system for generating prefetch information for multi-block indirect memory access chains
    Granted invention patent
    Legal status: in force

    Publication number: US07383402B2

    Publication date: 2008-06-03

    Application number: US11446643

    Filing date: 2006-06-05

    IPC classification: G06F12/00

    CPC classification: G06F8/4442

    Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses in a procedure, the chain comprising a head access whose address does not depend on another prefetch candidate memory access within the procedure, and an indirect access whose address depends on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address dependent upon the prefetch-ahead value and the address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation whose address computation depends on the results of preceding load operations in the same manner as the corresponding terminal indirect access depends on preceding accesses in the chain.

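For the simplest two-element chain `a[b[i]]`, the head access is `b[i]` and the terminal indirect access is `a[b[i]]`. The sketch below simulates the generated code within a single loop body (the patent also covers chains spanning several blocks): a real load of `b[i + ahead]`, whose address depends on the prefetch-ahead value and the head address, followed by a prefetch of the `a[]` element it selects. Here `sum_with_chain_prefetch`, `ahead` and the `prefetched` log are hypothetical names; the log records indices instead of issuing hardware prefetches.

```python
from typing import List


def sum_with_chain_prefetch(a: List[int], b: List[int], ahead: int,
                            prefetched: List[int]) -> int:
    """Sum a[b[i]], prefetching the terminal access a[b[i + ahead]] along the way.

    `prefetched` records the a[] indices that would be prefetched (simulation only).
    """
    total = 0
    n = len(b)
    for i in range(n):
        if i + ahead < n:
            # Head access: an ordinary load of b[i + ahead]. Unlike a prefetch,
            # a real load can fault, so it is kept within bounds.
            idx = b[i + ahead]
            # Terminal indirect access: prefetch the a[] element that load selects.
            prefetched.append(idx)
        total += a[b[i]]
    return total
```

The bounds guard reflects the design choice in the abstract: the head access becomes a real load (its value is needed to form the next address), whereas the terminal access can be a non-binding prefetch.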