Method and system for utilizing parallelism across loops
    1.
    Granted invention patent
    Method and system for utilizing parallelism across loops (In force)

    Publication No.: US08479185B2

    Publication Date: 2013-07-02

    Application No.: US12963786

    Filing Date: 2010-12-09

    IPC Class: G06F9/45

    CPC Class: G06F8/452 G06F8/458

    Abstract: A method for compiling application source code that includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create an application compiled code.

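    The cross-loop synchronization this abstract describes can be sketched with a post/wait scheme: below, two threads run the chunks of the two loops, and a C11 atomic flag stands in for the precedent synchronization posted when a chunk of the first loop completes. The chunk size, loop bodies, and flag layout are illustrative assumptions, not taken from the patent.

        /* Sketch of post/wait synchronization between chunks of two parallelized
         * loops (illustrative only; chunking and the flag scheme are assumptions). */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define N        1024
        #define CHUNKS   8
        #define CHUNK_SZ (N / CHUNKS)

        static int a[N], b[N];
        static atomic_int chunk_done[CHUNKS];   /* posted when a chunk of loop 1 finishes */

        static void *loop1(void *arg) {
            (void)arg;
            for (int c = 0; c < CHUNKS; c++) {
                for (int i = c * CHUNK_SZ; i < (c + 1) * CHUNK_SZ; i++)
                    a[i] = i * 2;                    /* first loop's work                     */
                atomic_store(&chunk_done[c], 1);     /* "precedent synchronization": post     */
            }
            return NULL;
        }

        static void *loop2(void *arg) {
            (void)arg;
            for (int c = 0; c < CHUNKS; c++) {
                while (!atomic_load(&chunk_done[c])) /* wait for the chunk it depends on      */
                    ;
                for (int i = c * CHUNK_SZ; i < (c + 1) * CHUNK_SZ; i++)
                    b[i] = a[i] + 1;                 /* second loop consumes loop 1's data    */
            }
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, loop1, NULL);
            pthread_create(&t2, NULL, loop2, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            printf("b[N-1] = %d\n", b[N - 1]);       /* expect 2*(N-1) + 1                    */
            return 0;
        }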

    Method and apparatus for optimizing computer program performance using steered execution
    2.
    Granted invention patent
    Method and apparatus for optimizing computer program performance using steered execution (In force)

    Publication No.: US07458067B1

    Publication Date: 2008-11-25

    Application No.: US11084656

    Filing Date: 2005-03-18

    IPC Class: G06F9/44

    CPC Class: G06F8/443

    Abstract: One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. Remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.

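    A minimal sketch of steered execution, under the assumptions that the steering rule is a simple input-size test and that the two "compiled portions" are hand-written variants of the same routine; the function names and the threshold are invented for illustration.

        /* Sketch: the same routine compiled two ways plus a selection rule
         * (the size-based rule and the function names are assumptions). */
        #include <stddef.h>
        #include <stdio.h>

        /* "First compiled portion": variant suited to small inputs. */
        static long sum_small(const int *v, size_t n) {
            long s = 0;
            for (size_t i = 0; i < n; i++) s += v[i];
            return s;
        }

        /* "Second compiled portion": variant for large inputs, unrolled by
         * four here as a stand-in for a different optimization set. */
        static long sum_large(const int *v, size_t n) {
            long s = 0;
            size_t i = 0;
            for (; i + 4 <= n; i += 4)
                s += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
            for (; i < n; i++) s += v[i];
            return s;
        }

        /* The "rule" steering between the two variants at run time. */
        static long sum(const int *v, size_t n) {
            return (n < 64) ? sum_small(v, n) : sum_large(v, n);
        }

        int main(void) {
            int v[256];
            for (int i = 0; i < 256; i++) v[i] = i;
            printf("%ld\n", sum(v, 256));   /* steered to sum_large */
            return 0;
        }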

    Anticipatory helper thread based code execution
    3.
    Invention patent application
    Anticipatory helper thread based code execution (In force)

    Publication No.: US20070271565A1

    Publication Date: 2007-11-22

    Application No.: US11436948

    Filing Date: 2006-05-18

    IPC Class: G06F9/46

    CPC Class: G06F9/4843 G06F9/52

    Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. Responsive to the first thread detecting a launch point for a function, the first thread is configured to provide an indication to the second thread that the second thread may begin execution of a given function. The launch point of the function precedes an actual call point of the function in an execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function includes one or more inputs and the second thread uses anticipated values for each of the one or more inputs. When the first thread reaches a call point for the function, the first thread is configured to use the results of the second thread's execution in response to determining the anticipated values used by the second thread were correct.

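    A small sketch of the launch-point/call-point protocol, assuming the "indication" is an atomic flag and the anticipated input is guessed at the launch point; the function f, the flag protocol, and the timing are illustrative, not from the patent.

        /* Sketch of launch-point / call-point helper execution; the function,
         * anticipated input, and flag protocol are assumptions. */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <unistd.h>

        static long f(long x) { return x * x + 1; }      /* the "given function"               */

        static atomic_int launch = 0, ready = 0;
        static long       anticipated_in, helper_out;

        static void *helper(void *arg) {
            (void)arg;
            while (!atomic_load(&launch)) ;              /* wait for the launch point          */
            helper_out = f(anticipated_in);              /* run ahead on the anticipated input */
            atomic_store(&ready, 1);
            return NULL;
        }

        int main(void) {
            pthread_t t;
            pthread_create(&t, NULL, helper, NULL);

            /* Launch point: the call is still far away, but the input can be guessed. */
            anticipated_in = 42;
            atomic_store(&launch, 1);

            usleep(1000);                                /* main thread does other work here   */
            long actual_in = 42;                         /* value known only at the call point */

            /* Call point: reuse the helper's result only if the guess was right. */
            long r;
            if (actual_in == anticipated_in) {
                while (!atomic_load(&ready)) ;
                r = helper_out;
            } else {
                r = f(actual_in);                        /* misprediction: recompute           */
            }
            pthread_join(t, NULL);
            printf("f(%ld) = %ld\n", actual_in, r);
            return 0;
        }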

    Microprocessor having a page prefetch cache for database applications
    4.
    Granted invention patent
    Microprocessor having a page prefetch cache for database applications (In force)

    Publication No.: US06848028B1

    Publication Date: 2005-01-25

    Application No.: US09477868

    Filing Date: 2000-01-05

    IPC Class: G06F12/08 G06F12/00

    Abstract: A microprocessor cache configuration for reducing database cache misses and improving processing speed, comprising a level-1 data cache and a page prefetch cache. The page prefetch cache is adjacent to the level-1 data cache and is configured to receive and store one or more database pages. Additionally, a page prefetch instruction provides the database pages to the page prefetch cache. The page prefetch instructions are generated by a compiler or by developer software.

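    The page prefetch cache itself is hardware, but the placement of page prefetch instructions can be sketched in software; here GCC/Clang's __builtin_prefetch stands in for the patent's dedicated page prefetch instruction, and the page layout and scan loop are assumptions.

        /* Sketch of where page prefetch instructions might be placed when
         * scanning database pages; __builtin_prefetch stands in for the
         * dedicated page prefetch instruction. */
        #include <stdio.h>
        #include <string.h>

        #define PAGE_SIZE  8192
        #define NUM_PAGES  16
        #define LINE       64

        typedef struct { char bytes[PAGE_SIZE]; } db_page;

        static db_page pages[NUM_PAGES];

        static long scan_page(const db_page *p) {
            long sum = 0;
            for (int i = 0; i < PAGE_SIZE; i++) sum += p->bytes[i];
            return sum;
        }

        int main(void) {
            memset(pages, 1, sizeof(pages));
            long total = 0;
            for (int p = 0; p < NUM_PAGES; p++) {
                if (p + 1 < NUM_PAGES)                       /* prefetch the next page while */
                    for (int off = 0; off < PAGE_SIZE; off += LINE)  /* scanning this one    */
                        __builtin_prefetch(pages[p + 1].bytes + off, 0, 1);
                total += scan_page(&pages[p]);
            }
            printf("%ld\n", total);
            return 0;
        }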

    Method for employing a page prefetch cache for database applications
    5.
    Granted invention patent
    Method for employing a page prefetch cache for database applications (In force)

    Publication No.: US06829680B1

    Publication Date: 2004-12-07

    Application No.: US09477867

    Filing Date: 2000-01-05

    IPC Class: G06F12/00

    Abstract: A method for increasing the processing speed of database instructions using a page prefetch cache. More particularly, the method is executed on a microprocessor, reduces database cache misses, and improves processing speed. The method comprises enabling a page prefetch cache with a database application, issuing one or more page prefetch instructions, and determining whether a particular database page is in the page prefetch cache.

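    A toy software model of the claimed flow, enabling the cache, issuing a page prefetch, and checking for a hit before falling back to a normal fetch; the structure size and helper functions are invented for illustration and do not model the actual hardware.

        /* Toy model of the flow: enable the page prefetch cache, issue page
         * prefetches, check for a hit.  Sizes and helpers are assumptions. */
        #include <stdbool.h>
        #include <stdio.h>

        #define PPC_ENTRIES 4

        static int  ppc[PPC_ENTRIES];        /* page IDs held by the model cache */
        static int  ppc_next;
        static bool ppc_enabled;

        static void ppc_enable(void) {
            ppc_enabled = true;              /* step 1: enable the cache         */
            for (int i = 0; i < PPC_ENTRIES; i++) ppc[i] = -1;
        }

        static void ppc_prefetch(int page) { /* step 2: issue a page prefetch    */
            if (!ppc_enabled) return;
            ppc[ppc_next] = page;
            ppc_next = (ppc_next + 1) % PPC_ENTRIES;
        }

        static bool ppc_lookup(int page) {   /* step 3: is the page in the cache? */
            for (int i = 0; i < PPC_ENTRIES; i++)
                if (ppc[i] == page) return true;
            return false;
        }

        int main(void) {
            ppc_enable();
            ppc_prefetch(7);
            printf("page 7: %s\n", ppc_lookup(7) ? "prefetch-cache hit" : "miss, load from memory");
            printf("page 9: %s\n", ppc_lookup(9) ? "prefetch-cache hit" : "miss, load from memory");
            return 0;
        }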

    Method and apparatus for instruction scheduling in an optimizing compiler for minimizing overhead instructions
    6.
    Granted invention patent
    Method and apparatus for instruction scheduling in an optimizing compiler for minimizing overhead instructions (Expired)

    Publication No.: US5835776A

    Publication Date: 1998-11-10

    Application No.: US560089

    Filing Date: 1995-11-17

    IPC Class: G06F9/38 G06F9/45 G06F9/44

    CPC Class: G06F8/4452

    Abstract: Apparatus and methods are disclosed for scheduling target program instructions during the code optimization pass of an optimizing compiler. Most modern microprocessors have the ability to issue multiple instructions in one clock cycle and/or possess multiple pipelined functional units. They also have the ability to add two values to form the address within memory load and store instructions. In such microprocessors this invention can, where applicable, accelerate the execution of modulo-scheduled loops. The invention consists of a technique to achieve this speed-up by systematically reducing the number of certain overhead instructions in modulo-scheduled loops. The technique involves identifying reducible overhead instructions, scheduling the remaining instructions with normal modulo-scheduling procedures, and then judiciously inserting no more than three copies of the reducible instructions into the schedule.

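    A source-level analogy for the "overhead instructions" the scheduler targets: per-element pointer updates versus offsets folded into the memory accesses with one index update per group. The real transformation operates on machine instructions inside the modulo scheduler, so this only illustrates the kind of overhead being removed, not the scheduling algorithm itself.

        /* Illustration of overhead-instruction reduction at the source level;
         * the unroll factor and loop bodies are arbitrary. */
        #include <stdio.h>

        #define N 1024

        static int a[N], b[N];

        /* One address-update (overhead) operation per element. */
        static void copy_naive(void) {
            int *pa = a, *pb = b;
            for (int i = 0; i < N; i++) { *pb = *pa; pa++; pb++; }
        }

        /* Offsets folded into the accesses; one index update per four elements. */
        static void copy_folded(void) {
            for (int i = 0; i < N; i += 4) {
                b[i]     = a[i];
                b[i + 1] = a[i + 1];
                b[i + 2] = a[i + 2];
                b[i + 3] = a[i + 3];
            }
        }

        int main(void) {
            for (int i = 0; i < N; i++) a[i] = i;
            copy_naive();
            copy_folded();
            printf("%d %d\n", b[0], b[N - 1]);
            return 0;
        }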

    Compiler implementation of lock/unlock using hardware transactional memory
    7.
    Granted invention patent
    Compiler implementation of lock/unlock using hardware transactional memory (In force)

    Publication No.: US08612929B2

    Publication Date: 2013-12-17

    Application No.: US12331950

    Filing Date: 2008-12-10

    IPC Class: G06F9/44 G06F9/45

    Abstract: A system and method for automatic, efficient parallelization of code combined with hardware transactional memory support. A software application may contain a transaction synchronization region (TSR) utilizing lock and unlock transaction synchronization function calls for a shared region of memory within a shared memory. The TSR is replaced with two portions of code. The first portion comprises hardware transactional memory primitives in place of lock and unlock function calls. The first portion also ensures no other transaction is accessing the shared region, without disabling existing hardware transactional memory support. The second portion performs a fail routine, which utilizes lock and unlock transaction synchronization primitives in response to an indication that a failure occurred within the first portion.

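    The patent does not name a particular HTM interface; the sketch below uses Intel RTM intrinsics (_xbegin/_xend/_xabort, compiled with -mrtm on RTM-capable hardware) as a stand-in for the first portion, with a pthread mutex playing the role of the original lock/unlock calls in the fail routine.

        /* Sketch of the two generated portions, with Intel RTM as a stand-in HTM. */
        #include <immintrin.h>
        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;
        static volatile int    lock_held;          /* mirrors the fallback lock state      */
        static long            shared_counter;     /* the shared region                    */

        static void update_shared(void) {
            unsigned status = _xbegin();           /* first portion: start a transaction   */
            if (status == _XBEGIN_STARTED) {
                if (lock_held)                     /* a thread is in the fail path:        */
                    _xabort(0xff);                 /* do not race with it                  */
                shared_counter++;                  /* transactional access                 */
                _xend();
                return;
            }
            /* Second portion: fail routine using the original lock/unlock calls. */
            pthread_mutex_lock(&fallback);
            lock_held = 1;
            shared_counter++;
            lock_held = 0;
            pthread_mutex_unlock(&fallback);
        }

        int main(void) {
            for (int i = 0; i < 1000; i++) update_shared();
            printf("%ld\n", shared_counter);
            return 0;
        }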

    Runtime profitability control for speculative automatic parallelization
    8.
    Granted invention patent
    Runtime profitability control for speculative automatic parallelization (In force)

    Publication No.: US08359587B2

    Publication Date: 2013-01-22

    Application No.: US12113706

    Filing Date: 2008-05-01

    IPC Class: G06F9/45

    CPC Class: G06F8/456

    Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code in response to determining that the profitability of parallelization meets a predetermined criterion, and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection is made between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code. Changing execution from the parallelized version of the candidate code to the non-parallelized version may be in response to determining a transaction failure count meets a predetermined threshold. Additionally, changing execution from one version to the other may be in further response to determining that an execution time of the parallelized version of the candidate code is greater than an execution time of the non-parallelized version.

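    A sketch of the run-time selection described above: the binary carries both versions of the candidate code, and a small dispatcher switches to the non-parallelized version when a failure counter crosses a threshold or the parallel version measures slower. The threshold, the timing source, and the placeholder "parallel" body are assumptions.

        /* Sketch of dynamic selection between serial and parallel versions. */
        #include <stdio.h>
        #include <time.h>

        #define FAIL_THRESHOLD 10

        static long   fail_count;                /* e.g. transaction aborts reported by the runtime */
        static double t_serial = -1.0, t_parallel = -1.0;

        static double now(void) {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec + ts.tv_nsec * 1e-9;
        }

        static long work_serial(int n)   { long s = 0; for (int i = 0; i < n; i++) s += i; return s; }
        static long work_parallel(int n) { return work_serial(n); /* placeholder parallel body */ }

        static long candidate(int n) {
            /* Prefer the parallel version unless failures or timing say otherwise. */
            int use_parallel = (fail_count < FAIL_THRESHOLD) &&
                               (t_parallel < 0 || t_serial < 0 || t_parallel <= t_serial);
            double start = now();
            long r = use_parallel ? work_parallel(n) : work_serial(n);
            double elapsed = now() - start;
            if (use_parallel) t_parallel = elapsed; else t_serial = elapsed;
            return r;
        }

        int main(void) {
            for (int i = 0; i < 4; i++) printf("%ld\n", candidate(1 << 20));
            return 0;
        }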

    Method and apparatus for generating efficient code for scout thread to prefetch data values for a main thread
    9.
    Invention patent application
    Method and apparatus for generating efficient code for scout thread to prefetch data values for a main thread (Pending, published)

    Publication No.: US20120226892A1

    Publication Date: 2012-09-06

    Application No.: US11081984

    Filing Date: 2005-03-16

    IPC Class: G06F9/38 G06F12/08

    CPC Class: G06F9/3851 G06F9/383

    Abstract: One embodiment of the present invention provides a system that generates code for a scout thread to prefetch data values for a main thread. During operation, the system compiles source code for a program to produce executable code for the program. This compilation process involves performing reuse analysis to identify prefetch candidates which are likely to be touched during execution of the program. Additionally, this compilation process produces executable code for the scout thread which contains prefetch instructions to prefetch the identified prefetch candidates for the main thread. In this way, the scout thread can subsequently be executed in parallel with the main thread, in advance of where the main thread is executing, to prefetch data items for the main thread.

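    A sketch of what the generated scout code might look like, under the assumption that reuse analysis selected the nodes of a linked-list traversal as prefetch candidates: the scout thread executes only the pointer-chasing slice and issues __builtin_prefetch for each node while the main thread does the real work.

        /* Sketch of a scout thread running the address-generating slice ahead
         * of the main thread; the workload and run-ahead distance are assumed. */
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>

        typedef struct node { struct node *next; int payload; } node;

        #define N 100000

        static node *head;

        static void *scout(void *arg) {
            (void)arg;
            for (node *p = head; p; p = p->next)     /* only the pointer chase is kept       */
                __builtin_prefetch(p->next, 0, 3);   /* prefetch candidate for the main thread */
            return NULL;
        }

        int main(void) {
            /* Build a list (contiguous here; a real heap-allocated list would scatter). */
            node *pool = malloc(N * sizeof(node));
            for (int i = 0; i < N; i++) {
                pool[i].next = (i + 1 < N) ? &pool[i + 1] : NULL;
                pool[i].payload = i;
            }
            head = pool;

            pthread_t t;
            pthread_create(&t, NULL, scout, NULL);   /* scout starts ahead of the main loop  */

            long sum = 0;
            for (node *p = head; p; p = p->next)     /* main thread does the real work       */
                sum += p->payload;

            pthread_join(t, NULL);
            printf("%ld\n", sum);
            free(pool);
            return 0;
        }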

    Facilitating communication and synchronization between main and scout threads
    10.
    Granted invention patent
    Facilitating communication and synchronization between main and scout threads (In force)

    Publication No.: US07950012B2

    Publication Date: 2011-05-24

    Application No.: US11272178

    Filing Date: 2005-11-09

    IPC Class: G06F9/46 G06F9/38

    Abstract: One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in a main thread. Upon encountering a loop which has associated helper-thread code, the system commences execution of that code by the helper-thread, separately and in parallel with the main thread. While the helper-thread executes the code, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code being executed by the helper-thread is no longer performing useful work. Hence, the helper-thread executes in advance of where the main thread is executing to prefetch data items for the main thread without unnecessarily consuming processor resources or hampering the execution of the main thread.

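    The progress check can be sketched with a shared counter: the main thread publishes its loop index, and the helper periodically compares its own position against it and deactivates once it is no longer ahead. The check interval, the prefetch body, and the counter are illustrative assumptions.

        /* Sketch of helper-thread deactivation based on main-thread progress. */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define N              1000000
        #define CHECK_INTERVAL 1024

        static atomic_long main_progress = -1;  /* loop index published by the main thread */
        static int         data[N];

        static void *helper(void *arg) {
            (void)arg;
            for (long i = 0; i < N; i++) {
                __builtin_prefetch(&data[i], 0, 1);          /* run ahead, prefetching for main */
                if ((i % CHECK_INTERVAL) == 0 &&
                    atomic_load(&main_progress) >= i) {      /* main caught up: no useful work  */
                    printf("helper deactivated at i=%ld\n", i);
                    return NULL;
                }
            }
            return NULL;
        }

        int main(void) {
            pthread_t t;
            pthread_create(&t, NULL, helper, NULL);

            long sum = 0;
            for (long i = 0; i < N; i++) {
                sum += data[i];                              /* main thread's loop body */
                atomic_store(&main_progress, i);             /* publish progress        */
            }
            pthread_join(t, NULL);
            printf("%ld\n", sum);
            return 0;
        }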