Method and system for utilizing parallelism across loops
    41.
    Granted patent (status: in force)

    Publication No.: US08479185B2

    Publication date: 2013-07-02

    Application No.: US12963786

    Filing date: 2010-12-09

    IPC classification: G06F9/45

    CPC classification: G06F8/452 G06F8/458

    Abstract: A method for compiling application source code includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create compiled application code.
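
The chunk-level dependency described in this abstract can be illustrated with a minimal Python sketch. It is not the patented compiler transformation: the names `make_chunks`, `run_both_loops`, and the loop bodies are hypothetical, and a `threading.Event` stands in for the synchronization instruction that the compiler would insert into the first loop. The sketch assumes that chunk k of the second loop reads only what chunk k of the first loop writes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def make_chunks(n, size):
    """Partition range(n) into contiguous (start, stop) chunks."""
    return [(s, min(s + size, n)) for s in range(0, n, size)]


def run_both_loops(n=16, size=4):
    """Run two dependent loops chunk by chunk (n must be positive)."""
    a = [0] * n
    b = [0] * n
    chunks = make_chunks(n, size)  # same partition used for both loops
    # Assumed dependence: chunk k of loop 2 reads only a[start:stop],
    # which is written by chunk k of loop 1.
    done = [threading.Event() for _ in chunks]

    def first_loop_chunk(k):
        start, stop = chunks[k]
        for i in range(start, stop):
            a[i] = i * i
        done[k].set()  # stand-in for the inserted synchronization: chunk k finished

    def second_loop_chunk(k):
        done[k].wait()  # block until the chunk this one depends on has completed
        start, stop = chunks[k]
        for i in range(start, stop):
            b[i] = a[i] + 1

    # One worker per task, so waiting chunks can never starve the chunks they wait on.
    with ThreadPoolExecutor(max_workers=2 * len(chunks)) as pool:
        futures = []
        for k in range(len(chunks)):
            futures.append(pool.submit(second_loop_chunk, k))
            futures.append(pool.submit(first_loop_chunk, k))
        for f in futures:
            f.result()
    return b
```

Chunks of the second loop start as soon as the chunk they depend on completes, rather than waiting at a barrier for the whole first loop to finish.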

    METHOD AND SYSTEM FOR UTILIZING PARALLELISM ACROSS LOOPS
    42.
    Patent application (status: in force)

    Publication No.: US20120151463A1

    Publication date: 2012-06-14

    Application No.: US12963786

    Filing date: 2010-12-09

    IPC classification: G06F9/45

    CPC classification: G06F8/452 G06F8/458

    Abstract: A method for compiling application source code includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create compiled application code.

    Facilitating communication and synchronization between main and scout threads
    43.
    Patent application (status: in force)

    Publication No.: US20070022422A1

    Publication date: 2007-01-25

    Application No.: US11272178

    Filing date: 2005-11-09

    IPC classification: G06F9/46

    Abstract: One embodiment of the present invention provides a system for communicating and performing synchronization operations between a main thread and a helper-thread. The system starts by executing a program in a main thread. Upon encountering a loop which has associated helper-thread code, the system commences the execution of the code by the helper-thread separately and in parallel with the main thread. While executing the code by the helper-thread, the system periodically checks the progress of the main thread and deactivates the helper-thread if the code being executed by the helper-thread is no longer performing useful work. Hence, the helper-thread executes in advance of where the main thread is executing to prefetch data items for the main thread without unnecessarily consuming processor resources or hampering the execution of the main thread.
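
The progress check described in this abstract might look roughly like the following Python sketch. Everything here is an illustrative assumption rather than the patented mechanism: a dictionary named `cache` stands in for a hardware cache warmed by prefetches, `progress` is a hypothetical shared progress marker, and `CHECK_EVERY` is an arbitrary polling interval.

```python
import threading

data = list(range(1000))
cache = {}                # stand-in for a cache warmed by prefetch instructions
progress = {"main": 0}    # index the main thread has reached (shared progress marker)
CHECK_EVERY = 64          # hypothetical polling interval, in iterations


def helper():
    """Scout ahead of the main thread, 'prefetching' items into the cache."""
    for i in range(len(data)):
        # Periodically compare our position with the main thread's progress.
        if i % CHECK_EVERY == 0 and i < progress["main"]:
            return        # the main thread has overtaken us: deactivate
        cache[i] = data[i]


def main_loop():
    """The main thread's loop: uses prefetched items when they are available."""
    total = 0
    for i in range(len(data)):
        progress["main"] = i
        value = cache.get(i)
        if value is None:     # cache miss: load the item directly
            value = data[i]
        total += value
    return total


def run():
    t = threading.Thread(target=helper, daemon=True)
    t.start()
    total = main_loop()
    t.join()
    return total
```

The result of `run()` does not depend on how far the helper gets, which mirrors the point that the helper only warms the cache and never changes program semantics.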

    Method and apparatus for software scouting regions of a program
    44.
    Patent application (status: in force)

    Publication No.: US20070022412A1

    Publication date: 2007-01-25

    Application No.: US11272210

    Filing date: 2005-11-09

    IPC classification: G06F9/45

    Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime, the helper-thread is executed in parallel with the main thread, in advance of where the main thread is executing, to prefetch data items for the main thread.
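
The two-stage loop selection in this abstract can be sketched as a pair of filters followed by code emission. The `Loop` record, its `est_benefit` and `est_overhead` fields, and the "benefit exceeds overhead" test are hypothetical stand-ins; the abstract does not say how profitability is estimated.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    name: str
    # Memory references judged to be effective prefetch candidates.
    prefetch_candidates: list = field(default_factory=list)
    est_benefit: float = 0.0    # hypothetical: cycles saved by scouting
    est_overhead: float = 0.0   # hypothetical: cost of running the helper thread


def select_scout_loops(loops):
    """Stage 1: keep loops with at least one candidate. Stage 2: keep profitable ones."""
    first_set = [lp for lp in loops if lp.prefetch_candidates]
    second_set = [lp for lp in first_set if lp.est_benefit > lp.est_overhead]
    return second_set


def emit_helper(loop):
    """Emit pseudo-instructions for the helper thread: one prefetch per candidate."""
    return [f"prefetch {ref}" for ref in loop.prefetch_candidates]


loops = [
    Loop("outer"),                                        # no candidates
    Loop("inner", ["a[i]", "b[idx[i]]"], 500.0, 120.0),   # profitable
    Loop("tiny", ["c[i]"], 10.0, 120.0),                  # candidates, not profitable
]
selected = select_scout_loops(loops)
helper_code = {lp.name: emit_helper(lp) for lp in selected}
```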

    Adjusting workload to accommodate speculative thread start-up cost
    45.
    Granted patent (status: in force)

    Publication No.: US08166486B2

    Publication date: 2012-04-24

    Application No.: US11950121

    Filing date: 2007-12-04

    IPC classification: G06F9/46

    Abstract: Methods and apparatus provide for a workload adjuster to estimate the start-up cost of one or more non-main threads of loop execution and to estimate the amount of workload to be migrated between different threads. Upon deciding to parallelize the execution of a loop, the workload adjuster creates a scheduling policy with a workload for a main thread and workloads for respective non-main threads. The scheduling policy distributes iterations of a parallelized loop to the workload of the main thread and iterations of the parallelized loop to the workloads of the non-main threads. The workload adjuster evaluates a start-up cost of the workload of a non-main thread and, based on the start-up cost, migrates a portion of the workload for that non-main thread to the main thread's workload.
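
One way to picture the migration step is a small cost model. The abstract gives no formula, so the following is only an assumed one: every iteration costs `iter_cost`, every non-main thread pays the same `startup_cost` before it can begin, and iterations are moved to the main thread until all threads finish at about the same time. The function name `adjust_workload` is hypothetical.

```python
def adjust_workload(n_iters, n_threads, iter_cost, startup_cost):
    """Return per-thread iteration counts; index 0 is the main thread."""
    # Start from an even split of the iterations.
    base, extra = divmod(n_iters, n_threads)
    workloads = [base + (1 if t < extra else 0) for t in range(n_threads)]

    # Under the assumed model, the main thread finishes at m * iter_cost and a
    # non-main thread at startup_cost + w * iter_cost. Equalising the two with
    # m + (n_threads - 1) * w = n_iters means each non-main thread gives up
    # startup_cost / (iter_cost * n_threads) iterations relative to the even split.
    migrate = int(startup_cost / (iter_cost * n_threads))
    for t in range(1, n_threads):
        moved = min(migrate, workloads[t])
        workloads[t] -= moved
        workloads[0] += moved
    return workloads
```

For example, with 100 iterations, 4 threads, unit iteration cost and a start-up cost of 8, each non-main thread hands 2 iterations to the main thread, giving 31 iterations for the main thread and 23 for each of the others; all four then finish after 31 time units.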

    ADJUSTING WORKLOAD TO ACCOMMODATE SPECULATIVE THREAD START-UP COST
    46.
    Patent application (status: in force)

    Publication No.: US20090144746A1

    Publication date: 2009-06-04

    Application No.: US11950121

    Filing date: 2007-12-04

    IPC classification: G06F9/46

    Abstract: Methods and apparatus provide for a workload adjuster to estimate the start-up cost of one or more non-main threads of loop execution and to estimate the amount of workload to be migrated between different threads. Upon deciding to parallelize the execution of a loop, the workload adjuster creates a scheduling policy with a workload for a main thread and workloads for respective non-main threads. The scheduling policy distributes iterations of a parallelized loop to the workload of the main thread and iterations of the parallelized loop to the workloads of the non-main threads. The workload adjuster evaluates a start-up cost of the workload of a non-main thread and, based on the start-up cost, migrates a portion of the workload for that non-main thread to the main thread's workload.

    Pipelined loop parallelization with pre-computations
    47.
    Granted patent (status: in force)

    Publication No.: US08726251B2

    Publication date: 2014-05-13

    Application No.: US13074253

    Filing date: 2011-03-29

    IPC classification: G06F9/45 G06F9/38

    Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.
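
The verify-or-recompute decision in this abstract can be simulated sequentially. The sketch below makes several assumptions that are not in the source: the loop carries a single variable (`state`, "last non-zero value seen"), every chunk is pre-computed from the same assumed initial value, and the pre-computation runs in a plain loop instead of on separate threads so that the behaviour is deterministic.

```python
def body(state, value):
    """Loop-carried dependence: remember the last non-zero value seen."""
    return value if value != 0 else state


def run_chunk(state, chunk):
    for v in chunk:
        state = body(state, v)
    return state


def pipelined(data, chunk_size, initial):
    """Return (final state, number of chunks whose pre-computed result was reused)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Pre-computation: each chunk is evaluated ahead of time from an assumed
    # initial value, recording (assumed initial value, pre-computed final value).
    pre = [(initial, run_chunk(initial, c)) for c in chunks]

    state = initial
    reused = 0
    for (assumed, final), c in zip(pre, chunks):
        if state == assumed:
            state = final                # assumption verified: skip runtime execution
            reused += 1
        else:
            state = run_chunk(state, c)  # assumption wrong: compute the chunk normally
    return state, reused
```

With `data = [0, 0, 5, 0, 0, 0, 0, 0, 0, 3, 0, 0]`, a chunk size of 4 and an initial value of 0, only the first chunk's assumption holds, so one chunk is reused and the other two are recomputed; the final state still matches a plain sequential run.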

    COMPILING MULTI-THREADED APPLICATIONS FOR TARGETED CRITICALITIES
    48.
    Patent application (status: in force)

    Publication No.: US20130326473A1

    Publication date: 2013-12-05

    Application No.: US13485176

    Filing date: 2012-05-31

    IPC classification: G06F9/44

    Abstract: Methods are disclosed for compiling a software application having multiple functions. At least one of the functions is identified as a targeted function having a significant contribution to performance of the software application. A code version of the targeted function is generated with one of multiple machine models corresponding to different target utilizations for a target architecture, specifically corresponding to the one with the greatest of the different target utilizations. The generated code version of the targeted function is matched with an application thread of the target architecture.
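
The selection logic in this abstract reduces to two choices: which functions count as targeted, and which machine model to compile them with. The sketch below assumes a profile that maps function names to their share of run time and a fixed threshold for "significant contribution"; both, along with the names `MachineModel`, `pick_targeted`, and `model_for_targeted`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    name: str
    target_utilization: float   # assumed share of core resources available to a thread


def pick_targeted(profile, threshold=0.2):
    """Functions whose share of run time meets the (hypothetical) threshold."""
    return [fn for fn, share in profile.items() if share >= threshold]


def model_for_targeted(models):
    """Targeted functions get the model with the greatest target utilization."""
    return max(models, key=lambda m: m.target_utilization)


profile = {"parse": 0.05, "solve": 0.70, "log": 0.25}
models = [MachineModel("shared-core", 0.25), MachineModel("whole-core", 1.0)]

chosen = model_for_targeted(models)
plan = {fn: chosen.name for fn in pick_targeted(profile)}
```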

    THROUGHPUT-AWARE SOFTWARE PIPELINING FOR HIGHLY MULTI-THREADED SYSTEMS
    49.
    Patent application (status: in force)

    Publication No.: US20130111453A1

    Publication date: 2013-05-02

    Application No.: US13285891

    Filing date: 2011-10-31

    IPC classification: G06F9/45

    CPC classification: G06F8/4452

    Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, such as maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule). Additional attempts are made to generate pipelined code in satisfaction of the criteria by iteratively adjusting the settings, regenerating the code using the adjusted settings, and recalculating whether the code satisfies the criteria.
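
The generate, check, adjust cycle in this abstract can be sketched as a retry loop. The cost model in `generate` is a toy assumption (register pressure grows with scalarity and latency), only the maximum-register criterion is modelled, and the order in which settings are reduced is an arbitrary choice; none of this comes from the source.

```python
def generate(scalarity, latency):
    """Toy stand-in for a pipelining scheduler: report the registers a schedule needs."""
    return {
        "registers": 8 * scalarity + 2 * latency,  # assumed cost model
        "scalarity": scalarity,
        "latency": latency,
    }


def satisfies(code, max_registers):
    """Only one criterion is modelled here: the maximum register count."""
    return code["registers"] <= max_registers


def pipeline_loop(scalarity, latency, max_registers):
    """Retry with reduced settings until the criteria hold, or give up with None."""
    while True:
        code = generate(scalarity, latency)
        if satisfies(code, max_registers):
            return code
        if latency > 1:
            latency -= 1        # first relax the latency setting
        elif scalarity > 1:
            scalarity -= 1      # then reduce scalarity
        else:
            return None         # nothing left to adjust: do not pipeline this loop
```

Starting from scalarity 4 and latency 6 with a limit of 32 registers, the loop lowers latency to 1 and then scalarity to 3 before the generated schedule fits in 26 registers.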

    Enhanced parallelism in trace scheduling by using renaming
    50.
    Granted patent (status: in force)

    Publication No.: US06948162B2

    Publication date: 2005-09-20

    Application No.: US10043772

    Filing date: 2002-01-09

    IPC classification: G06F9/45

    CPC classification: G06F8/445

    Abstract: A method includes scheduling instructions within a trace while disregarding data dependencies from off-trace basic blocks. After scheduling, errors caused by instruction movement are corrected. By disregarding data dependencies from off-trace basic blocks, more parallelism is exposed, resulting in more instruction motion. In this manner, efficiency is maximized.
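
The abstract does not spell out how the errors are corrected; the title suggests renaming. The following toy interpreter shows one plausible reading: an instruction hoisted above a branch clobbers a register that the off-trace block still needs, and the error is repaired by renaming the destination and copying it back on the trace. The instruction format, register names, and the `run` helper are all hypothetical.

```python
def run(program, regs, take_branch):
    """Interpret a tiny program.

    Ops: ('add', dst, src, imm), ('mov', dst, src), ('br', off_trace_block).
    A taken branch leaves the trace and executes only the off-trace block.
    """
    regs = dict(regs)
    for op in program:
        if op[0] == "add":
            regs[op[1]] = regs[op[2]] + op[3]
        elif op[0] == "mov":
            regs[op[1]] = regs[op[2]]
        elif op[0] == "br" and take_branch:
            return run(op[1], regs, False)
    return regs


off_trace = [("mov", "out", "r1")]   # the off-trace block reads the old r1

original = [("br", off_trace), ("add", "r1", "r2", 1)]

# Scheduling that ignores the off-trace use of r1 hoists the add above the branch.
hoisted = [("add", "r1", "r2", 1), ("br", off_trace)]

# Correction by renaming: write to a fresh register t1, then copy on the trace only.
renamed = [("add", "t1", "r2", 1), ("br", off_trace), ("mov", "r1", "t1")]

start = {"r1": 10, "r2": 5, "out": 0}
```

When the branch is taken, `original` and `renamed` both leave `out` equal to the old `r1` (10), while `hoisted` yields 6; when the branch is not taken, `original` and `renamed` agree on `r1`.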
