Mechanism to restrict parallelization of loops
    21.
    Patent application
    Legal status: lapsed

    Publication No.: US20070169057A1

    Publication date: 2007-07-19

    Application No.: US11314456

    Filing date: 2005-12-21

    IPC classification: G06F9/45

    CPC classification: G06F8/4452

    Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter used to limit parallelization of the loop is identified. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
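
    The selection step described in this abstract can be illustrated with a minimal sketch. The function name, its arguments, and the integer-division rule are assumptions made for illustration, not the claimed method, and the adjustment by a parallel performance factor is not modeled.

```python
def select_thread_count(total_iterations: int,
                        available_threads: int,
                        min_iterations_per_thread: int) -> int:
    """Choose a thread count before the first loop iteration runs.

    Hypothetical rule: use as many threads as possible while still
    giving each thread at least `min_iterations_per_thread` iterations.
    """
    if available_threads < 1 or min_iterations_per_thread < 1:
        raise ValueError("thread count and minimum must be positive")
    # Number of threads that can each receive the minimum amount of work.
    useful_threads = total_iterations // min_iterations_per_thread
    # Never exceed the pool size, and always keep at least one thread.
    return max(1, min(available_threads, useful_threads))
```

    With a pool of 8 threads and a minimum of 100 iterations per thread, a 250-iteration loop would run on 2 threads under this rule, and a 50-iteration loop would stay serial.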


    Structure and method for managing workshares in a parallel region
    22.
    Patent application
    Legal status: under examination (published)

    Publication No.: US20050080981A1

    Publication date: 2005-04-14

    Application No.: US10845553

    Filing date: 2004-05-13

    CPC classification: G06F9/5066

    Abstract: A data processing system is adapted to execute at least one workshare construct in a parallel region. The data processing system uses at least one thread for executing a corresponding subsection of the workshare construct and provides control blocks for managing corresponding workshare constructs in the parallel region. A method of managing the control blocks comprises: adding an array of control blocks to a control block queue; assigning control blocks in the initialized array to corresponding workshare constructs in the parallel region until a barrier is reached; and waiting at the barrier for all threads in the parallel region to complete their corresponding subsections and then resetting the control block to the beginning of the control block queue. Also provided are a computer program product and a data processing system for implementing the method.
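
    The queue management steps in this abstract can be sketched as a single-threaded model. The class, its method names, and the dictionary used as a control block are hypothetical; thread coordination at the barrier is left out.

```python
class ControlBlockQueue:
    """Toy model: control blocks are handed out in order and reused after a barrier."""

    def __init__(self, array_size: int = 4):
        self.array_size = array_size
        self.blocks = []        # the control block queue
        self.next_index = 0     # next block to hand out
        self._add_array()

    def _add_array(self):
        # Add one more array of fresh control blocks to the queue.
        self.blocks.extend({"workshare": None} for _ in range(self.array_size))

    def assign(self, workshare_id):
        # Assign the next control block to a workshare construct,
        # growing the queue when every existing block is in use.
        if self.next_index == len(self.blocks):
            self._add_array()
        block = self.blocks[self.next_index]
        block["workshare"] = workshare_id
        self.next_index += 1
        return block

    def reset_at_barrier(self):
        # Called once all threads have reached the barrier:
        # start again from the beginning of the queue.
        self.next_index = 0
```

    Reusing blocks from the start of the queue after each barrier keeps the queue from growing with the number of workshare constructs, which appears to be the point of the reset step.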


    Software compiler generated threaded environment
    23.
    Granted patent
    Legal status: in force

    Publication No.: US09218186B2

    Publication date: 2015-12-22

    Application No.: US13223486

    Filing date: 2011-09-01

    IPC classification: G06F9/30 G06F9/38 G06F9/45

    Abstract: A computer-implemented method for creating a threaded package of computer executable instructions from software compiler generated code includes allocating, through a computer processor, the computer executable instructions into a plurality of stacks, differentiating between different types of computer executable instructions for each computer executable instruction allocated to each stack of the plurality of stacks, creating switch points for each stack of the plurality of stacks based upon the differentiating, and inserting the switch points within each stack of the plurality of stacks.


    Method and apparatus for emulating stream clock signal in asynchronous data transmission
    24.
    Granted patent
    Legal status: in force

    Publication No.: US09112625B2

    Publication date: 2015-08-18

    Application No.: US12819471

    Filing date: 2010-06-21

    IPC classification: H04L7/00 H04J3/06

    CPC classification: H04J3/0632

    Abstract: A method and apparatus for emulating a stream clock signal in asynchronous data transmission. The inventive subject matter proposes a system consisting of a transmitter module, a receiver module, and a link or network in between. A scheme to generate the emulated stream clock across a wide frequency range is also proposed, with controllable deviation from the original stream frequency to meet jitter requirements and fast frequency convergence (a minimal number of converging steps). The scheme includes an optional first step to derive a frequency estimate of the stream clock and a second step of continuously adjusting the emulated clock frequency to keep the average frequency equal to that of the original stream clock.


    Auto parallelization of zero-trip loops through the induction variable substitution
    25.
    Granted patent
    Legal status: lapsed

    Publication No.: US08375375B2

    Publication date: 2013-02-12

    Application No.: US12356978

    Filing date: 2009-01-21

    IPC classification: G06F9/45

    CPC classification: G06F8/443 G06F8/452

    Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.
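
    The role of max{0,N} can be illustrated with a small sketch of a two-level loop nest. The function names and the loop nest itself are illustrative assumptions; the later removal of the max operator by copy propagation is not modeled.

```python
def indices_sequential(n: int, m: int):
    """Original form: k is a basic induction variable carried across iterations."""
    k = 0
    seen = []
    for i in range(1, n + 1):          # zero-trip when n <= 0
        for j in range(1, m + 1):      # zero-trip when m <= 0
            k = k + 1                  # loop-carried dependence on k
            seen.append((i, j, k))
    return seen, k


def indices_substituted(n: int, m: int):
    """After substitution: k is computed in closed form from i and j."""
    inner_trips = max(0, m)            # trip count is 0, not m, when m <= 0
    seen = []
    for i in range(1, n + 1):          # iterations no longer depend on each other
        base = (i - 1) * inner_trips   # value of k on entry to the inner loop
        for j in range(1, m + 1):
            seen.append((i, j, base + j))
    final_k = max(0, n) * inner_trips  # value of k after the whole nest
    return seen, final_k
```

    Without the max, the closed form for the final value of k would be n * m, which is wrong whenever either bound is zero or negative; with it, each outer iteration can compute its own indices independently.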


    Virtual memory protocol segmentation offloading
    26.
    Granted patent
    Legal status: in force

    Publication No.: US07944946B2

    Publication date: 2011-05-17

    Application No.: US12254931

    Filing date: 2008-10-21

    IPC classification: H04L12/56

    Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing segmentation offloading, such as TCP segmentation offloading (TSO). An interface performs direct virtual memory addressing of a user memory space of a system memory on behalf of a network processor to fetch payload data originated by a user process running on a host processor. Then, the network processor segments the payload data across one or more packets.
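
    The final segmentation step can be sketched generically as below. The function name and the use of a maximum segment size (`mss`) parameter are assumptions; the virtual memory addressing performed by the interface is hardware behavior and is not modeled.

```python
def segment_payload(payload: bytes, mss: int) -> list[tuple[int, bytes]]:
    """Split a payload into (byte offset, chunk) pairs of at most `mss` bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    # Each chunk would become the data portion of one packet; the offset
    # stands in for the sequence-number bookkeeping a real stack performs.
    return [(offset, payload[offset:offset + mss])
            for offset in range(0, len(payload), mss)]
```

    Offloading means this loop runs on the network processor rather than on the host, so the host hands over one large buffer instead of many small packets.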


    Promotion of a Child Procedure in Heterogeneous Architecture Software
    27.
    Patent application
    Legal status: in force

    Publication No.: US20100235811A1

    Publication date: 2010-09-16

    Application No.: US12400840

    Filing date: 2009-03-10

    IPC classification: G06F9/44 G06F9/45

    CPC classification: G06F8/52

    Abstract: A method for promotion of a child procedure in a software application for a heterogeneous architecture, wherein the heterogeneous architecture comprises a first architecture type and a second architecture type, comprises inserting a parameter representing a parallel frame pointer to a parent procedure of the child procedure into the child procedure; and modifying a reference in the child procedure to a stack variable of the parent procedure to include an indirect access to the parent procedure via the parallel frame pointer.


    Open multi-processing reduction implementation in cell broadband engine (CBE) single source compiler
    28.
    Granted patent
    Legal status: in force

    Publication No.: US07689977B1

    Publication date: 2010-03-30

    Application No.: US12423894

    Filing date: 2009-04-15

    IPC classification: G06F9/45

    CPC classification: G06F8/314 G06F8/45

    Abstract: The present disclosure is directed to a method for providing an OpenMP reduction implementation. The method may comprise creating an aggregate of at least one reduction variable in a parallel region or a work-sharing construct; defining a pointer variable, the pointer variable pointing to a dynamic array of the aggregate; creating an initialization routine, an outlined routine and a reduction accumulation routine; replacing the parallel region or the work-sharing construct with a runtime routine, the runtime routine taking a plurality of arguments including an address of the initialization routine, an address of the outlined routine, an address of the reduction accumulation routine, an address of the pointer variable, and a size of the aggregate; and executing the runtime routine when the at least one reduction variable is in the parallel region or the work-sharing construct.
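
    The division of labor among the three routines can be sketched as follows. All names here are hypothetical, Python functions stand in for routine addresses, a dictionary stands in for the aggregate of reduction variables, and a thread pool stands in for the target hardware.

```python
from concurrent.futures import ThreadPoolExecutor

DATA = list(range(1, 101))


def init():
    # Initialization routine: a fresh aggregate of two reduction variables.
    return {"total": 0, "evens": 0}


def outlined(tid, num_threads, private):
    # Outlined routine: the loop body, updating only this thread's aggregate.
    for x in DATA[tid::num_threads]:
        private["total"] += x
        if x % 2 == 0:
            private["evens"] += 1


def accumulate(into, partial):
    # Reduction accumulation routine: fold one private aggregate into the result.
    into["total"] += partial["total"]
    into["evens"] += partial["evens"]


def run_parallel_reduction(init, outlined, accumulate, num_threads):
    # Runtime routine: one private aggregate per thread (the "dynamic array").
    partials = [init() for _ in range(num_threads)]

    def run_one(tid):
        outlined(tid, num_threads, partials[tid])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(run_one, range(num_threads)))   # wait for every thread
    result = init()
    for partial in partials:
        accumulate(result, partial)
    return result


result = run_parallel_reduction(init, outlined, accumulate, 4)
```

    Because each thread writes only to its own aggregate, no locking is needed until the final accumulation, which runs serially.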


    Distributed counter and centralized sensor in barrier wait synchronization
    29.
    Granted patent
    Legal status: lapsed

    Publication No.: US07487501B2

    Publication date: 2009-02-03

    Application No.: US10929165

    Filing date: 2004-08-30

    IPC classification: G06F9/46 G06F1/00

    CPC classification: G06F9/52 G06F9/522

    Abstract: A method, system and apparatus for barrier synchronization using distributed counters and a centralized sensor. The system can include multiple distributed counters coupled to corresponding application processes in a computing application. The barrier synchronization system further can include a centralized sensor coupled for observation by the application processes. Preferably, the application processes can be separate threads of execution in the computing application. The barrier synchronization centralized sensor yet further can be managed by a designated master one of the application processes. Moreover, preferably the system further can include a backup sensor coupled for observation by the application processes and managed by the designated master one of the application processes.
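
    One possible reading of this arrangement is sketched below. The class, the choice of thread 0 as master, and the polling loops are assumptions made for illustration; the abstract does not specify the protocol, and the backup sensor is not modeled.

```python
import threading
import time


class SensorBarrier:
    """Each thread owns one counter; only the master writes the shared sensor."""

    def __init__(self, num_threads: int):
        self.counters = [0] * num_threads   # distributed counters, one writer each
        self.sensor = 0                     # centralized sensor, written by thread 0

    def wait(self, tid: int):
        self.counters[tid] += 1             # announce arrival at this phase
        phase = self.counters[tid]
        if tid == 0:
            # Master: poll every counter, then publish the phase on the sensor.
            while any(c < phase for c in self.counters):
                time.sleep(0)
            self.sensor = phase
        else:
            # Others: observe only the sensor.
            while self.sensor < phase:
                time.sleep(0)


def run_demo(num_threads: int = 4, phases: int = 3):
    barrier = SensorBarrier(num_threads)
    arrived = [[False] * num_threads for _ in range(phases)]
    violations = []

    def worker(tid):
        for p in range(phases):
            arrived[p][tid] = True
            barrier.wait(tid)
            if not all(arrived[p]):         # someone passed the barrier too early
                violations.append((p, tid))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations, barrier.sensor
```

    Giving each thread its own counter avoids contention on a single shared arrival count, while the single sensor gives waiting threads one location to watch.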


    Method and system for auto parallelization of zero-trip loops through induction variable substitution
    30.
    Granted patent
    Legal status: lapsed

    Publication No.: US07487497B2

    Publication date: 2009-02-03

    Application No.: US10926594

    Filing date: 2004-08-26

    IPC classification: G06F9/45

    CPC classification: G06F8/443 G06F8/452

    Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.
