Single instruction multiple data (SIMD) code generation for parallel loops using versioning and scheduling
    1.
    Granted patent
    Status: Lapsed

    Publication number: US08341615B2

    Publication date: 2012-12-25

    Application number: US12172199

    Filing date: 2008-07-11

    IPC classification: G06F9/45 G06F9/46

    CPC classification: G06F8/456

    Abstract: Embodiments of the present invention address deficiencies of the art in respect to loop parallelization for a target architecture implementing a shared memory model and provide a novel and non-obvious method, system and computer program product for SIMD code generation for parallel loops using versioning and scheduling. In an embodiment of the invention, within a code compilation data processing system a parallel SIMD loop code generation method can include identifying a loop in a representation of source code as a parallel loop candidate, either through a user directive or through auto-parallelization. The method also can include selecting a trip count condition responsive to a scheduling policy set for the code compilation data processing system and also on a minimal simdizable threshold, determining a trip count and an alignment constraint for the selected loop, and generating a version of a parallel loop in the source code according to the alignment constraint and a comparison of the trip count to the trip count condition.
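
The version selection described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the threshold value, the policy names, the way the trip count condition is derived from the scheduling policy, and the three version labels.

```python
# Hypothetical constant: fewest iterations per thread worth vectorizing.
MIN_SIMD_TRIP_COUNT = 8


def trip_count_condition(schedule: str, num_threads: int) -> int:
    """Return the trip count a loop must reach to get a SIMD version.

    Assumption: under a static schedule every thread receives one block,
    so each block must reach the SIMD threshold; other policies only
    require the loop as a whole to reach it.
    """
    if schedule == "static":
        return MIN_SIMD_TRIP_COUNT * num_threads
    return MIN_SIMD_TRIP_COUNT


def select_version(trip_count: int, aligned: bool,
                   schedule: str, num_threads: int) -> str:
    """Pick a loop version from the trip count and alignment constraint."""
    if trip_count < trip_count_condition(schedule, num_threads):
        return "parallel_scalar"          # too short to benefit from SIMD
    if aligned:
        return "parallel_simd_aligned"    # accesses start on a vector boundary
    return "parallel_simd_unaligned"      # needs peeling or unaligned accesses
```

For example, `select_version(10, True, "static", 4)` falls below the assumed condition of 32 iterations and selects the scalar parallel version.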

    VIRTUAL MEMORY PROTOCOL SEGMENTATION OFFLOADING
    2.
    Patent application
    Status: In force

    Publication number: US20090304029A1

    Publication date: 2009-12-10

    Application number: US12254931

    Filing date: 2008-10-21

    IPC classification: H04J3/24 G06F12/00

    Abstract: Methods and systems for a more efficient transmission of network traffic are provided. According to one embodiment, a method is provided for performing segmentation offloading, such as TCP segmentation offloading (TSO). An interface performs direct virtual memory addressing of a user memory space of a system memory on behalf of a network processor to fetch payload data originated by a user process running on a host processor. Then, the network processor segments the payload data across one or more packets.
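
The final step of this abstract, splitting the fetched payload across packets, can be sketched as follows. This is only an illustration of segmentation in general: the function name, the `mss` parameter, and the sequence numbering are assumptions, and the direct virtual memory addressing performed by the interface is not modelled.

```python
def segment_payload(payload: bytes, mss: int, initial_seq: int = 0):
    """Split a payload into (sequence_number, segment) pairs.

    `mss` is the assumed maximum number of payload bytes per packet.
    Each segment's sequence number is the byte offset of its first byte
    added to `initial_seq`, as in TCP.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        (initial_seq + offset, payload[offset:offset + mss])
        for offset in range(0, len(payload), mss)
    ]
```

A 10-byte payload with `mss=4` yields three segments of 4, 4, and 2 bytes.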

    Method and system for auto parallelization of zero-trip loops through induction variable substitution

    Publication number: US20060048119A1

    Publication date: 2006-03-02

    Application number: US10926594

    Filing date: 2004-08-26

    IPC classification: G06F9/45

    CPC classification: G06F8/443 G06F8/452

    Abstract: A method and system for auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. A max{0,N} variable is used for the loop iteration count when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops, starting from the innermost loop and proceeding to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated, giving a parallelizing compiler an opportunity to parallelize the outermost loop.
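
The effect of the substitution can be shown on a small nested loop. This is a sketch under assumptions: the loop body, the function names, and the step of 1 are invented for illustration, and the later removal of the max operator by copy propagation is not shown.

```python
def original(k0: int, n: int, m: int):
    """Nested loops carrying a dependence through induction variable k."""
    k = k0
    out = {}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k += 1                 # each iteration depends on the previous one
            out[(i, j)] = k
    return out, k


def substituted(k0: int, n: int, m: int):
    """Same loops after induction variable substitution.

    The inner trip count is max(0, m) because m may be zero or negative
    (a zero-trip loop); using m directly would give a wrong final k.
    """
    inner_trips = max(0, m)
    out = {}
    for i in range(1, n + 1):      # iterations no longer depend on each other
        for j in range(1, m + 1):
            out[(i, j)] = k0 + (i - 1) * inner_trips + j
    k = k0 + max(0, n) * inner_trips   # closed-form value after the loops
    return out, k
```

Both functions return the same results for every `n` and `m`, including non-positive values where the loops execute zero times.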

    Compiler method of exploiting data value locality for computation reuse
    5.
    Granted patent
    Status: In force

    Publication number: US09361078B2

    Publication date: 2016-06-07

    Application number: US11688090

    Filing date: 2007-03-19

    IPC classification: G06F9/45 G06F9/38

    Abstract: A compiler method for exploiting data value locality for computation reuse. When a code region having single entry and exit points and in which a potential computation reuse opportunity exists is identified during runtime, a helper thread is created separate from the master thread. One of the helper thread and master thread performs a computation specified in the code region, and the other of the helper thread and master thread looks up a value of the computation previously executed and stored in a lookup table. If the value of the computation previously executed is located in the lookup table, the other thread retrieves the value from the table, and ignores the computation performed by the thread. If the value of the computation is not located, the other thread obtains a result of the computation performed by the thread and stores the result in the lookup table for future computation reuse.
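
The two-thread scheme in this abstract can be sketched roughly as below. The names `run_region`, `table`, and `lock` are hypothetical, the region is assumed to be a pure function of a single hashable key, and the computing thread is simply left to finish on its own when its result is not needed.

```python
import threading


def run_region(compute, key, table, lock):
    """Run one reusable code region and return (value, reused).

    One thread performs the computation while the calling thread
    looks the key up in the shared table.
    """
    box = {}

    def computing_thread():
        box["value"] = compute(key)

    worker = threading.Thread(target=computing_thread, daemon=True)
    worker.start()

    with lock:
        hit = key in table
        cached = table.get(key)
    if hit:
        return cached, True        # reuse: the worker's result is ignored

    worker.join()                  # miss: wait for the computed result
    with lock:
        table[key] = box["value"]  # store it for future reuse
    return box["value"], False
```

Calling `run_region` twice with the same key computes the value the first time and returns the stored value the second time.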

    Promotion of a child procedure in heterogeneous architecture software
    6.
    Granted patent
    Status: In force

    Publication number: US08527962B2

    Publication date: 2013-09-03

    Application number: US12400840

    Filing date: 2009-03-10

    IPC classification: G06F9/45

    CPC classification: G06F8/52

    Abstract: A method for promotion of a child procedure in a software application for a heterogeneous architecture, wherein the heterogeneous architecture comprises a first architecture type and a second architecture type, comprises inserting a parameter representing a parallel frame pointer to a parent procedure of the child procedure into the child procedure; and modifying a reference in the child procedure to a stack variable of the parent procedure to include an indirect access to the parent procedure via the parallel frame pointer.
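
As a loose analogy only, the transformation resembles lifting a nested procedure to top level and passing it an explicit reference to the parent's frame. The `ParentFrame` class and all function names below are hypothetical, and nothing about the two architecture types is modelled.

```python
from dataclasses import dataclass


def parent_nested(values):
    """Before promotion: the child reads and writes the parent's local directly."""
    total = 0

    def child(x):
        nonlocal total
        total += x

    for v in values:
        child(v)
    return total


@dataclass
class ParentFrame:
    """Stand-in for the parent's stack frame."""
    total: int = 0


def child_promoted(frame: ParentFrame, x: int) -> None:
    """After promotion: the parent's variable is reached through `frame`."""
    frame.total += x


def parent_promoted(values):
    frame = ParentFrame()
    for v in values:
        child_promoted(frame, v)   # frame pointer passed as an extra parameter
    return frame.total
```

Both parents compute the same result; only the way the child reaches the parent's variable differs.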

    SOFTWARE COMPILER GENERATED THREADED ENVIRONMENT
    7.
    Patent application
    Status: In force

    Publication number: US20130061000A1

    Publication date: 2013-03-07

    Application number: US13223486

    Filing date: 2011-09-01

    IPC classification: G06F9/30 G06F12/08

    Abstract: A computer-implemented method for creating a threaded package of computer executable instructions from software compiler generated code includes allocating, through a computer processor, the computer executable instructions into a plurality of stacks, differentiating between different types of computer executable instructions for each computer executable instruction allocated to each stack of the plurality of stacks, creating switch points for each stack of the plurality of stacks based upon the differentiating, and inserting the switch points within each stack of the plurality of stacks.

    Method and Apparatus for Emulating Stream Clock Signal in Asynchronous Data Transmission
    8.
    Patent application
    Status: Under examination (published)

    Publication number: US20110311011A1

    Publication date: 2011-12-22

    Application number: US12819471

    Filing date: 2010-06-21

    IPC classification: H04L7/00

    CPC classification: H04J3/0632

    Abstract: A method and apparatus for emulating a stream clock signal in asynchronous data transmission. The inventive subject matter proposes a system consisting of a transmitter module, a receiver module, and a link or network in between. A scheme to generate the emulated stream clock across a wide frequency range is also proposed, with controllable deviation from the original stream frequency to meet jitter requirements and with fast frequency convergence (a minimal number of converging steps). The scheme includes an optional first step to derive a frequency estimate of the stream clock and a second step of continuously adjusting the emulated clock frequency to keep its average frequency equal to that of the original stream clock.

    NETWORK PROTOCOL REASSEMBLY ACCELARATION
    9.
    Patent application
    Status: Under examination (published)

    Publication number: US20090307363A1

    Publication date: 2009-12-10

    Application number: US12255916

    Filing date: 2008-10-22

    IPC classification: G06F15/16

    Abstract: Methods and systems are provided for network protocol reassembly acceleration. According to one embodiment, an incoming packet is received at a network interface. Payload data from the packet is written by a memory interface to a physical page within a system memory on behalf of the network interface based on a sequence number associated with the incoming packet and by obtaining a physical address from a virtual memory map corresponding to an incoming session with which the packet is associated. After the physical page is full, the physical page is made accessible to a user process being executed by a processor associated with the system memory by remapping the physical page through a paging table used by the user process.
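
The placement of payload bytes by sequence number can be sketched in software as follows. This is a simplified model under assumptions: `PAGE_SIZE` is tiny for readability, a dictionary stands in for the per-session virtual memory map, a list stands in for remapping full pages into the user process, and duplicate or overlapping segments are not handled.

```python
PAGE_SIZE = 8  # hypothetical page size in bytes, kept small for the example


class Session:
    """Reassembly state for one incoming session."""

    def __init__(self, initial_seq: int):
        self.initial_seq = initial_seq
        self.pages = {}          # page number -> bytearray (stand-in for the memory map)
        self.filled = {}         # page number -> count of bytes written
        self.user_visible = []   # page numbers handed to the user process

    def receive(self, seq: int, payload: bytes) -> None:
        """Write each payload byte at the offset implied by its sequence number."""
        offset = seq - self.initial_seq
        for i, byte in enumerate(payload):
            page_no, page_off = divmod(offset + i, PAGE_SIZE)
            page = self.pages.setdefault(page_no, bytearray(PAGE_SIZE))
            page[page_off] = byte
            self.filled[page_no] = self.filled.get(page_no, 0) + 1
            if self.filled[page_no] == PAGE_SIZE:
                # Page is full: make it accessible to the user process.
                self.user_visible.append(page_no)
```

Segments may arrive out of order; a page becomes visible only once all of its bytes have been written.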

    Software barrier synchronization
    10.
    Granted patent
    Status: Lapsed

    Publication number: US07581222B2

    Publication date: 2009-08-25

    Application number: US10718293

    Filing date: 2003-11-20

    IPC classification: G06F9/46

    CPC classification: G06F9/52 G06F9/522

    Abstract: The present invention provides an approach for barrier synchronization. The barrier has a first array of elements with each element of the first array having an associated process, and a second array of elements with each element of the second array having an associated process. Prior to use, the values or states of the elements in each array may be initialized. As each process finishes its phase and arrives at the barrier, it may update the value or state of its associated element in the first array. Each process may then proceed to spin at its associated element in the second array, waiting for that element to switch. When the values or states of the elements of the first array reach a predetermined value or state, an instruction is sent to all of the elements in the second array to switch their values or states, allowing all processes to leave.
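
A two-array barrier along these lines can be sketched with Python threads. Several choices here are assumptions rather than details from the abstract: process 0 is the one that checks the first array and updates the second, array elements hold increasing phase numbers instead of toggling, and spinning yields with `time.sleep(0)`.

```python
import threading
import time


class TwoArrayBarrier:
    def __init__(self, n: int):
        self.n = n
        self.arrived = [0] * n   # first array: last phase each process reached
        self.release = [0] * n   # second array: last phase each process may leave

    def wait(self, pid: int) -> None:
        phase = self.arrived[pid] + 1
        self.arrived[pid] = phase            # announce arrival in the first array
        if pid == 0:
            # Assumed coordinator: wait until every process has arrived,
            # then switch every element of the second array.
            while min(self.arrived) < phase:
                time.sleep(0)
            for i in range(self.n):
                self.release[i] = phase
        while self.release[pid] < phase:     # spin on this process's own element
            time.sleep(0)


def run_phases(n_procs: int, n_phases: int):
    """Run n_procs threads through n_phases and log the phase of each step."""
    barrier = TwoArrayBarrier(n_procs)
    log = []

    def work(pid):
        for phase in range(n_phases):
            log.append(phase)
            barrier.wait(pid)

    threads = [threading.Thread(target=work, args=(p,)) for p in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no thread starts a phase before all threads have finished the previous one, the log returned by `run_phases` is always in non-decreasing phase order.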
