Sparse vectorization without hardware gather/scatter
    1.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US08191056B2

    Publication date: 2012-05-29

    Application No.: US11549172

    Filing date: 2006-10-13

    IPC classification: G06F9/45

    CPC classification: G06F8/447

    Abstract: A target operation in a normalized target loop, susceptible of vectorization and which may, after compilation into a vectorized form, seek to operate on data in nonconsecutive physical memory, is identified in source code. Hardware instructions are inserted into executable code generated from the source code, directing a system that will run the executable code to create a representation of the data in consecutive physical memory. A vector loop containing the target operation is replaced, in the executable code, with a function call to a vector library to call a vector function that will operate on the representation to generate a result identical to output expected from executing the vector loop containing the target operation. On execution, a representation of data residing in nonconsecutive physical memory is created in consecutive physical memory, and the vectorized target operation is applied to the representation to process the data.

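    The pack-then-vectorize idea in this abstract can be illustrated with a minimal Python sketch. The names `gather`, `vector_sqrt`, `scatter`, and `sparse_sqrt` are hypothetical, `vector_sqrt` merely stands in for a vector-library routine, and writing the results back to the original positions is an assumption (the abstract only describes building the contiguous representation and operating on it). The index list is assumed to contain no duplicates.

```python
import math

def gather(memory, indices):
    """Copy the elements at scattered positions into a new contiguous buffer."""
    return [memory[i] for i in indices]

def vector_sqrt(buffer):
    """Hypothetical stand-in for a vector-library function over contiguous data."""
    return [math.sqrt(x) for x in buffer]

def scatter(memory, indices, values):
    """Write results back to the original scattered positions (an assumption)."""
    for i, v in zip(indices, values):
        memory[i] = v

def sparse_sqrt(memory, indices):
    """Replacement for the loop: pack, call the vector function, unpack."""
    packed = gather(memory, indices)
    scatter(memory, indices, vector_sqrt(packed))

def sparse_sqrt_scalar(memory, indices):
    """The original indirect loop, kept as the reference behaviour."""
    for i in indices:
        memory[i] = math.sqrt(memory[i])
```

    Both routines should leave `memory` in the same state, which is the "result identical to output expected from executing the vector loop" property the abstract asks for.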

    Compiling source code
    2.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US08161464B2

    Publication date: 2012-04-17

    Application No.: US11402556

    Filing date: 2006-04-11

    IPC classification: G06F9/45

    CPC classification: G06F8/4442

    Abstract: A method of compiling source code. The method includes converting pointer-based access in the source code to array-based access in the source code in a first pass compilation of the source code. Information is collected for objects in the source code during the first pass compilation. Candidate objects in the source code are selected based on the collected information to form selected candidate objects. Global stride variables are created for the selected candidate objects. Memory allocation operations are updated for the selected candidate objects in a second pass compilation of the source code. Multiple-level pointer indirect references are replaced in the source code with multi-dimensional array indexed references for the selected candidate objects in the second pass compilation of the source code.

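    The effect of replacing multiple-level indirect references with indexed references can be sketched in Python as follows. This is only an illustration of the before and after access patterns, not of the compiler passes themselves. All function names are hypothetical, the rows are assumed to be rectangular, and `stride` plays the role of the global stride variable.

```python
def build_pointer_table(rows, cols):
    """Multi-level layout: an outer list holding separately allocated rows."""
    return [[r * 10 + c for c in range(cols)] for r in range(rows)]

def flatten(table):
    """Stand-in for the updated allocation: one block plus a stride."""
    stride = len(table[0])  # assumed role of the global stride variable
    flat = [value for row in table for value in row]
    return flat, stride

def sum_indirect(table):
    """Original access pattern: two dereferences per element."""
    total = 0
    for i in range(len(table)):
        row = table[i]            # first-level dereference
        for j in range(len(row)):
            total += row[j]       # second-level dereference
    return total

def sum_indexed(flat, stride):
    """Transformed access pattern: one indexed reference per element."""
    total = 0
    for i in range(len(flat) // stride):
        for j in range(stride):
            total += flat[i * stride + j]
    return total
```

    The indexed form touches one contiguous block, which is what makes later loop and data-layout optimizations easier to apply.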

    Method and system for reducing memory reference overhead associated with treadprivate variables in parallel programs
    3.
    Granted invention patent
    Legal status: In force

    Publication No.: US07818731B2

    Publication date: 2010-10-19

    Application No.: US12129449

    Filing date: 2008-05-29

    IPC classification: G06F9/45

    CPC classification: G06F8/445 G06F8/443 G06F8/453

    Abstract: A computer implemented method, system and computer program product for accessing threadprivate memory for threadprivate variables in a parallel program during program compilation. A computer implemented method for accessing threadprivate variables in a parallel program during program compilation includes aggregating threadprivate variables in the program, replacing references of the threadprivate variables by indirect references, moving address load operations of the threadprivate variables, and replacing the address load operations of the threadprivate variables by calls to runtime routines to access the threadprivate memory. The invention enables a compiler to minimize the runtime routines call times to access the threadprivate variables, thus improving program performance.

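    A minimal Python sketch of the idea, under stated assumptions: the `Runtime` class and its `get_threadprivate_block` method are hypothetical stand-ins for the runtime routine, and the two variables `COUNTER` and `TOTAL` are aggregated into one per-thread block and reached through offsets. The sketch compares a loop that calls the runtime at every reference with one where the address load has been moved out of the loop.

```python
import threading

class Runtime:
    """Hypothetical runtime that hands each thread its own storage block."""

    def __init__(self, block_size):
        self._local = threading.local()
        self._block_size = block_size
        self._lock = threading.Lock()
        self.calls = 0  # number of runtime-routine invocations

    def get_threadprivate_block(self):
        with self._lock:
            self.calls += 1
        if not hasattr(self._local, "block"):
            self._local.block = [0] * self._block_size
        return self._local.block

# Aggregated layout: each threadprivate variable is an offset into one block.
COUNTER, TOTAL = 0, 1

def loop_naive(rt, n):
    """Every reference goes through the runtime routine."""
    for i in range(n):
        rt.get_threadprivate_block()[COUNTER] += 1
        rt.get_threadprivate_block()[TOTAL] += i
    return list(rt.get_threadprivate_block())

def loop_hoisted(rt, n):
    """The address load is moved out of the loop and reused indirectly."""
    base = rt.get_threadprivate_block()
    for i in range(n):
        base[COUNTER] += 1
        base[TOTAL] += i
    return list(base)
```

    Both loops compute the same values; only the number of runtime calls differs, which is the overhead the abstract targets.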

    METHOD AND SYSTEM FOR REDUCING MEMORY REFERENCE OVERHEAD ASSOCIATED WITH TREADPRIVATE VARIABLES IN PARALLEL PROGRAMS
    5.
    Invention patent application
    Legal status: In force

    Publication No.: US20080229297A1

    Publication date: 2008-09-18

    Application No.: US12129449

    Filing date: 2008-05-29

    IPC classification: G06F9/45

    CPC classification: G06F8/445 G06F8/443 G06F8/453

    Abstract: A computer implemented method, system and computer program product for accessing threadprivate memory for threadprivate variables in a parallel program during program compilation. A computer implemented method for accessing threadprivate variables in a parallel program during program compilation includes aggregating threadprivate variables in the program, replacing references of the threadprivate variables by indirect references, moving address load operations of the threadprivate variables, and replacing the address load operations of the threadprivate variables by calls to runtime routines to access the threadprivate memory. The invention enables a compiler to minimize the runtime routines call times to access the threadprivate variables, thus improving program performance.


    Optimizing source code for iterative execution
    6.
    Granted invention patent
    Legal status: In force

    Publication No.: US07340733B2

    Publication date: 2008-03-04

    Application No.: US10314094

    Filing date: 2002-12-05

    IPC classification: G06F9/44 G06F9/45

    CPC classification: G06F8/4441

    Abstract: An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is an example of a case involving iteratively computing the primary recurrence element. The CPU is operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM). SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from the storage location to another storage location of the FOM.

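    One way to picture this optimization is a recurrence whose previous values are carried in local variables instead of being reloaded from the array on every iteration. The Python sketch below is an illustration under assumptions: the list `x` stands in for slow operating memory, the locals `prev1` and `prev2` stand in for fast-memory locations such as registers, and the Fibonacci-style recurrence is chosen only as an example.

```python
def fill_naive(x):
    """Reloads both previous elements from the array on each iteration."""
    for i in range(2, len(x)):
        x[i] = x[i - 1] + x[i - 2]
    return x

def fill_carried(x):
    """Keeps the recurrence values in locals and rotates them each iteration."""
    if len(x) < 2:
        return x
    prev2, prev1 = x[0], x[1]      # initial loads into "fast" locations
    for i in range(2, len(x)):
        cur = prev1 + prev2        # computed value held in a fast location
        x[i] = cur                 # single store to "slow" memory
        prev2, prev1 = prev1, cur  # consign the value to another fast location
    return x
```

    The rotation `prev2, prev1 = prev1, cur` corresponds to moving the computed value from one fast storage location to another so the next iteration needs no reload.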

    Loop allocation for optimizing compilers
    7.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06651246B1

    Publication date: 2003-11-18

    Application No.: US09574408

    Filing date: 2000-05-18

    IPC classification: G06F9/45

    CPC classification: G06F8/443

    Abstract: Loop allocation for optimizing compilers includes the generation of a program dependence graph for a source code segment. Control dependence graph representations of the nested loops, from innermost to outermost, are generated and data dependence graph representations are generated for each level of nested loop as constrained by the control dependence graph. An interference graph is generated with the nodes of the data dependence graph. Weights are generated for the edges of the interference graph reflecting the affinity between statements represented by the nodes joined by the edges. Nodes in the interference graph are given weights reflecting resource usage by the statements associated with the nodes. The interference graph is partitioned using a profitability test based on the weights of edges and nodes and on a correctness test based on the reachability of nodes in the data dependence graph. Code is emitted based on the partitioned interference graph.

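    The partitioning step can be sketched as a greedy merge over the interference graph. The abstract does not give the concrete tests, so everything specific here is an assumption: the profitability test is taken to be "positive affinity and combined resource weight within a budget", the correctness test is taken to be "merging must not create a cycle between partitions in the data dependence graph", and the function name and data shapes are hypothetical. Ordering the emitted loops is left out.

```python
def partition_statements(resource, affinity, deps, budget):
    """Greedily group statements into loops.

    resource: {statement: resource weight}        (node weights)
    affinity: {(stmt_a, stmt_b): affinity weight} (edge weights)
    deps:     [(source, sink), ...]               (data dependence edges)
    budget:   assumed per-loop resource limit
    """
    part = {s: s for s in resource}  # statement -> partition label

    def acyclic(assign):
        # Condense the dependence graph by partition and look for a cycle.
        edges = {}
        for a, b in deps:
            if assign[a] != assign[b]:
                edges.setdefault(assign[a], set()).add(assign[b])
        state = {}  # 1 = on the DFS stack, 2 = finished

        def visit(node):
            if state.get(node) == 1:
                return False
            if state.get(node) == 2:
                return True
            state[node] = 1
            ok = all(visit(nxt) for nxt in edges.get(node, ()))
            state[node] = 2
            return ok

        return all(visit(label) for label in set(assign.values()))

    # Consider the strongest affinities first.
    for (a, b), weight in sorted(affinity.items(), key=lambda kv: -kv[1]):
        pa, pb = part[a], part[b]
        if pa == pb or weight <= 0:
            continue
        cost = sum(resource[s] for s in resource if part[s] in (pa, pb))
        if cost > budget:
            continue                      # assumed profitability test fails
        trial = {s: (pa if p == pb else p) for s, p in part.items()}
        if acyclic(trial):                # assumed correctness test
            part = trial

    groups = {}
    for s in resource:
        groups.setdefault(part[s], []).append(s)
    return list(groups.values())
```

    In this sketch a high-affinity pair is still kept apart when fusing it would place a dependent statement both before and after the fused loop.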

    Optimal cache replacement scheme using a training operation
    8.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US08352684B2

    Publication date: 2013-01-08

    Application No.: US12236188

    Filing date: 2008-09-23

    IPC classification: G06F12/00

    CPC classification: G06F12/123 G06F2212/502

    Abstract: Computer implemented method, system and computer usable program code for cache management. A cache is provided, wherein the cache is viewed as a sorted array of data elements, wherein a top position of the array is a most recently used position of the array and a bottom position of the array is a least recently used position of the array. A memory access sequence is provided, and a training operation is performed with respect to a memory access of the memory access sequence to determine a type of memory access operation to be performed with respect to the memory access. Responsive to a result of the training operation, a cache replacement operation is performed using the determined memory access operation with respect to the memory access.

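    The sorted-array view of the cache can be sketched in a few lines of Python. This models only the array with a most recently used top and a least recently used bottom; the training operation is not modeled because the abstract does not say how it chooses the access type. The class name `ArrayCache` and its `access` method are hypothetical.

```python
class ArrayCache:
    """Cache viewed as a sorted array: index 0 is MRU, the last index is LRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = []

    def access(self, addr):
        """Return True on a hit; on a miss, evict from the bottom if full."""
        hit = addr in self.lines
        if hit:
            self.lines.remove(addr)   # pull the element out of its old position
        elif len(self.lines) == self.capacity:
            self.lines.pop()          # drop the least recently used element
        self.lines.insert(0, addr)    # place the element at the top (MRU)
        return hit
```

    A scheme of the kind described in the abstract would presumably vary where an accessed element is placed in this array depending on the training result, rather than always moving it to the top.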

    Procedure control descriptor-based code specialization for context sensitive memory disambiguation
    9.
    Granted invention patent
    Legal status: In force

    Publication No.: US08332833B2

    Publication date: 2012-12-11

    Application No.: US11757941

    Filing date: 2007-06-04

    IPC classification: G06F9/45

    CPC classification: G06F8/4441

    Abstract: A computer implemented method for facilitating debugging of source code. The source code is scanned to identify a candidate region. A procedure control descriptor is generated, wherein the procedure control descriptor corresponds to the candidate region. The procedure control descriptor identifies, for the candidate region, a condition which, if true at runtime means that the candidate region can be specialized. Responsive to a determination during compile time that satisfaction of at least one condition will be known only at runtime, the procedure control descriptor is used to specialize the candidate region at compile time to create a first version of the candidate region for execution in a case where the condition is true and a second version of the candidate region for execution in a case where the condition is false, and further generate code to correctly select one of the first region and the second region at runtime.

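    The two-version scheme can be sketched in Python with a small record that holds the runtime condition and both versions of a region. This is an illustration under assumptions: `RegionDescriptor` is a hypothetical stand-in for a procedure control descriptor, and the condition used here ("the destination and source do not alias") is chosen to match the memory disambiguation theme of the title.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RegionDescriptor:
    """Hypothetical descriptor: a runtime condition plus two region versions."""
    condition: Callable[..., bool]
    when_true: Callable
    when_false: Callable

    def run(self, *args):
        # Generated selection code: pick the version at runtime.
        chosen = self.when_true if self.condition(*args) else self.when_false
        return chosen(*args)

def shift_add_general(dst, src):
    """Safe version: element by element, correct even if dst and src alias."""
    for i in range(len(src) - 1):
        dst[i + 1] = src[i] + 1

def shift_add_no_alias(dst, src):
    """Specialized version: bulk update, valid only when dst is not src."""
    dst[1:len(src)] = [value + 1 for value in src[:-1]]

shift_add = RegionDescriptor(
    condition=lambda dst, src: dst is not src,
    when_true=shift_add_no_alias,
    when_false=shift_add_general,
)
```

    Calling `shift_add.run(dst, src)` takes the bulk path for distinct lists and falls back to the element-wise loop when both arguments are the same list, where the bulk path would give a different result.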

    Optimal Cache Management Scheme
    10.
    Invention patent application
    Legal status: Lapsed

    Publication No.: US20100077153A1

    Publication date: 2010-03-25

    Application No.: US12236188

    Filing date: 2008-09-23

    IPC classification: G06F12/08

    CPC classification: G06F12/123 G06F2212/502

    Abstract: Computer implemented method, system and computer usable program code for cache management. A cache is provided, wherein the cache is viewed as a sorted array of data elements, wherein a top position of the array is a most recently used position of the array and a bottom position of the array is a least recently used position of the array. A memory access sequence is provided, and a training operation is performed with respect to a memory access of the memory access sequence to determine a type of memory access operation to be performed with respect to the memory access. Responsive to a result of the training operation, a cache replacement operation is performed using the determined memory access operation with respect to the memory access.
