METHOD USING SLP PACKING WITH STATEMENTS HAVING BOTH ISOMORPHIC AND NON-ISOMORPHIC EXPRESSIONS
    42.
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20090171919A1

    Publication date: 2009-07-02

    Application No.: US11964324

    Filing date: 2007-12-26

    IPC classification: G06F17/30

    CPC classification: G06F8/456

    Abstract: A computer-implemented method is provided for using superword level parallelism (SLP) in processing a plurality of statements, wherein the statements are associated with an array having a number of array positions, and each statement includes one or more expressions. The method includes the step of gathering expressions for each of the statements into a structure comprising a single merge stream, the merge stream being furnished with a location for each expression, wherein the location for a given expression is associated with one of the array positions. The method further comprises selectively identifying a plurality of expressions, and applying SLP packing operations to the identified expressions, in order to merge respective identified expressions into one or more isomorphic sub-streams. The method further comprises selectively combining the expressions of the isomorphic sub-streams, and other expressions of the single merge stream, into a number of input vectors that are substantially equal in length to one another. A location vector is generated that contains the respective locations for all of the expressions in the single merge stream. The method further comprises generating an output stream that comprises the expressions of the input vectors, wherein the expressions are arranged in the output stream in an order determined by the respective locations contained in the location vector.
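The flow in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented method: the function name `pack_and_reorder`, the tuple layout of the merge stream, the grouping of isomorphic expressions by operator, and the `None` padding are all hypothetical choices made for illustration. Locations are assumed to be the array positions 0..n-1.

```python
from collections import defaultdict

def pack_and_reorder(merge_stream, width):
    """merge_stream: list of (location, op, lhs, rhs), one tuple per expression.

    Hypothetical sketch: expressions with the same operator are treated as
    isomorphic and packed together; locations are assumed to be 0..n-1.
    """
    ops = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}

    # Group expressions by operator: each group plays the role of an
    # isomorphic sub-stream.
    groups = defaultdict(list)
    for loc, op, lhs, rhs in merge_stream:
        groups[op].append((loc, lhs, rhs))

    input_vectors = []    # equal-length vectors of computed lane values
    location_vector = []  # target array position of every lane, in lane order
    for op, exprs in groups.items():
        for start in range(0, len(exprs), width):
            chunk = exprs[start:start + width]
            pad = width - len(chunk)  # pad short vectors to a common length
            input_vectors.append([ops[op](l, r) for _, l, r in chunk] + [None] * pad)
            location_vector.extend([loc for loc, _, _ in chunk] + [None] * pad)

    # Build the output stream: place every lane at the position named by
    # the location vector, skipping padding lanes.
    output = [None] * len(merge_stream)
    flat = [value for vec in input_vectors for value in vec]
    for value, loc in zip(flat, location_vector):
        if loc is not None:
            output[loc] = value
    return input_vectors, location_vector, output
```

In a real compiler each input vector would be evaluated by one SIMD instruction and the final placement would be a permute or shuffle; here both are simulated with plain Python lists.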


    System and Method for Advanced Polyhedral Loop Transformations of Source Code in a Compiler
    43.
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20090083724A1

    Publication date: 2009-03-26

    Application No.: US11861449

    Filing date: 2007-09-26

    IPC classification: G06F9/45

    CPC classification: G06F8/447

    Abstract: A system and method for advanced polyhedral loop transformations of source code in a compiler are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change the program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.


    Method to efficiently prefetch and batch compiler-assisted software cache accesses
    44.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07493452B2

    Publication date: 2009-02-17

    Application No.: US11465522

    Filing date: 2006-08-18

    IPC classification: G06F12/00

    Abstract: A method to efficiently pre-fetch and batch compiler-assisted software cache accesses is provided. The method reduces the overhead associated with software cache directory accesses. With the method, the local memory address of the cache line that stores the pre-fetched data is itself cached, such as in a register or well-known location in local memory, so that a later data access does not need to perform address translation and software cache operations and can instead access the data directly from the software cache using the cached local memory address. This saves processor cycles that would otherwise be required to perform the address translation a second time when the data is to be used. Moreover, the system and method directly enable software cache accesses to be effectively decoupled from address translation in order to increase the overlap between computation and communication.
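The idea of caching the translated local address can be sketched with a toy software cache. This is a minimal sketch under assumptions: the class `SoftwareCache`, its methods, and the `lookups` counter are hypothetical names, the cache never evicts lines, and the "cached local address" is modeled as a `(slot, offset)` handle returned by the pre-fetch.

```python
class SoftwareCache:
    """Toy software cache; all names here are hypothetical illustrations."""

    def __init__(self, backing, line_size=4):
        self.backing = backing      # simulated global memory
        self.line_size = line_size
        self.local = []             # simulated local store holding cache lines
        self.directory = {}         # line tag -> slot index in self.local
        self.lookups = 0            # number of directory lookups performed

    def translate(self, addr):
        """Directory lookup: global address -> (slot, offset), fetching on a miss."""
        self.lookups += 1
        tag, offset = divmod(addr, self.line_size)
        if tag not in self.directory:
            base = tag * self.line_size
            self.local.append(self.backing[base:base + self.line_size])
            self.directory[tag] = len(self.local) - 1
        return self.directory[tag], offset

    def prefetch(self, addr):
        """Pre-fetch the line and return the local address as a reusable handle."""
        return self.translate(addr)

    def read_direct(self, handle):
        """Read through a cached local address; no directory lookup is needed."""
        slot, offset = handle
        return self.local[slot][offset]
```

A caller would keep the handle from `prefetch` in a local variable (standing in for a register) and later call `read_direct`, so the directory is consulted once rather than twice. A real implementation would also have to keep the handle valid across evictions, which this sketch ignores.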


    Code generation for complex arithmetic reduction for architectures lacking cross data-path support
    45.
    Type: Invention patent application
    Status: In force

    Publication No.: US20080092124A1

    Publication date: 2008-04-17

    Application No.: US11548851

    Filing date: 2006-10-12

    IPC classification: G06F9/45

    CPC classification: G06F8/445 G06F8/45

    Abstract: A computer-implemented method, apparatus, and computer-usable program code are provided for compiling source code for performing a complex operation followed by a complex reduction operation. A method is determined for generating executable code for performing the complex operation and the complex reduction operation. Executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to generate a final result in response to a determination that a reduced single instruction multiple data method is appropriate.
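One way to read the three steps (sub-products, intermediate results, final sum) is as a complex dot product in which each simulated SIMD lane accumulates on its own and lanes are combined only once at the end. The sketch below is an interpretation, not the patent's code generator: the function `complex_dot`, the four accumulator arrays, and the lane assignment `i % width` are assumptions made for illustration.

```python
def complex_dot(a, b, width=4):
    """Sum of a[i] * b[i] for complex values given as (re, im) pairs.

    Hypothetical sketch: lanes are simulated with lists, and no value
    crosses from one lane to another until the final summation.
    """
    acc_rr = [0.0] * width  # lane-wise sums of re(a) * re(b)
    acc_ii = [0.0] * width  # lane-wise sums of im(a) * im(b)
    acc_ri = [0.0] * width  # lane-wise sums of re(a) * im(b)
    acc_ir = [0.0] * width  # lane-wise sums of im(a) * re(b)

    # Step 1: compute the four real sub-products per element, each
    # accumulated in the element's own lane.
    for i, ((ar, ai), (br, bi)) in enumerate(zip(a, b)):
        lane = i % width
        acc_rr[lane] += ar * br
        acc_ii[lane] += ai * bi
        acc_ri[lane] += ar * bi
        acc_ir[lane] += ai * br

    # Step 2: reduce the sub-products to per-lane intermediate results.
    real_lanes = [rr - ii for rr, ii in zip(acc_rr, acc_ii)]
    imag_lanes = [ri + ir for ri, ir in zip(acc_ri, acc_ir)]

    # Step 3: sum the intermediate results once to get the final result.
    return complex(sum(real_lanes), sum(imag_lanes))
```

Deferring the cross-lane sum to a single final step is what makes this shape attractive on hardware where moving data between lanes is expensive.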


    Building approximate data dependences with a moving window
    48.
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US08667260B2

    Publication date: 2014-03-04

    Application No.: US12717985

    Filing date: 2010-03-05

    IPC classification: G06F9/44

    CPC classification: G06F9/32

    Abstract: Mechanisms for building approximate data dependences using a moving look-back window are provided. The mechanisms track dependence information for memory accesses over iterations of execution of a portion of code. The mechanisms receive a memory access of an iteration of the portion of code, the memory access having an address for accessing the memory and an access type indicating at least one of a read or a write access type. An entry in a moving look-back window data structure is generated corresponding to a memory location accessed by the memory access. The entry comprises at least an identification of the address, the access type, and an iteration number corresponding to the iteration of the memory access. The moving look-back window data structure is utilized to determine dependence information for memory accesses over a plurality of iterations of the portion of code.
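A bounded window of recent accesses can be sketched as follows. This is a minimal sketch, assuming a fixed-size FIFO window and only cross-iteration dependences; the class `LookBackWindow`, the `record` method, and the dependence labels (`flow`, `anti`, `output`) are hypothetical names, not taken from the patent.

```python
from collections import deque

class LookBackWindow:
    """Hypothetical sketch of a moving look-back window over memory accesses."""

    KINDS = {
        ("write", "read"): "flow",     # read after write
        ("read", "write"): "anti",     # write after read
        ("write", "write"): "output",  # write after write
    }

    def __init__(self, size):
        # Each entry is (address, access type, iteration number); the deque
        # drops the oldest entry automatically once `size` is exceeded.
        self.entries = deque(maxlen=size)
        self.dependences = set()

    def record(self, addr, access_type, iteration):
        """Compare a new access with the window, then add it to the window."""
        for prev_addr, prev_type, prev_iter in self.entries:
            kind = self.KINDS.get((prev_type, access_type))
            if prev_addr == addr and kind and prev_iter != iteration:
                self.dependences.add((kind, prev_iter, iteration))
        self.entries.append((addr, access_type, iteration))
```

The result is approximate by construction: a dependence whose earlier access has already slid out of the window is not reported, which trades accuracy for bounded memory.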


    Framework for generating mixed-mode operations in loop-level simdization
    50.
    Type: Granted invention patent
    Status: In force

    Publication No.: US08549501B2

    Publication date: 2013-10-01

    Application No.: US10919005

    Filing date: 2004-08-16

    IPC classification: G06F9/45

    CPC classification: G06F8/4452

    Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
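The per-instruction expansion decision can be sketched as a cheapest-candidate selection. This is only an illustration under assumptions: the function `choose_expansions`, the instruction names, and all cost numbers are hypothetical, the hardware constraint is reduced to a set of operations the vector unit supports, and the "combination of these" option from the abstract is not modeled.

```python
def choose_expansions(virtual_ops, costs, vector_supported):
    """Pick the cheapest allowed expansion mode for each virtual vector op.

    costs maps an op name to {mode: estimated cost}, where mode is one of
    "vector", "scalar", or "library". All values are hypothetical.
    """
    plan = {}
    for op in virtual_ops:
        candidates = dict(costs[op])
        # Hardware constraint: drop the vector expansion when the target's
        # vector unit has no native instruction for this operation.
        if op not in vector_supported:
            candidates.pop("vector", None)
        plan[op] = min(candidates, key=candidates.get)
    return plan
```

For example, with a vector unit that supports only an add, a virtual divide would fall back to whichever of the scalar or library expansions the cost table rates cheaper, so a single loop body can end up mixing all three modes.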
