Efficient generation of SIMD code in presence of multi-threading and other false sharing conditions and in machines having memory protection support
    1.
    Granted invention patent
    Efficient generation of SIMD code in presence of multi-threading and other false sharing conditions and in machines having memory protection support (status: in force)

    Publication No.: US07730463B2

    Publication date: 2010-06-01

    Application No.: US11358372

    Filing date: 2006-02-21

    IPC classification: G06F9/45

    CPC classification: G06F9/3851 G06F8/44

    Abstract: A computer-implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, and the targeted loop is simdized and combined with the non-simdized code.
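
The decision sequence in this abstract can be read as a simple cascade of checks. The following is a minimal sketch of that cascade only; the three boolean inputs and the `Strategy` names are hypothetical stand-ins for analyses the abstract mentions but does not specify.

```python
from enum import Enum


class Strategy(Enum):
    # Names are illustrative; the abstract only describes the four outcomes.
    SIMDIZE = "simdize the loop directly"
    SIMDIZE_WITH_SCHEME = "simdize using a scheme that needs no safety guarantee"
    PAD_AND_SIMDIZE = "pad the data, then simdize"
    SIMDIZE_WITH_SCALAR_BOUNDARIES = "simdize and combine with non-simdized boundary code"


def choose_strategy(accesses_safe, scheme_applicable, padding_appropriate):
    """Return the outcome the abstract describes for one targeted loop.

    Each argument is the (assumed, precomputed) answer to one of the
    questions asked in the abstract, checked in the same order.
    """
    if accesses_safe:
        return Strategy.SIMDIZE
    if scheme_applicable:
        return Strategy.SIMDIZE_WITH_SCHEME
    if padding_appropriate:
        return Strategy.PAD_AND_SIMDIZE
    return Strategy.SIMDIZE_WITH_SCALAR_BOUNDARIES
```

Note that every branch ends with the loop being simdized; the checks only decide how much extra work is needed around it.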

    SIMD code generation for loops with mixed data lengths
    2.
    Granted invention patent
    SIMD code generation for loops with mixed data lengths (status: lapsed)

    Publication No.: US07475392B2

    Publication date: 2009-01-06

    Application No.: US10919131

    Filing date: 2004-08-16

    IPC classification: G06F9/45

    CPC classification: G06F8/4452

    Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at compile time. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
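
To illustrate why mixed data lengths need length conversion, here is a small simulation, not the patented method. It assumes a 16-byte vector register (so 8 lanes of 16-bit values or 4 lanes of 32-bit values), models vectors as Python lists, and ignores sign extension; `unpack`, `pack` and `add_short_to_int` are hypothetical names.

```python
VECTOR_BYTES = 16  # assumed register width


def lanes(elem_bytes):
    """Number of elements of the given size that fit in one vector."""
    return VECTOR_BYTES // elem_bytes


def unpack(vec16):
    """Widen one vector of 16-bit lanes into two vectors of 32-bit lanes."""
    assert len(vec16) == lanes(2)
    half = lanes(4)
    return vec16[:half], vec16[half:]


def pack(lo32, hi32):
    """Narrow two vectors of 32-bit lanes into one vector of 16-bit lanes."""
    return [x & 0xFFFF for x in lo32 + hi32]


def add_short_to_int(shorts, ints):
    """Simdized form of: out[i] = shorts[i] + ints[i].

    Assumes len(shorts) == len(ints) and a multiple of 8. One 16-bit
    vector load feeds two 32-bit vector additions.
    """
    out = []
    for i in range(0, len(shorts), lanes(2)):
        for j, s_vec in enumerate(unpack(shorts[i:i + lanes(2)])):
            base = i + j * lanes(4)
            i_vec = ints[base:base + lanes(4)]
            out.extend(s + t for s, t in zip(s_vec, i_vec))
    return out
```

The point of the sketch is the rate mismatch: each vector of the shorter type corresponds to two vectors of the longer type, so the generated loop has to unpack (or pack) to keep both streams in step.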

    METHOD TO EXPLOIT SUPERWORD-LEVEL PARALLELISM USING SEMI-ISOMORPHIC PACKING
    3.
    Invention patent application
    METHOD TO EXPLOIT SUPERWORD-LEVEL PARALLELISM USING SEMI-ISOMORPHIC PACKING (status: lapsed)

    Publication No.: US20080127144A1

    Publication date: 2008-05-29

    Application No.: US11536990

    Filing date: 2006-09-29

    IPC classification: G06F9/45

    CPC classification: G06F8/456

    Abstract: A computer program product is provided for extracting SIMD parallelism. The computer program product includes instructions for providing a stream of input code comprising basic blocks; identifying pairs of statements that are semi-isomorphic with respect to each other within a basic block; iteratively combining into packs, pairs of statements that are semi-isomorphic with respect to each other, and combining packs into combined packs; collecting packs whose statements can be scheduled together for processing; and generating SIMD instructions for each pack to provide for extracting the SIMD parallelism.

    Analyze and reduce number of data reordering operations in SIMD code
    4.
    Granted invention patent
    Analyze and reduce number of data reordering operations in SIMD code (status: in force)

    Publication No.: US08954943B2

    Publication date: 2015-02-10

    Application No.: US11340452

    Filing date: 2006-01-26

    IPC classification: G06F9/45 G06F15/00 G06F15/76

    CPC classification: G06F8/443

    Abstract: A method for analyzing data reordering operations in Single Issue Multiple Data source code and generating executable code therefrom is provided. Input is received. One or more data reordering operations in the input are identified and each data reordering operation in the input is abstracted into a corresponding virtual shuffle operation so that each virtual shuffle operation forms part of an expression tree. One or more virtual shuffle trees are collapsed by combining virtual shuffle operations within at least one of the one or more virtual shuffle trees to form one or more combined virtual shuffle operations, wherein each virtual shuffle tree is a subtree of the expression tree that only contains virtual shuffle operations. Then code is generated for the one or more combined virtual shuffle operations.
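
Collapsing a chain of shuffles amounts to composing their index patterns. The sketch below shows only the simplest case, assumed here for illustration: single-input shuffles where `pattern[i]` names the source lane for output lane `i`. The function names are hypothetical.

```python
def shuffle(vec, pattern):
    """Apply a single-input shuffle: output lane i takes vec[pattern[i]]."""
    return [vec[k] for k in pattern]


def combine(inner, outer):
    """Pattern equivalent to applying `inner` first and `outer` second.

    shuffle(shuffle(x, inner), outer)[i] == x[inner[outer[i]]],
    so the combined pattern is inner indexed by outer.
    """
    return [inner[k] for k in outer]


x = ["a", "b", "c", "d"]
reverse = [3, 2, 1, 0]
swap_pairs = [1, 0, 3, 2]

two_steps = shuffle(shuffle(x, reverse), swap_pairs)
one_step = shuffle(x, combine(reverse, swap_pairs))
```

Here `two_steps` and `one_step` hold the same lanes, but the second form needs one reordering operation instead of two. When the combined pattern is the identity (for example, two reversals in a row), the shuffle can be dropped altogether.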

    Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization
    5.
    Granted invention patent
    Framework for integrated intra- and inter-loop aggregation of contiguous memory accesses for SIMD vectorization (status: lapsed)

    Publication No.: US08056069B2

    Publication date: 2011-11-08

    Application No.: US11856284

    Filing date: 2007-09-17

    IPC classification: G06F9/45 G06F7/52 G06F15/00

    CPC classification: G06F8/4452 G06F8/445

    Abstract: A method, computer program product, and information handling system for generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop contains multiple non-stride-one memory accesses that operate over a contiguous stream of memory, is disclosed. A preferred embodiment identifies groups of isomorphic statements within a loop body where the isomorphic statements operate over a contiguous stream of memory over the iteration of the loop. Those identified statements are then converted into virtual-length vector operations. Next, the hardware's available vector length is used to determine a number of virtual-length vectors to aggregate into a single vector operation for each iteration of the loop. Finally, the aggregated, vectorized loop code is converted into SIMD operations.
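
One plausible reading of the aggregation step is sketched below; it is an illustration under stated assumptions, not the patented algorithm. The loop body is assumed to contain `g` isomorphic statements `a[g*i + k] = b[g*i + k] + c` for `k = 0..g-1`, which together touch a contiguous stream. The hardware is assumed to have 4 lanes, and the helper names are hypothetical.

```python
from math import gcd


def iterations_to_aggregate(group_size, hw_lanes):
    """Smallest number of iterations whose combined elements fill whole vectors."""
    return hw_lanes // gcd(group_size, hw_lanes)


def scalar_loop(b, c, n, g=3):
    """Original loop: g isomorphic stride-g statements per iteration."""
    a = [0] * (g * n)
    for i in range(n):
        for k in range(g):
            a[g * i + k] = b[g * i + k] + c
    return a


def aggregated_loop(b, c, n, g=3, hw_lanes=4):
    """Same computation, expressed as whole hardware-width vector operations."""
    step = iterations_to_aggregate(g, hw_lanes)
    assert n % step == 0, "sketch assumes no residual iterations"
    a = [0] * (g * n)
    for i in range(0, n, step):
        start = g * i
        # g * step elements form a whole number of hardware vectors.
        for v in range(start, start + g * step, hw_lanes):
            a[v:v + hw_lanes] = [x + c for x in b[v:v + hw_lanes]]
    return a
```

With `g = 3` and 4 lanes, four iterations are aggregated, giving 12 contiguous elements, which is exactly three hardware vectors.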

    METHOD USING SLP PACKING WITH STATEMENTS HAVING BOTH ISOMORPHIC AND NON-ISOMORPHIC EXPRESSIONS
    6.
    Invention patent application
    METHOD USING SLP PACKING WITH STATEMENTS HAVING BOTH ISOMORPHIC AND NON-ISOMORPHIC EXPRESSIONS (status: lapsed)

    Publication No.: US20090171919A1

    Publication date: 2009-07-02

    Application No.: US11964324

    Filing date: 2007-12-26

    IPC classification: G06F17/30

    CPC classification: G06F8/456

    Abstract: A computer-implemented method is provided for using SLP in processing a plurality of statements, wherein the statements are associated with an array having a number of array positions, and each statement includes one or more expressions. The method includes the step of gathering expressions for each of the statements into a structure comprising a single merge stream, the merge stream being furnished with a location for each expression, wherein the location for a given expression is associated with one of the array positions. The method further comprises selectively identifying a plurality of expressions, and applying SLP packing operations to the identified expressions, in order to merge respective identified expressions into one or more isomorphic sub-streams. The method further comprises selectively combining the expressions of the isomorphic sub-streams, and other expressions of the single merge stream, into a number of input vectors that are substantially equal in length to one another. A location vector is generated that contains the respective locations for all of the expressions in the single merge stream. The method further comprises generating an output stream that comprises the expressions of the input vectors, wherein the expressions are arranged in the output stream in an order determined by the respective locations contained in the location vector.

    Method using SLP packing with statements having both isomorphic and non-isomorphic expressions
    7.
    Granted invention patent
    Method using SLP packing with statements having both isomorphic and non-isomorphic expressions (status: lapsed)

    Publication No.: US08266587B2

    Publication date: 2012-09-11

    Application No.: US11964324

    Filing date: 2007-12-26

    IPC classification: G06F9/44

    CPC classification: G06F8/456

    Abstract: Disclosure for using SLP in processing a plurality of statements, wherein the statements are associated with an array having a number of array positions, and each statement includes one or more expressions. Expressions are gathered for each of the statements into a structure comprising a single merge stream furnished with a location for each expression. The location for a given expression is associated with one of the array positions. A plurality of expressions are selectively identified and SLP packing operations are applied to the identified expressions to merge into one or more isomorphic sub-streams. Expressions of the isomorphic sub-streams and other expressions of the single merge stream are combined into a number of input vectors that are substantially equal in length to one another. A location vector is generated that contains the respective locations for all of the expressions in the single merge stream.

    SIMD code generation in the presence of optimized misaligned data reorganization
    8.
    Granted invention patent
    SIMD code generation in the presence of optimized misaligned data reorganization (status: lapsed)

    Publication No.: US08196124B2

    Publication date: 2012-06-05

    Application No.: US12196764

    Filing date: 2008-08-22

    IPC classification: G06F9/45

    CPC classification: G06F8/4452 G06F8/447

    Abstract: Loop code is generated to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at compile time. This technique enables the application of advanced alignment optimizations to runtime alignment. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
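
Shifting a stream to a new offset on hardware that only loads aligned vectors is commonly done by loading two adjacent aligned vectors and selecting a window across them. The simulation below sketches that general idea; it assumes 4 lanes per vector and models memory as a Python list, and the function names are hypothetical rather than taken from the patent.

```python
V = 4  # assumed number of lanes per vector


def aligned_load(mem, index):
    """Load the aligned vector containing `index` (low address bits ignored)."""
    base = (index // V) * V
    return mem[base:base + V]


def shift_pair(lo, hi, offset):
    """Select V consecutive lanes starting at `offset` from lo followed by hi."""
    return (lo + hi)[offset:offset + V]


def load_misaligned(mem, start):
    """Return mem[start:start + V] using only aligned loads.

    The offset is computed from `start` when the function runs, which
    mirrors the case where alignment is unknown at compile time.
    """
    offset = start % V
    lo = aligned_load(mem, start)
    hi = aligned_load(mem, start + V)
    return shift_pair(lo, hi, offset)
```

For a 16-element `mem = list(range(16))`, `load_misaligned(mem, 5)` reads the aligned vectors at indices 4 and 8 and selects lanes 1 through 4 of their concatenation.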

    Efficient Code Generation Using Loop Peeling for SIMD Loop Code with Multiple Misaligned Statements
    9.
    Invention patent application
    Efficient Code Generation Using Loop Peeling for SIMD Loop Code with Multiple Misaligned Statements (status: lapsed)

    Publication No.: US20080222623A1

    Publication date: 2008-09-11

    Application No.: US12122050

    Filing date: 2008-05-16

    IPC classification: G06F9/45

    CPC classification: G06F8/447 G06F8/4441

    Abstract: An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
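
The prologue / steady-state / epilogue shape that loop peeling produces can be sketched as follows. This is a simplification: the abstract applies vector-splicing instructions to the peeled iterations, whereas this sketch handles them with plain scalar code. The 4-lane width and the function name are assumptions.

```python
V = 4  # assumed number of lanes per vector


def scale_in_place(a, start, end, factor):
    """Compute a[i] *= factor for start <= i < end, with peeled boundaries."""
    i = start

    # Prologue: peel iterations until the index reaches a vector boundary.
    while i < end and i % V != 0:
        a[i] *= factor
        i += 1

    # Steady state: only whole, aligned vectors, so no realignment work here.
    while i + V <= end:
        a[i:i + V] = [x * factor for x in a[i:i + V]]
        i += V

    # Epilogue: residual iterations that do not fill a whole vector.
    while i < end:
        a[i] *= factor
        i += 1
```

Called on a 12-element array with `start=1` and `end=11`, the prologue covers indices 1 to 3, the steady-state loop covers the single aligned vector at 4 to 7, and the epilogue covers 8 to 10. Both bounds are ordinary runtime values, as in the unknown-loop-bound case the abstract mentions.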

    Framework for generating mixed-mode operations in loop-level simdization
    10.
    Granted invention patent
    Framework for generating mixed-mode operations in loop-level simdization (status: in force)

    Publication No.: US08549501B2

    Publication date: 2013-10-01

    Application No.: US10919005

    Filing date: 2004-08-16

    IPC classification: G06F9/45

    CPC classification: G06F8/4452

    Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
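
The per-instruction expansion choice can be pictured as picking the cheapest mode the target actually supports. The sketch below is a toy model only: the mode names, the cost numbers and the `choose_expansion` helper are all hypothetical, and a real cost model would weigh many more criteria than a single number per mode.

```python
def choose_expansion(costs, available):
    """Pick the cheapest expansion mode among those the target supports.

    costs:     dict mapping a mode name to an estimated cost (made-up units)
    available: set of mode names permitted by hardware/software constraints
    """
    candidates = {mode: cost for mode, cost in costs.items() if mode in available}
    if not candidates:
        raise ValueError("no expansion mode is available for this instruction")
    return min(candidates, key=candidates.get)


# Example: a virtual vector divide on a target without a native vector divide.
divide_costs = {"vector": 2, "scalar": 8, "library": 5}
choice = choose_expansion(divide_costs, available={"scalar", "library"})
```

In this example `choice` is `"library"`: the vector expansion would be cheapest but is ruled out by the constraint, so the next cheapest permitted mode wins.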
