Code generation for complex arithmetic reduction for architectures lacking cross data-path support
    1.
    Granted patent
    Legal status: In force

    Publication No.: US08423979B2

    Publication date: 2013-04-16

    Application No.: US11548851

    Filing date: 2006-10-12

    IPC classification: G06F9/45

    CPC classification: G06F8/445 G06F8/45

    Abstract: A computer implemented method, apparatus, and computer usable program code for compiling source code for performing a complex operation followed by a complex reduction operation. A method is determined for generating executable code for performing the complex operation and the complex reduction operation. Executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to generate a final result in response to a determination that a reduced single instruction multiple data method is appropriate.
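
The three steps named in the abstract (compute sub-products, reduce them to intermediate results, sum those into a final result) can be illustrated with a complex dot product. This is only a minimal sketch of one possible reading, not the patented code generator: the function name `complex_dot_reduced`, the `lanes` parameter, and the choice of four accumulators are assumptions made for illustration, and SIMD lanes are simulated with plain Python lists.

```python
def complex_dot_reduced(xs, ys, lanes=4):
    """Sum of x*y over paired complex inputs, using only lane-wise work in the loop.

    Illustrative assumption: one accumulator vector per kind of sub-product,
    so no value ever has to move between lanes (data paths) inside the loop.
    """
    acc_ac = [0.0] * lanes  # real(x) * real(y)
    acc_bd = [0.0] * lanes  # imag(x) * imag(y)
    acc_ad = [0.0] * lanes  # real(x) * imag(y)
    acc_bc = [0.0] * lanes  # imag(x) * real(y)

    # Step 1: compute sub-products, one "vector" of `lanes` elements at a time.
    for start in range(0, len(xs), lanes):
        chunk = zip(xs[start:start + lanes], ys[start:start + lanes])
        for lane, (x, y) in enumerate(chunk):
            acc_ac[lane] += x.real * y.real
            acc_bd[lane] += x.imag * y.imag
            acc_ad[lane] += x.real * y.imag
            acc_bc[lane] += x.imag * y.real

    # Step 2: reduce each accumulator to a scalar intermediate result.
    sum_ac, sum_bd = sum(acc_ac), sum(acc_bd)
    sum_ad, sum_bc = sum(acc_ad), sum(acc_bc)

    # Step 3: combine the intermediate results into the final complex value,
    # using (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
    return complex(sum_ac - sum_bd, sum_ad + sum_bc)
```

The point of the arrangement is that the subtraction and addition that mix real and imaginary parts happen once, after the loop, rather than on every iteration.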

    Code generation for complex arithmetic reduction for architectures lacking cross data-path support
    2.
    Patent application
    Legal status: In force

    Publication No.: US20080092124A1

    Publication date: 2008-04-17

    Application No.: US11548851

    Filing date: 2006-10-12

    IPC classification: G06F9/45

    CPC classification: G06F8/445 G06F8/45

    Abstract: A computer implemented method, apparatus, and computer usable program code for compiling source code for performing a complex operation followed by a complex reduction operation. A method is determined for generating executable code for performing the complex operation and the complex reduction operation. Executable code is generated for computing sub-products, reducing the sub-products to intermediate results, and summing the intermediate results to generate a final result in response to a determination that a reduced single instruction multiple data method is appropriate.

    Sparse vectorization without hardware gather / scatter
    3.
    Patent application
    Legal status: Lapsed

    Publication No.: US20080092125A1

    Publication date: 2008-04-17

    Application No.: US11549172

    Filing date: 2006-10-13

    IPC classification: G06F9/45

    CPC classification: G06F8/447

    Abstract: A target operation in a normalized target loop, susceptible of vectorization and which may, after compilation into a vectorized form, seek to operate on data in nonconsecutive physical memory, is identified in source code. Hardware instructions are inserted into executable code generated from the source code, directing a system that will run the executable code to create a representation of the data in consecutive physical memory. A vector loop containing the target operation is replaced, in the executable code, with a function call to a vector library to call a vector function that will operate on the representation to generate a result identical to output expected from executing the vector loop containing the target operation. On execution, a representation of data residing in nonconsecutive physical memory is created in consecutive physical memory, and the vectorized target operation is applied to the representation to process the data.
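
The idea of packing indirectly addressed data into consecutive memory and then handing it to a vector routine can be sketched as follows. This is a hedged illustration rather than the mechanism claimed in the patent: `vector_sqrt` is a hypothetical stand-in for a vector-library function, `sparse_apply` is an invented helper name, and the write-back (scatter) step is an assumption suggested by the title rather than by the abstract.

```python
import math


def vector_sqrt(buf):
    """Hypothetical stand-in for a vector-library routine that needs contiguous input."""
    return [math.sqrt(v) for v in buf]


def sparse_apply(data, index, vector_fn):
    """Apply vector_fn to data[index[k]] for every k, without hardware gather/scatter."""
    # Gather in software: build a contiguous copy of the scattered elements.
    packed = [data[j] for j in index]

    # A single call on the contiguous representation replaces the sparse loop.
    result = vector_fn(packed)

    # Scatter in software: write each result back to its original position.
    for j, r in zip(index, result):
        data[j] = r
    return data
```

For example, `sparse_apply([4.0, 9.0, 16.0], [0, 2], vector_sqrt)` touches only positions 0 and 2, matching what the scalar loop `for j in index: data[j] = sqrt(data[j])` would do.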

    Aggregate bandwidth through management using insertion of reset instructions for cache-to-cache data transfer
    4.
    Granted patent
    Legal status: Lapsed

    Publication No.: US07168070B2

    Publication date: 2007-01-23

    Application No.: US10853304

    Filing date: 2004-05-25

    IPC classification: G06F9/45 G06F13/00

    Abstract: A method and system for reducing or avoiding store misses with a data cache block zero (DCBZ) instruction in cooperation with the underlying hardware load stream prefetching support for helping to increase effective aggregate bandwidth. The method identifies and classifies unique streams in a loop based on dependency and reuse analysis, and performs loop transformations, such as node splitting, loop distribution or stream unrolling to get the proper number of streams. Static prediction and run-time profile information are used to guide loop and stream selection. Compile-time loop cost analysis and run-time check code and versioning are used to determine the number of cache lines ahead of each reference for data cache line zeroing and to tolerate required data alignment relative to data cache lines.

    Method and apparatus for optimizing software program using inter-procedural strength reduction
    5.
    Granted patent
    Legal status: Lapsed

    Publication No.: US08146070B2

    Publication date: 2012-03-27

    Application No.: US12270707

    Filing date: 2008-11-13

    IPC classification: G06F9/45

    CPC classification: G06F8/443

    Abstract: Inter-procedural strength reduction is provided by a mechanism of the present invention to optimize a software program. During a forward pass, the present invention collects information of global variables and analyzes the information to select candidate computations for optimization. During a backward pass, the present invention replaces costly computations with less costly or weaker computations using pre-computed values and inserts store operations of new global variables to pre-compute the costly computations at definition points of the global variables used in the costly computations.
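
The effect of the transformation, as opposed to the forward and backward analysis passes that select it, can be shown with a before/after pair. This is a minimal sketch under stated assumptions: the dictionary `program_globals` is a hypothetical stand-in for a program's global variables, and `scale`, `scale_cubed`, `set_scale`, `weight_original`, and `weight_reduced` are names invented for this example.

```python
# Hypothetical stand-in for the program's global variables.
# "scale_cubed" plays the role of the new global added by the optimization.
program_globals = {"scale": 1.0, "scale_cubed": 1.0}


def set_scale(value):
    """Definition point of the global `scale`."""
    program_globals["scale"] = value
    # Inserted store: pre-compute the costly expression where `scale` is defined.
    program_globals["scale_cubed"] = value ** 3


def weight_original(x):
    """Before: the costly power is recomputed on every call, in another procedure."""
    return x * program_globals["scale"] ** 3


def weight_reduced(x):
    """After: the costly power is replaced by a load of the pre-computed global."""
    return x * program_globals["scale_cubed"]
```

The rewrite pays off when `set_scale` runs rarely and `weight_reduced` runs often, which is the kind of trade-off the candidate selection described in the abstract would have to weigh.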

    METHOD OF PROCEDURE CONTROL DESCRIPTOR-BASED CODE SPECIALIZATION FOR CONTEXT SENSITIVE MEMORY DISAMBIGUATION
    8.
    Patent application
    Legal status: In force

    Publication No.: US20080301656A1

    Publication date: 2008-12-04

    Application No.: US11757941

    Filing date: 2007-06-04

    IPC classification: G06F9/45

    CPC classification: G06F8/4441

    Abstract: A computer implemented method, apparatus, and computer program product for compiling source code. The source code is scanned to identify a candidate region. A procedure control descriptor corresponding to the candidate region is generated. The procedure control descriptor identifies, for the candidate region, a condition which, if true at runtime, means that the candidate region can be specialized. Responsive to a determination during compile time that satisfaction of at least one condition will be known only at runtime, the procedure control descriptor is used to specialize the candidate region at compile time to create a first version of the candidate region for execution in a case where the condition is true and a second version of the candidate region for execution in a case where the condition is false. Also responsive to the determination, code is further generated to correctly select one of the first region and the second region at runtime.
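
The two-version scheme with a runtime selector can be sketched with an aliasing condition, which is one plausible example of a memory-disambiguation condition. The sketch does not model the procedure control descriptor itself; the function names and the choice of "destination and source are different lists" as the condition are assumptions made for illustration, and both lists are assumed to have equal length.

```python
def shift_add_general(dst, src):
    """Second version: correct even when dst and src are the same list."""
    for i in range(1, len(dst)):
        dst[i] = src[i - 1] + 1


def shift_add_specialized(dst, src):
    """First version: valid only when dst and src do not alias.

    With no aliasing, iterations are independent, so the whole update
    can be done as one bulk (vector-like) operation.
    """
    dst[1:] = [v + 1 for v in src[:-1]]


def shift_add(dst, src):
    """Generated selector: test the condition at runtime and pick a version."""
    if dst is not src:
        shift_add_specialized(dst, src)
    else:
        shift_add_general(dst, src)
    return dst
```

When the same list is passed as both arguments, each element depends on the one just written, so only the general version gives the sequential result; the selector is what keeps the specialized version from being used in that case.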

    Sparse vectorization without hardware gather/scatter
    9.
    Granted patent
    Legal status: Lapsed

    Publication No.: US08191056B2

    Publication date: 2012-05-29

    Application No.: US11549172

    Filing date: 2006-10-13

    IPC classification: G06F9/45

    CPC classification: G06F8/447

    Abstract: A target operation in a normalized target loop, susceptible of vectorization and which may, after compilation into a vectorized form, seek to operate on data in nonconsecutive physical memory, is identified in source code. Hardware instructions are inserted into executable code generated from the source code, directing a system that will run the executable code to create a representation of the data in consecutive physical memory. A vector loop containing the target operation is replaced, in the executable code, with a function call to a vector library to call a vector function that will operate on the representation to generate a result identical to output expected from executing the vector loop containing the target operation. On execution, a representation of data residing in nonconsecutive physical memory is created in consecutive physical memory, and the vectorized target operation is applied to the representation to process the data.

    Compiling source code
    10.
    Granted patent
    Legal status: Lapsed

    Publication No.: US08161464B2

    Publication date: 2012-04-17

    Application No.: US11402556

    Filing date: 2006-04-11

    IPC classification: G06F9/45

    CPC classification: G06F8/4442

    Abstract: A method of compiling source code. The method includes converting pointer-based access in the source code to array-based access in the source code in a first pass compilation of the source code. Information is collected for objects in the source code during the first pass compilation. Candidate objects in the source code are selected based on the collected information to form selected candidate objects. Global stride variables are created for the selected candidate objects. Memory allocation operations are updated for the selected candidate objects in a second pass compilation of the source code. Multiple-level pointer indirect references are replaced in the source code with multi-dimensional array indexed references for the selected candidate objects in the second pass compilation of the source code.
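
The end result of replacing multi-level indirect references with strided array indexing can be pictured with a nested list (one indirection per row) versus a single flat buffer. This is a rough analogy in Python rather than the compiler transformation itself: all function names are invented for this sketch, and the stride is returned from the allocator instead of being stored in a global variable, purely to keep the example self-contained.

```python
def allocate_nested(rows, cols):
    """Before: a table of row references, each pointing to a separate row."""
    return [[0] * cols for _ in range(rows)]


def allocate_flat(rows, cols):
    """After: one contiguous allocation plus a stride (elements per row)."""
    stride = cols  # stands in for the stride variable created for this object
    return [0] * (rows * cols), stride


def fill_nested(table, rows, cols):
    for i in range(rows):
        for j in range(cols):
            table[i][j] = i * 10 + j  # two-level indirect reference


def fill_flat(buf, stride, rows, cols):
    for i in range(rows):
        for j in range(cols):
            buf[i * stride + j] = i * 10 + j  # indexed reference using the stride
```

Both fills store the same values in the same logical positions; the flat form only changes how element `(i, j)` is located, which is what allows the updated allocation to be a single block.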
