Optimized scalar promotion with load and splat SIMD instructions
    12.
    Granted patent
    Status: lapsed

    Publication number: US08572586B2

    Publication date: 2013-10-29

    Application number: US13555435

    Filing date: 2012-07-23

    IPC classification: G06F9/30

    CPC classification: G06F8/45

    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
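
    The two-phase placement in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption rather than the patent's method: the `Op` IR, the opcode names (`load`, `splat`, `load_splat`, `vadd`, `vmul`) and the heuristic that a scalar load is fused into a load-and-splat only when every consumer is a SIMD operation. Phase 1 places the combined vector operation-splat operations; phase 2 re-examines the first modified code, inserting a separate splat where a scalar still feeds a SIMD operation and deleting splats that phase 1 made redundant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    dest: str     # name of the value this op defines
    opcode: str   # hypothetical opcodes: "load", "add", "splat", "load_splat", "vadd", ...
    args: tuple   # operand names (a load's operand is treated as an address)
    simd: bool    # True if the op produces a full vector register


def promote_scalars(code):
    """Hypothetical two-phase splat placement, loosely modeled on the abstract."""
    scalar_uses = {a for op in code if not op.simd for a in op.args}
    simd_uses = {a for op in code if op.simd for a in op.args}

    # Phase 1: a scalar load consumed only by SIMD ops (including explicit
    # splats) becomes a combined load-and-splat (a vector operation-splat).
    first = [
        Op(op.dest, "load_splat", op.args, True)
        if op.opcode == "load" and not op.simd
        and op.dest in simd_uses and op.dest not in scalar_uses
        else op
        for op in code
    ]

    # Phase 2: place separate splats on the first modified code.
    vectors = {op.dest for op in first if op.simd}
    needs_splat = {a for op in first
                   if op.simd and op.opcode not in ("splat", "load_splat")
                   for a in op.args if a not in vectors}
    alias = {}   # deleted splat -> vector that now holds the value
    vec_of = {}  # scalar -> its inserted splatted copy

    def vector_name(name):
        name = alias.get(name, name)
        return vec_of.get(name, name)

    out = []
    for op in first:
        if op.opcode == "splat":
            src = op.args[0]
            if src in vectors or src in vec_of:
                # Redundant: the value already lives in a vector register.
                alias[op.dest] = vector_name(src)
                continue
        elif op.simd and op.opcode != "load_splat":
            # SIMD consumers read the splatted copy instead of the scalar.
            op = Op(op.dest, op.opcode, tuple(map(vector_name, op.args)), True)
        out.append(op)
        if not op.simd and op.dest in needs_splat:
            # Insert a separate splat right after the scalar definition.
            vec_of[op.dest] = op.dest + "_v"
            out.append(Op(op.dest + "_v", "splat", (op.dest,), True))
    return out


SAMPLE = [
    Op("a", "load", ("x",), False),      # scalar load, consumed only via a splat
    Op("av", "splat", ("a",), True),     # explicit splat in the original code
    Op("b", "load", ("y",), False),      # scalar load with a scalar consumer
    Op("c", "add", ("b", "b"), False),   # scalar arithmetic
    Op("d", "vadd", ("av", "c"), True),  # SIMD op reading scalar c directly
    Op("e", "vmul", ("d", "b"), True),   # SIMD op reading scalar b directly
]

if __name__ == "__main__":
    for op in promote_scalars(SAMPLE):
        print(op.dest, "=", op.opcode, op.args)
```

    On the sample, `a` is consumed only through a splat, so its load becomes a load-and-splat and the original splat is deleted; `b` also has a scalar consumer, so it stays a scalar load and receives a separate splat, as does the scalar result `c`.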

    Optimized scalar promotion with load and splat SIMD instructions
    13.
    Granted patent
    Status: lapsed

    Publication number: US08255884B2

    Publication date: 2012-08-28

    Application number: US12134495

    Filing date: 2008-06-06

    IPC classification: G06F9/45 G06F9/44

    CPC classification: G06F8/45

    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

    Shared Prefetching to Reduce Execution Skew in Multi-Threaded Systems
    15.
    Patent application
    Status: lapsed

    Publication number: US20110276786A1

    Publication date: 2011-11-10

    Application number: US12773454

    Filing date: 2010-05-04

    IPC classification: G06F9/30 G06F12/08 G06F12/00

    Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
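
    The distribution step can be pictured with a small Python sketch. It assumes that every thread runs the same loop over the shared stream and that prefetches are handed out round-robin by target line; the function name `insert_shared_prefetches`, the `distance` parameter and the round-robin policy are illustrative assumptions, since the abstract does not say how the set of prefetch instructions is divided.

```python
def insert_shared_prefetches(stream, num_threads, distance):
    """Return one instruction sequence per thread for a loop over a shared stream.

    stream:   cache-line addresses that every thread reads, in loop order
    distance: how many iterations ahead a prefetch is issued (hypothetical)

    Every thread still performs all of its loads, but the prefetch of line
    i + distance is emitted only in the sequence of thread
    (i + distance) % num_threads, so the set of prefetches is split into
    disjoint per-thread sub-portions (round-robin policy assumed).
    """
    sequences = []
    for tid in range(num_threads):
        seq = []
        for i, addr in enumerate(stream):
            ahead = i + distance
            if ahead < len(stream) and ahead % num_threads == tid:
                seq.append(("prefetch", stream[ahead]))
            seq.append(("load", addr))
        sequences.append(seq)
    return sequences


if __name__ == "__main__":
    lines = [64 * k for k in range(12)]  # 12 shared 64-byte cache lines
    for tid, seq in enumerate(insert_shared_prefetches(lines, 3, 2)):
        print(tid, [a for kind, a in seq if kind == "prefetch"])
```

    With 12 lines, 3 threads and a distance of 2, the ten prefetches are split 3/3/4 across the threads instead of each thread issuing all ten; every line from index 2 onward is still prefetched exactly once.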

    Shared prefetching to reduce execution skew in multi-threaded systems
    17.
    Granted patent
    Status: lapsed

    Publication number: US08490071B2

    Publication date: 2013-07-16

    Application number: US12773454

    Filing date: 2010-05-04

    IPC classification: G06F9/45

    Abstract: Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.

    Performance Evaluation of Algorithmic Tasks and Dynamic Parameterization on Multi-Core Processing Systems
    20.
    Patent application
    Status: pending (published)

    Publication number: US20090144745A1

    Publication date: 2009-06-04

    Application number: US11947185

    Filing date: 2007-11-29

    IPC classification: G06F9/50 G06F9/44

    Abstract: Apparatus for evaluating the performance of DMA-based algorithmic tasks on a target multi-core processing system includes a memory and at least one processor coupled to the memory. The processor is operative: to input a template for a specified task, the template including DMA-related parameters specifying DMA operations and computational operations to be performed; to evaluate performance for the specified task by running a benchmark on the target multi-core processing system, the benchmark being operative to generate data access patterns using DMA operations and invoking prescribed computation routines as specified by the input template; and to provide results of the benchmark indicative of a measure of performance of the specified task corresponding to the target multi-core processing system.
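
    The abstract describes the benchmark as a task template plus a driver that replays the template's access pattern. The Python sketch below illustrates that split under stated assumptions: `TaskTemplate`, its fields (`block_bytes`, `num_blocks`, `stride_blocks`, `compute`) and `run_benchmark` are hypothetical names, and a plain memory copy stands in for a DMA transfer, so the timings it reports describe the host Python process, not a multi-core target.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskTemplate:
    """Hypothetical task template; the field names are illustrative only."""
    block_bytes: int                      # size of each DMA transfer
    num_blocks: int                       # number of transfers the task performs
    stride_blocks: int                    # access pattern: block step between transfers
    compute: Callable[[memoryview], int]  # computation routine run on each block


def run_benchmark(tmpl, main_memory):
    """Replay the access pattern described by tmpl and time it.

    A bytearray copy stands in for a DMA get into a core's local store;
    a real benchmark would issue the target platform's DMA commands instead.
    """
    local_store = bytearray(tmpl.block_bytes)
    blocks_in_memory = len(main_memory) // tmpl.block_bytes
    checksum = 0
    start = time.perf_counter()
    for i in range(tmpl.num_blocks):
        # Strided access pattern, wrapped around the available memory.
        off = (i * tmpl.stride_blocks) % blocks_in_memory * tmpl.block_bytes
        local_store[:] = main_memory[off:off + tmpl.block_bytes]  # "DMA get"
        with memoryview(local_store) as view:
            checksum += tmpl.compute(view)
    seconds = time.perf_counter() - start
    moved = tmpl.num_blocks * tmpl.block_bytes
    return {
        "checksum": checksum,
        "seconds": seconds,
        "bytes_moved": moved,
        "mb_per_s": moved / seconds / 1e6 if seconds > 0 else float("inf"),
    }


if __name__ == "__main__":
    memory = bytes(range(256)) * 64      # 16 KiB of sample data
    for size in (256, 1024, 4096):       # vary one DMA-related parameter
        tmpl = TaskTemplate(size, 32, 3, compute=sum)
        print(size, run_benchmark(tmpl, memory))
```

    Re-running the driver while varying a single template field, as the `__main__` block does for the transfer size, is one simple way to compare parameter settings; the abstract does not state how the patented apparatus selects them.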