METHODS AND APPARATUS TO ELIMINATE PARTIAL-REDUNDANT VECTOR LOADS
    Invention application
    METHODS AND APPARATUS TO ELIMINATE PARTIAL-REDUNDANT VECTOR LOADS (granted)

    Publication number: US20160259628A1

    Publication date: 2016-09-08

    Application number: US14741160

    Filing date: 2015-06-16

    CPC classification number: G06F8/30 G06F8/4441 G06F8/452 G06F11/3688

    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node grouper to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
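
    The group, identify, estimate and compare flow described in this abstract can be illustrated with a small sketch. It is not the patented implementation: the names `VectorLoad`, `group_by_base`, `find_candidate` and `should_replace`, the cost constants, the 0.9 threshold ratio, and the choice of "non-overlapping loads plus shuffles" as replacement code are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed machine parameters (hypothetical, for illustration only).
VECTOR_WIDTH = 4   # elements read by one vector load
LOAD_COST = 4      # cost units per vector load
SHUFFLE_COST = 1   # cost units per shuffle/permute


@dataclass(frozen=True)
class VectorLoad:
    base: str     # array the load reads from (stands in for the "load type")
    offset: int   # index of the first element read


def group_by_base(loads):
    """Node grouping: collect loads that read from the same array."""
    groups = defaultdict(list)
    for ld in loads:
        groups[ld.base].append(ld)
    return groups


def find_candidate(group):
    """Return the first run of loads whose element ranges overlap, sorted by offset."""
    ordered = sorted(group, key=lambda ld: ld.offset)
    run = ordered[:1]
    for ld in ordered[1:]:
        if ld.offset - run[-1].offset < VECTOR_WIDTH:
            run.append(ld)          # overlaps the previous load: partially redundant
        elif len(run) > 1:
            break                   # keep the first overlapping run only
        else:
            run = [ld]
    return run if len(run) > 1 else []


def original_cost(candidate):
    return len(candidate) * LOAD_COST


def replacement_cost(candidate):
    """Cost of covering the same span with non-overlapping loads plus shuffles."""
    first = candidate[0].offset
    span = candidate[-1].offset + VECTOR_WIDTH - first
    wide_loads = -(-span // VECTOR_WIDTH)   # ceiling division
    shuffles = sum(1 for ld in candidate
                   if (ld.offset - first) % VECTOR_WIDTH != 0)
    return wide_loads * LOAD_COST + shuffles * SHUFFLE_COST


def should_replace(candidate, threshold_ratio=0.9):
    """Accept the replacement only if it beats a fraction of the original cost."""
    if len(candidate) < 2:
        return False
    return replacement_cost(candidate) <= threshold_ratio * original_cost(candidate)
```

    With these assumed costs, four loads at offsets 0 to 3 of the same array are worth replacing (cost 11 against a threshold of 14.4), while two loads at offsets 0 and 1 are not (cost 9 against 7.2).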


    COMPILER TRANSFORMATION WITH LOOP AND DATA PARTITIONING

    Publication number: US20190042221A1

    Publication date: 2019-02-07

    Application number: US15972345

    Filing date: 2018-05-07

    Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.
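
    As a rough illustration of the last sentence of this abstract, the sketch below blocks a simple loop so that its memory windows fit a byte budget, copies data into and out of the windows, and rewrites the array indices relative to each window. The loop body (`dst[i] = src[i] + src[i + 1]`), the function name `partitioned_sum` and the 8-byte element size are assumptions chosen for the example, not details from the application.

```python
def partitioned_sum(src, mem_limit_bytes, elem_bytes=8):
    """Compute dst[i] = src[i] + src[i + 1] in blocks that fit mem_limit_bytes.

    Each block needs a source window of (block + 1) elements and a
    destination window of block elements, so 2 * block + 1 elements
    must fit in the constrained memory.
    """
    n = len(src) - 1                       # iteration space: i in [0, n)
    budget_elems = mem_limit_bytes // elem_bytes
    block = (budget_elems - 1) // 2        # largest block satisfying the constraint
    if block < 1:
        raise ValueError("memory constraint too small for a single iteration")

    dst = [0] * n
    for start in range(0, n, block):       # one iteration space block per pass
        end = min(start + block, n)
        src_win = src[start:end + 1]       # copy in: memory window for src
        dst_win = [0] * (end - start)      # memory window for dst
        for j in range(end - start):       # transformed access: i becomes i - start
            dst_win[j] = src_win[j] + src_win[j + 1]
        dst[start:end] = dst_win           # copy out
    return dst
```

    With a 40-byte limit and 8-byte elements the budget is five elements, so the loop runs in blocks of two iterations (three source elements plus two destination elements per block).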

    METHODS AND APPARATUS TO PERFORM AUTOMATIC COMPILER OPTIMIZATION TO ENABLE STREAMING-STORE GENERATION FOR UNALIGNED CONTIGUOUS WRITE ACCESS

    Publication number: US20220012028A1

    Publication date: 2022-01-13

    Application number: US17483459

    Filing date: 2021-09-23

    Abstract: Methods, apparatus, systems and articles of manufacture (e.g., computer readable storage media) to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access are disclosed. Example apparatus disclosed herein are to mark a store instruction in source program code as a transformation candidate when the store instruction is associated with a group of memory accesses that are unaligned with respect to a size of a cache line in a cache. Disclosed apparatus are also to transform the store instruction that is marked as the transformation candidate to form transformed program code when a non-temporal property is satisfied, the transformed program code to replace the store instruction with (i) a write to a buffer in the cache and (ii) a streaming-store instruction that is to write contents of the buffer to memory.
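
    The buffer-then-stream idea in this abstract can be simulated in a few lines. The sketch below is an assumption-laden model, not the disclosed transformation: memory is a Python list, a cache line holds `LINE` elements, partial lines at the head and tail of the write are handled with ordinary stores, and the non-temporal property check is omitted. The function name `store_unaligned` is hypothetical.

```python
LINE = 8  # elements per cache line (assumed)


def store_unaligned(memory, start, values):
    """Write values to memory[start:start + len(values)].

    Full cache lines are first gathered into a line-sized buffer and then
    written with one simulated streaming store; the unaligned head and tail
    use regular per-element stores. Returns the number of stores of each kind.
    """
    stats = {"regular": 0, "streaming": 0}
    end = start + len(values)
    first_line = -(-start // LINE) * LINE   # first line boundary at or after start
    last_line = (end // LINE) * LINE        # last line boundary at or before end
    head_end = min(first_line, end)
    tail_start = max(last_line, head_end)

    for addr in range(start, head_end):     # unaligned head: regular stores
        memory[addr] = values[addr - start]
        stats["regular"] += 1

    for line in range(first_line, last_line, LINE):
        buf = values[line - start:line - start + LINE]  # (i) write to the buffer
        memory[line:line + LINE] = buf                  # (ii) stream the whole line
        stats["streaming"] += 1

    for addr in range(tail_start, end):     # partial tail: regular stores
        memory[addr] = values[addr - start]
        stats["regular"] += 1
    return stats
```

    For a 20-element write starting at index 3, this model issues 12 regular stores (5 before the first line boundary, 7 after the last) and one streaming store for the fully covered line at indices 8 to 15.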

    Methods and apparatus to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access

    Publication number: US12032934B2

    Publication date: 2024-07-09

    Application number: US17483459

    Filing date: 2021-09-23

    CPC classification number: G06F8/4434

    Abstract: Methods, apparatus, systems and articles of manufacture (e.g., computer readable storage media) to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access are disclosed. Example apparatus disclosed herein are to mark a store instruction in source program code as a transformation candidate when the store instruction is associated with a group of memory accesses that are unaligned with respect to a size of a cache line in a cache. Disclosed apparatus are also to transform the store instruction that is marked as the transformation candidate to form transformed program code when a non-temporal property is satisfied, the transformed program code to replace the store instruction with (i) a write to a buffer in the cache and (ii) a streaming-store instruction that is to write contents of the buffer to memory.
