Methods and apparatus for automatic communication optimizations in a compiler based on a polyhedral representation

    Publication No.: US09830133B1

    Publication Date: 2017-11-28

    Application No.: US13712659

    Filing Date: 2012-12-12

    IPC Classification: G06F9/45

    CPC Classification: G06F8/41 G06F8/453 G06F8/457

    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one local memory unit that allows for data reuse opportunities. The first custom computing apparatus optimizes the code for reduced communication execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

    Methods and apparatus for joint scheduling and layout optimization to enable multi-level vectorization

    Type: Granted patent; legal status: in force

    Publication No.: US09489180B1

    Publication Date: 2016-11-08

    Application No.: US13679861

    Filing Date: 2012-11-16

    IPC Classification: G06F9/44 G06F9/45

    CPC Classification: G06F8/443 G06F8/447

    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least one vector execution unit that allows for parallel execution of tasks on constant-strided memory locations. The first custom computing apparatus optimizes the code for parallelism, locality of operations, constant-strided memory accesses and vectorized execution on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

    SYSTEM AND METHOD FOR GENERATION OF EVENT DRIVEN, TUPLE-SPACE BASED PROGRAMS

    Type: Patent application; legal status: pending (published)

    Publication No.: US20150089485A1

    Publication Date: 2015-03-26

    Application No.: US14492899

    Filing Date: 2014-09-22

    IPC Classification: G06F9/45

    Abstract: In a system for automatic generation of event-driven, tuple-space based programs from a sequential specification, a hierarchical mapping solution can target different runtimes relying on event-driven tasks (EDTs). The solution uses loop types to encode short, transitive relations among EDTs that can be evaluated efficiently at runtime. Specifically, permutable loops translate immediately into conservative point-to-point synchronizations of distance one. A runtime-agnostic which can be used to target the transformed code to different runtimes.
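    The distance-one rule can be pictured with a small simulation. This is a minimal sketch, not the method claimed in the application: it assumes a fully permutable loop nest in which each iteration (or tile) becomes one EDT, and it models the runtime with a plain dependence counter per task. Because every dependence distance in a permutable band is componentwise non-negative, making each task wait only on its immediate predecessor along each loop dimension covers every longer dependence transitively. The names `predecessors`, `successors`, and `run_edts` are hypothetical.

```python
from collections import deque
from itertools import product


def predecessors(task):
    """Distance-one predecessors: one step back along each (assumed permutable) loop dimension."""
    preds = []
    for d in range(len(task)):
        if task[d] > 0:
            prev = list(task)
            prev[d] -= 1
            preds.append(tuple(prev))
    return preds


def successors(task, extents):
    """Tasks that list `task` as one of their distance-one predecessors."""
    succs = []
    for d in range(len(task)):
        if task[d] + 1 < extents[d]:
            nxt = list(task)
            nxt[d] += 1
            succs.append(tuple(nxt))
    return succs


def run_edts(extents):
    """Simulate event-driven execution: a task fires once all its predecessors have signalled it."""
    tasks = list(product(*(range(n) for n in extents)))
    pending = {t: len(predecessors(t)) for t in tasks}
    ready = deque(t for t in tasks if pending[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # a real EDT would run its loop body here
        for s in successors(task, extents):
            pending[s] -= 1  # point-to-point signal to one neighbour
            if pending[s] == 0:
                ready.append(s)
    return order


if __name__ == "__main__":
    print(run_edts((2, 2)))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

    Each task keeps at most one counter slot per loop dimension, so the synchronization state stays short regardless of how many iterations a dependence actually spans, which is the efficiency the abstract attributes to encoding relations through loop types.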

    Efficient and scalable computations with sparse tensors

    Publication No.: US10936569B1

    Publication Date: 2021-03-02

    Application No.: US13898159

    Filing Date: 2013-05-20

    IPC Classification: G06F16/00 G06F16/22

    Abstract: In a system for storing in memory a tensor that includes at least three modes, elements of the tensor are stored in a mode-based order for improving locality of references when the elements are accessed during an operation on the tensor. To facilitate efficient data reuse in a tensor transform that includes several iterations, on a tensor that includes at least three modes, a system performs a first iteration that includes a first operation on the tensor to obtain a first intermediate result, and the first intermediate result includes a first intermediate-tensor. The first intermediate result is stored in memory, and a second iteration is performed in which a second operation on the first intermediate result accessed from the memory is performed, so as to avoid a third operation that would be required if the first intermediate result were not accessed from the memory.
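    One way to read "mode-based order" is to sort the nonzeros of a coordinate-format (COO) tensor lexicographically by a chosen permutation of its modes, so that every slice along the leading mode occupies one contiguous run of memory. The sketch below assumes plain Python tuples `(i, j, k, value)` and hypothetical helper names (`mode_order`, `slice_ranges`, `slice_sums`); the storage format actually described in the patent is not reproduced here.

```python
def mode_order(nonzeros, order):
    """Sort COO entries (i, j, k, value) so that order[0] varies slowest, then order[1], and so on."""
    return sorted(nonzeros, key=lambda e: tuple(e[m] for m in order))


def slice_ranges(sorted_nz, lead_mode):
    """Map each index of lead_mode to the half-open (start, end) run it occupies.

    Only meaningful when sorted_nz was ordered with lead_mode first, so every slice is contiguous.
    """
    ranges = {}
    for pos, entry in enumerate(sorted_nz):
        start, _ = ranges.get(entry[lead_mode], (pos, pos))
        ranges[entry[lead_mode]] = (start, pos + 1)
    return ranges


def slice_sums(sorted_nz, lead_mode):
    """Example operation: reduce each slice by scanning its contiguous run exactly once."""
    return {idx: sum(e[-1] for e in sorted_nz[s:t])
            for idx, (s, t) in slice_ranges(sorted_nz, lead_mode).items()}


if __name__ == "__main__":
    nz = [(1, 0, 2, 1.0), (0, 1, 0, 2.0), (1, 1, 1, 3.0), (0, 0, 1, 4.0)]
    by_i = mode_order(nz, (0, 1, 2))
    print(slice_sums(by_i, 0))  # {0: 6.0, 1: 4.0}
```

    Re-sorting with a different permutation, for example `mode_order(nz, (1, 0, 2))`, makes slices along the second mode contiguous instead; the choice of permutation would follow whichever mode the next operation iterates over.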

    System and method for configuration of an ensemble solver

    Publication No.: US09684865B1

    Publication Date: 2017-06-20

    Application No.: US13910467

    Filing Date: 2013-06-05

    IPC Classification: G06N5/02 G06N99/00

    CPC Classification: G06N99/005 G06N5/003

    Abstract: In a system for enabling configuration of an ensemble of several solvers, such that the ensemble can efficiently solve a constraint problem, for each one of several candidate configurations, an array of scores is computed. The array corresponds to a statistical parameter related to a problem solution, and the computation is based on, at least in part, a set of features associated with the problem. One candidate configuration is assigned to a solver, and based on the array of scores associated with that candidate configuration the same or a different candidate configuration is assigned to another solver. A system for dynamically reconfiguring an ensemble of solvers obtains runtime data from several solvers, and a new configuration is determined by applying a machine learning and/or heuristic analysis procedure to the runtime data. The configuration of a solver may be updated according to the new configuration while that solver is running.

    Systems and methods for parallelizing and optimizing sparse tensor computations

    Type: Granted patent; legal status: in force

    Publication No.: US09471377B2

    Publication Date: 2016-10-18

    Application No.: US14540427

    Filing Date: 2014-11-13

    IPC Classification: G06F9/46 G06F9/48

    CPC Classification: G06F9/4881 G06F2209/483

    Abstract: A scheduling system can schedule several operations for parallel execution on a number of work processors. At least one of the operations is not to be executed, and the determination of which operation or operations are not to be executed and which ones are to be executed can be made only at run time. The scheduling system partitions a subset of operations that excludes the one or more operations that are not to be executed into several groups based, at least in part, on an irregularity of operations resulting from the one or more operations that are not to be executed. In addition, the partitioning is based, at least in part, on locality of data elements associated with the subset of operations to be executed or on loading of the several work processors.
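    The two partitioning criteria in the abstract (operations that turn out not to execute, and data locality or worker load) can be combined in a simple way: drop the operations that the runtime marks as skipped, then cut the survivors into contiguous groups of roughly equal total cost. This is a minimal sketch under stated assumptions, not the patented scheduler: per-operation costs are assumed to be known, neighbouring operation indices are assumed to touch neighbouring data, and `partition_surviving_ops` is a hypothetical name.

```python
def partition_surviving_ops(costs, executes, num_workers):
    """Drop skipped operations, then split the survivors into contiguous, cost-balanced groups.

    costs[i]: assumed known cost of operation i
    executes[i]: runtime flag; False means operation i turned out not to be needed
    """
    survivors = [i for i, keep in enumerate(executes) if keep]
    groups = [[] for _ in range(num_workers)]
    total = sum(costs[i] for i in survivors)
    if total == 0:
        # All survivors are free (or there are none): split them evenly by count instead.
        for n, op in enumerate(survivors):
            groups[n * num_workers // len(survivors)].append(op)
        return groups
    done = 0.0
    for op in survivors:
        # Place each op by the cost-weighted position of its midpoint, so worker
        # indices never decrease and every group is a contiguous run of survivors.
        midpoint = done + costs[op] / 2
        worker = min(num_workers - 1, int(midpoint * num_workers / total))
        groups[worker].append(op)
        done += costs[op]
    return groups


if __name__ == "__main__":
    costs = [4, 1, 1, 2, 1, 4, 2, 2]
    executes = [True, False, True, True, True, True, False, True]
    print(partition_surviving_ops(costs, executes, 2))  # [[0, 2, 3], [4, 5, 7]]
```

    In the usage example, operations 1 and 6 are skipped at run time and each of the two groups ends up with a total cost of 7. Keeping groups contiguous favours locality; a greedy longest-processing-time assignment would balance load more tightly on adversarial inputs but would scatter neighbouring operations across workers.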
