Computation modification by amplification of stencil including stencil points

    公开(公告)号:US11567746B2

    公开(公告)日:2023-01-31

    申请号:US16927016

    申请日:2020-07-13

    Abstract: In a sequence of major computational steps or in an iterative computation, a stencil amplifier can increase the number of data elements accessed from one or more data structures in a single major step or iteration, thereby decreasing the total number of computations and/or communication operations in the overall sequence or the iterative computation. Stencil amplification, which can be optimized according to a specified parameter such as compile time, rune time, code size, etc., can improve the performance of a computing system executing the sequence or the iterative computation in terms of run time, memory load, energy consumption, etc. The stencil amplifier typically determines boundaries, to avoid erroneously accessing data elements not present in the one or more data structures.

    Systems and methods for scalable hierarchical polyhedral compilation

    公开(公告)号:US11537373B2

    公开(公告)日:2022-12-27

    申请号:US17034895

    申请日:2020-09-28

    Abstract: A system for compiling programs for execution thereof using a hierarchical processing system having two or more levels of memory hierarchy can perform memory-level-specific optimizations, without exceeding a specified maximum compilation time. To this end, the compiler system employs a polyhedral model and limits the dimensions of a polyhedral program representation that is processed by the compiler at each level using a focalization operator that temporarily reduces one or more dimensions of the polyhedral representation. Semantic correctness is provided via a defocalization operator that can restore all polyhedral dimensions that had been temporarily removed.

    Efficient and scalable storage of sparse tensors

    公开(公告)号:US11573945B1

    公开(公告)日:2023-02-07

    申请号:US17033592

    申请日:2020-09-25

    Abstract: In a system for storing in memory a tensor that includes at least three modes, elements of the tensor are stored in a mode-based order for improving locality of references when the elements are accessed during an operation on the tensor. To facilitate efficient data reuse in a tensor transform that includes several iterations, on a tensor that includes at least three modes, a system performs a first iteration that includes a first operation on the tensor to obtain a first intermediate result, and the first intermediate result includes a first intermediate-tensor. The first intermediate result is stored in memory, and a second iteration is performed in which a second operation on the first intermediate result accessed from the memory is performed, so as to avoid a third operation, that would be required if the first intermediate result were not accessed from the memory.

Patent Agency Ranking