METHOD AND SYSTEM TO DETERMINE EXECUTION INEFFICIENCIES IN DATAFLOW PROGRAMS

    Publication No.: US20240069880A1

    Publication date: 2024-02-29

    Application No.: US18387906

    Filing date: 2023-11-08

    IPC classification: G06F8/41

    CPC classification: G06F8/433

    Abstract: In a method, a computer-implemented efficiency analyzer selects operators from an intermediate representation of a dataflow program. The operators are included in a mapping of the operators to hardware of a computing system to execute the dataflow program. Based on the mapping and a description of the hardware, the efficiency analyzer computes an execution metric associated with executing the operators on the hardware. Based on the execution metric and the hardware description, the efficiency analyzer determines an inefficiency metric and, based on the inefficiency metric, determines an inefficiency associated with the dataflow program. The computing system to execute the dataflow program can comprise a coarse-grained computing system, and the hardware can include a reconfigurable processor of the computing system. A computer program product and a computing system can implement the method.
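
The flow in this abstract (mapping plus hardware description, then execution metric, then inefficiency metric, then reported inefficiency) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Operator` and `HardwareDescription` types, the roofline-style time estimate, the "unused fraction of allocated peak compute" metric, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator record taken from an intermediate representation."""
    name: str
    flops: float         # floating-point operations the operator performs
    bytes_moved: float   # bytes read plus bytes written


@dataclass(frozen=True)
class HardwareDescription:
    """Hypothetical hardware description."""
    peak_flops_per_unit: float   # FLOP/s of one compute unit
    memory_bandwidth: float      # bytes/s available to an operator


def execution_time(op, units, hw):
    """Execution metric (assumed): the larger of compute time and memory time."""
    compute_time = op.flops / (units * hw.peak_flops_per_unit)
    memory_time = op.bytes_moved / hw.memory_bandwidth
    return max(compute_time, memory_time)


def inefficiency_metric(op, units, hw):
    """Inefficiency metric (assumed): unused fraction of the allocated peak compute."""
    achieved_flops = op.flops / execution_time(op, units, hw)
    allocated_peak = units * hw.peak_flops_per_unit
    return 1.0 - achieved_flops / allocated_peak


def find_inefficiencies(mapping, hw, threshold=0.5):
    """Report operators whose metric exceeds the (assumed) threshold.

    `mapping` maps each Operator to the number of compute units assigned to it.
    """
    report = []
    for op, units in mapping.items():
        metric = inefficiency_metric(op, units, hw)
        if metric > threshold:
            report.append((op.name, metric))
    return report


hw = HardwareDescription(peak_flops_per_unit=1e12, memory_bandwidth=1e11)
matmul = Operator("matmul", flops=2e12, bytes_moved=1e10)
add = Operator("add", flops=1e9, bytes_moved=1.2e10)
mapping = {matmul: 2, add: 1}
inefficiencies = find_inefficiencies(mapping, hw)
```

In this toy mapping the compute-bound `matmul` uses its two units fully, while the memory-bound `add` leaves nearly all of its unit idle and is the only operator reported.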

    Computationally Efficient Softmax Loss Gradient Backpropagation

    Publication No.: US20210216873A1

    Publication date: 2021-07-15

    Application No.: US16744077

    Filing date: 2020-01-15

    Inventor: Chen LIU

    IPC classification: G06N3/08 G06N3/063 G06N3/04

    Abstract: A computation unit comprises first, second, and third circuits. The first circuit traverses gradient loss elements gpn and normalized output elements pn and produces an accumulation C. The accumulation C is produced by element-wise multiplying the gradient loss elements gpn with the corresponding normalized output elements pn and summing the results of the element-wise multiplication. The second circuit, operatively coupled to the first circuit, element-wise subtracts the accumulation C from each of the gradient loss elements gpn and produces modulated gradient loss elements gpn′. The third circuit, operatively coupled to the second circuit, traverses the modulated gradient loss elements gpn′ and produces gradient loss elements gxn for a function preceding the softmax function. The gradient loss elements gxn are produced by element-wise multiplying the modulated gradient loss elements gpn′ with the corresponding normalized output elements pn.
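
The three stages described in this abstract can be sketched in software. This is only a plain-Python illustration of the arithmetic, not the circuit design. The function names `softmax` and `softmax_backward` are hypothetical, and the variables `g_p`, `p`, `c`, `g_p_mod`, and `g_x` stand in for gpn, pn, C, gpn′, and gxn.

```python
import math


def softmax(x):
    """Numerically stable softmax, used here only to produce example p values."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(g_p, p):
    """Gradient of the loss with respect to the softmax inputs.

    g_p: gradient of the loss with respect to each softmax output
    p:   the normalized softmax outputs
    """
    # Stage 1: accumulate C = sum over n of g_p[n] * p[n].
    c = sum(g * q for g, q in zip(g_p, p))
    # Stage 2: subtract C from every gradient element.
    g_p_mod = [g - c for g in g_p]
    # Stage 3: multiply each modulated gradient by the matching output.
    return [gm * q for gm, q in zip(g_p_mod, p)]


x = [1.0, 2.0, 0.5]
p = softmax(x)
g_p = [0.3, -1.2, 0.7]
g_x = softmax_backward(g_p, p)
```

The result matches multiplying `g_p` by the full softmax Jacobian, whose entries are p_i(δ_ij − p_j), but it needs only one reduction and two element-wise passes instead of forming an N×N matrix.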

    ANALYSIS ASSISTANT FOR DETERMINING EXECUTION INEFFICIENCIES IN DATAFLOW PROGRAMS

    Publication No.: US20240078098A1

    Publication date: 2024-03-07

    Application No.: US18387912

    Filing date: 2023-11-08

    IPC classification: G06F8/41

    CPC classification: G06F8/433

    Abstract: In a method, in response to an interface, a computer-implemented analysis assistant initiates a presentation of inefficiency results determined by an efficiency analyzer based on a mapping of a dataflow program to execute on hardware of a computing system. The assistant receives an inefficiency included among the inefficiency results and composes formatted inefficiency results comprising a presentation format of the inefficiency to assist a developer of the dataflow program in interpreting the inefficiency. The analysis assistant outputs the formatted inefficiency results to an interface, which can comprise an interface to output the formatted inefficiency results for use by the developer to improve the dataflow program in association with the inefficiency. In implementations, the presentation can comprise an interactive presentation with a developer of the dataflow program. A computer program product and a computing system can implement the method. The computing system can execute the assistant and can include the interfaces.

    OPTIMIZING TENSOR TILING IN NEURAL NETWORKS BASED ON A TILING COST MODEL

    Publication No.: US20230315410A1

    Publication date: 2023-10-05

    Application No.: US18129714

    Filing date: 2023-03-31

    IPC classification: G06F8/41

    CPC classification: G06F8/443

    Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph, and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling cost. A computer program product and a computing system can implement the method.
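
A toy cost model can make the "tiling decision improves an optimization objective" step concrete. The sketch below assumes a one-dimensional tile along the shared dimension, a fixed overhead per tile, and a penalty when a tile exceeds on-chip memory. The cost formula, the parameter names, and the numbers are hypothetical and are not drawn from the patent.

```python
import math


def tiling_cost(dim_size, tile_size, bytes_per_row, on_chip_bytes,
                per_tile_overhead=1.0, spill_penalty=10.0):
    """Assumed cost model: per-tile overhead plus a penalty for spilling off chip."""
    num_tiles = math.ceil(dim_size / tile_size)
    tile_bytes = tile_size * bytes_per_row
    cost = num_tiles * per_tile_overhead
    if tile_bytes > on_chip_bytes:
        # Penalize each tile in proportion to how far it overflows on-chip memory.
        overflow = (tile_bytes - on_chip_bytes) / on_chip_bytes
        cost += spill_penalty * num_tiles * overflow
    return cost


def best_tiling(dim_size, candidates, bytes_per_row, on_chip_bytes):
    """Pick the candidate tile size with the lowest modeled cost."""
    return min(candidates,
               key=lambda t: tiling_cost(dim_size, t, bytes_per_row, on_chip_bytes))


# Shared dimension of 1024 rows, 1 KiB per row across the pipeline, 64 KiB on chip.
DIM, ROW_BYTES, ON_CHIP = 1024, 1024, 64 * 1024
untiled_cost = tiling_cost(DIM, DIM, ROW_BYTES, ON_CHIP)
chosen = best_tiling(DIM, [16, 64, 256, 1024], ROW_BYTES, ON_CHIP)
chosen_cost = tiling_cost(DIM, chosen, ROW_BYTES, ON_CHIP)
improves_objective = chosen_cost < untiled_cost
```

Under these assumed numbers the largest tile that still fits on chip wins: smaller tiles pay more per-tile overhead, and larger tiles pay the spill penalty.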

    TENSOR CHECKPOINT OPTIMIZATION IN DATAFLOW COMPUTING APPLICATIONS

    Publication No.: US20230315407A1

    Publication date: 2023-10-05

    Application No.: US18129722

    Filing date: 2023-03-31

    IPC classification: G06F8/41

    CPC classification: G06F8/433

    Abstract: According to a computing method, a compiler determines a recompute node included in a dataflow application and a checkpoint tensor produced by the recompute node. The compiler determines a recompute cost to recompute the checkpoint tensor and a memory cost to checkpoint the checkpoint tensor in a memory. Based on the recompute cost and/or the memory cost, the compiler determines a solution cost and compares the solution cost to a solution threshold. Based on comparing the solution cost to the solution threshold, the compiler determines a checkpoint solution to execute the dataflow application. The checkpoint solution can comprise recomputing or checkpointing the checkpoint tensor. In some implementations, the compiler can determine a recompute ratio of the recompute cost to the memory cost and can compare the recompute ratio to the solution threshold. A computer program product and a computing system can implement aspects of the method.
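
The recompute-ratio variant mentioned at the end of this abstract reduces to a single comparison, sketched below. The function name, the cost units, the default threshold of 1.0, and the direction of the comparison (recompute when the ratio falls below the threshold) are assumptions made for illustration. The abstract does not specify them.

```python
def checkpoint_solution(recompute_cost, memory_cost, solution_threshold=1.0):
    """Choose between recomputing a tensor and checkpointing it in memory.

    Both costs are assumed to be expressed in the same normalized unit.
    """
    if memory_cost <= 0:
        # Storing the tensor is free in this model, so keep it.
        return "checkpoint"
    recompute_ratio = recompute_cost / memory_cost
    # Assumed policy: recompute only when it is cheap relative to storing.
    return "recompute" if recompute_ratio < solution_threshold else "checkpoint"


cheap_to_recompute = checkpoint_solution(recompute_cost=2.0, memory_cost=8.0)
expensive_to_recompute = checkpoint_solution(recompute_cost=10.0, memory_cost=2.0)
```

A compiler could tune `solution_threshold` to trade memory pressure against extra compute. That knob is part of this sketch and is not claimed to be the patent's mechanism.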