Abstract:
Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging one of the predecessor vertex with one of the successor vertex by combining the operations of the predecessor vertex and the successor vertex if the predecessor and successor vertices are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and the successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.
Abstract:
A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations correspond to instructions from a source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.
Abstract:
Systems and methods for obtaining a set of instructions for executing a computer program and generating executable code for the computer program based, at least in part, on scheduling operations associated with the executable code according to a polyhedral representation of a directed acyclic graph. The set of instructions may be represented as a domain-specific language. The executable code may be executable code for a specific processor architecture.
Abstract:
Various disclosed embodiments are directed to methods and systems for reducing memory space in sequential computer-implemented operations. The method includes generating a directed acyclic graph (DAG) having a plurality of vertices and directed edges, wherein each edge connects a predecessor vertex to a successor vertex. Each vertex represents one of the computer-implemented operations and each directed edge represents output data generated by the operations. The method includes merging one of the predecessor vertex with one of the successor vertex by combining the operations of the predecessor vertex and the successor vertex if the predecessor and successor vertices are connected by a directed edge and there is only one directed edge originating from the predecessor vertex. The merger of the predecessor and the successor vertices reduces the number of directed edges in the DAG, resulting in a reduction of intermediate buffer memory required to store the output data.
Abstract:
A computation graph is accessed. In the computation graph, operations to be performed are represented as interior nodes, inputs to the operations are represented as leaf nodes, and a result of the operations is represented as a root. Selected sets of the operations are combined to form respective kernels of operations. Code is generated execute the kernels of operations. The code is executed to determine the result.
Abstract:
Apparatuses, systems, and techniques to modify performance of a neural network. In at least one embodiment, performance of one or more neural networks is modified based, at least in part, on a user-provided description of at least portions of the one or more neural networks.
Abstract:
Systems and methods for obtaining a set of instructions for executing a computer program and generating executable code for the computer program based, at least in part, on scheduling operations associated with the executable code according to a polyhedral representation of a directed acyclic graph. The set of instructions may be represented as a domain-specific language. The executable code may be executable code for a specific processor architecture.
Abstract:
A compiler and a method of compiling code that reduces memory bandwidth when processing code on a computer are provided herein. In one embodiment, the method includes: (1) automatically identifying a sequence of operations for fusing, wherein the sequence of operations correspond to instructions from a source code, (2) determining subdivisions of a final output of the sequence of operations, (3) determining input data and intermediate operations needed to obtain a final subdivision output for each of the subdivisions and (4) automatically generating code to fuse the sequence of operations employing the subdivisions, wherein the automatically identifying and the automatically generating are performed by a processor.