METHOD AND SYSTEM FOR GENERATING INTERMEDIATE REPRESENTATION FOR PROGRAM FOR EXECUTION ON ACCELERATOR

    Publication No.: US20240103877A1

    Publication Date: 2024-03-28

    Application No.: US18533041

    Application Date: 2023-12-07

    IPC Classification: G06F9/38

    CPC Classification: G06F9/3836

    Abstract: A method for generating an intermediate representation for a program for execution on an accelerator is executed by one or more processors and includes: hooking instruction information from a program; determining whether the hooked instruction information is associated with an accelerator; if it is determined that the instruction information is associated with the accelerator, generating a first intermediate representation for the instruction using the input/output data information and the instruction information included in the instruction; and generating a second intermediate representation for the program for one or more accelerators using the first intermediate representation. The first intermediate representation and the second intermediate representation each include a plurality of data nodes, one or more operation nodes, and a plurality of edges indicating input and output relationships between the data nodes and the operation nodes.
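The graph structure the abstract describes (data nodes, operation nodes, and edges encoding input/output relationships) can be sketched as below. This is an illustrative assumption, not the patent's actual implementation; all class and method names are invented for the example:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: node, edge, and method names are
# assumptions, not taken from the patent.

@dataclass(frozen=True)
class DataNode:
    name: str          # e.g. a tensor or buffer identifier

@dataclass(frozen=True)
class OpNode:
    name: str          # e.g. the hooked accelerator instruction

@dataclass
class IRGraph:
    data_nodes: list = field(default_factory=list)
    op_nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (src, dst) pairs

    def add_op(self, op, inputs, outputs):
        """Record an operation, with edges from each input data node
        to the op and from the op to each output data node."""
        self.op_nodes.append(op)
        for d in inputs:
            if d not in self.data_nodes:
                self.data_nodes.append(d)
            self.edges.append((d, op))          # input edge
        for d in outputs:
            if d not in self.data_nodes:
                self.data_nodes.append(d)
            self.edges.append((op, d))          # output edge

# Example: a single matmul instruction hooked from the program.
g = IRGraph()
a, b, c = DataNode("A"), DataNode("B"), DataNode("C")
g.add_op(OpNode("matmul"), inputs=[a, b], outputs=[c])
print(len(g.data_nodes), len(g.op_nodes), len(g.edges))  # 3 1 3
```

A second intermediate representation for the whole program would, under this sketch, be built by merging such per-instruction graphs on their shared data nodes.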

    METHOD AND SYSTEM FOR CREATING INTERMEDIATE REPRESENTATION

    Publication No.: US20240126790A1

    Publication Date: 2024-04-18

    Application No.: US18537683

    Application Date: 2023-12-12

    IPC Classification: G06F16/28 G06F16/23 G06F16/25

    Abstract: A method for creating an intermediate representation is performed by one or more processors and includes, by an intermediate representation creation unit: extracting, from the program, information on input and output data and information on operations; determining the presence or absence of an in-place operation based on the extracted data information and operation information; and, if there is an in-place operation, creating an intermediate representation using the extracted data information, the extracted operation information, and a creation rule associated with the in-place operation, in which the input data of the in-place operation is data that is replaced with the output data after the in-place operation.
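One plausible reading of the abstract is that an operation is in-place when its output overwrites one of its input buffers, and that such operations trigger a special creation rule. The sketch below illustrates that reading; the detection rule, the buffer-versioning creation rule, and all names are assumptions for illustration only:

```python
# Illustrative sketch: detect an in-place operation by checking
# whether the operation writes its output into one of its input
# buffers, and apply a different IR creation rule when it does.

def is_in_place(op_inputs, op_outputs):
    """An operation is treated as in-place if any output buffer id
    equals an input buffer id (the input is replaced by the output)."""
    return bool(set(op_inputs) & set(op_outputs))

def create_ir_entry(op_name, op_inputs, op_outputs):
    """Apply a versioning creation rule for in-place ops, so that
    earlier reads of the overwritten buffer still refer to its
    pre-operation value."""
    if is_in_place(op_inputs, op_outputs):
        outputs = [f"{buf}_v2" if buf in op_inputs else buf
                   for buf in op_outputs]
        return {"op": op_name, "in": list(op_inputs),
                "out": outputs, "in_place": True}
    return {"op": op_name, "in": list(op_inputs),
            "out": list(op_outputs), "in_place": False}

# relu(x) written back into x is in-place; add(x, y) -> z is not.
print(create_ir_entry("relu", ["x"], ["x"]))
print(create_ir_entry("add", ["x", "y"], ["z"]))
```

Versioning the overwritten buffer keeps the intermediate representation free of read-after-write ambiguity even though the program reuses the same memory.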

    METHOD FOR GPU MEMORY MANAGEMENT FOR DEEP NEURAL NETWORK AND COMPUTING DEVICE FOR PERFORMING SAME

    Publication No.: US20210064997A1

    Publication Date: 2021-03-04

    Application No.: US16961073

    Application Date: 2018-11-29

    IPC Classification: G06N3/08 G06T1/20 G06F9/50

    Abstract: Embodiments disclosed herein relate to a method for GPU memory management that observes the deep learning of a deep neural network performed by a GPU and reduces the amount of GPU memory used, thereby overcoming limitations attributable to the memory size of the GPU and allowing deep learning to be performed more effectively, and a computing device for performing the same. According to an embodiment, there is disclosed a method for GPU memory management for a deep neural network, the method being performed by a computing device including a GPU and a CPU, the method including: generating a schedule for GPU memory management based on the processing of a unit operation, included in the deep neural network, by the GPU; and moving data required for deep learning of the deep neural network between GPU memory and CPU memory based on the schedule.
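The schedule-driven movement of data between GPU and CPU memory can be sketched as a simple offload/prefetch planner: after a unit operation finishes with a tensor that will be needed again later, offload it to CPU memory, and prefetch it back before its next use. The schedule format and the policy below are illustrative assumptions, not the patent's actual method:

```python
# Illustrative sketch of schedule-driven GPU memory management.
# ops: list of (op_name, [tensors_used]) in execution order.

def build_schedule(ops):
    """For each tensor use, if the tensor is used again by a later
    unit operation, schedule an offload to CPU memory after the
    current operation and a prefetch back before the next use."""
    schedule = []
    for i, (op, tensors) in enumerate(ops):
        for t in tensors:
            next_use = next((j for j in range(i + 1, len(ops))
                             if t in ops[j][1]), None)
            if next_use is not None:
                schedule.append(("offload", t, i))         # after op i
                schedule.append(("prefetch", t, next_use)) # before next use
    return schedule

# Typical deep-learning pattern: a forward activation is idle during
# later forward layers and needed again during backpropagation.
ops = [("fwd1", ["act1"]), ("fwd2", ["act2"]), ("bwd1", ["act1"])]
print(build_schedule(ops))
# [('offload', 'act1', 0), ('prefetch', 'act1', 2)]
```

The point of the schedule is that "act1" occupies GPU memory only while forward layer 1 and backward layer 1 actually run, not during the layers in between.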

    METHOD OF TRANSFERRING DATA IN PARALLEL SYSTEM, AND PARALLEL SYSTEM FOR PERFORMING THE SAME

    Publication No.: US20180203617A1

    Publication Date: 2018-07-19

    Application No.: US15874322

    Application Date: 2018-01-18

    Inventors: Jaejin LEE, Gangwon JO

    IPC Classification: G06F3/06

    Abstract: Disclosed herein are a method of transferring data in a parallel system including a main device and at least one accelerator, and a parallel system for performing the method. The method of transferring data in a heterogeneous system including a main device and at least one accelerator includes: turning off the write permission for a first main memory area corresponding to a first accelerator memory area where input data for a computation task is stored; performing the computation task using the at least one accelerator; and turning off the read permission for a second main memory area corresponding to a second accelerator memory area where output data of the computation task is stored, while the data of the second accelerator memory area has not yet been transferred to the second main memory area.
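The permission toggling described above enables lazy host/device transfers: revoking write permission on the host mirror of the input area guards against host writes racing the computation, and revoking read permission on the host mirror of the output area lets the device-to-host copy be deferred until the host actually reads it. The simulation below is a hedged sketch of that state machine only; a real implementation would rely on OS memory protection (e.g. mprotect and a fault handler), which Python cannot express directly, and all names here are invented:

```python
# Illustrative simulation of permission-based lazy transfer between
# a main (host) memory area and its accelerator mirror. The
# "permissions" are plain flags; the first read while read
# permission is off stands in for the page fault that triggers the
# deferred device-to-host copy.

class MirroredBuffer:
    def __init__(self, data):
        self.host = list(data)       # main memory copy
        self.device = list(data)     # accelerator memory copy
        self.writable = True         # host write permission
        self.readable = True         # host read permission

    def launch(self, kernel):
        # Input area: turn off host write permission so a host
        # write cannot race the accelerator computation.
        self.writable = False
        self.device = kernel(self.device)
        # Output area: turn off host read permission; the copy back
        # to main memory is deferred until the host actually reads.
        self.readable = False

    def read(self, i):
        if not self.readable:
            # Simulated fault: perform the deferred transfer now.
            self.host = list(self.device)
            self.readable = True
            self.writable = True
        return self.host[i]

buf = MirroredBuffer([1, 2, 3])
buf.launch(lambda xs: [x * 2 for x in xs])
print(buf.read(0))  # deferred copy happens on this first read -> 2
```

If the host never reads the output area, the device-to-host transfer never happens, which is exactly the saving the permission trick buys.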