-
Publication number: US20220140841A1
Publication date: 2022-05-05
Application number: US17085196
Filing date: 2020-10-30
Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphics processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region, generating a nonzero data number table region by employing a parallel reduction algorithm, calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm, allocating a buffer for the compressed data, and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.
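The abstract's four-step pipeline can be made concrete with a small sketch. The following NumPy emulation is illustrative only: names such as `BLOCK_SIZE` and `compress_relu` are assumptions, and the steps that would run as parallel GPU kernels (block-wise reduction, prefix-sum scan, parallel copy) are emulated sequentially here.

```python
# Hypothetical NumPy emulation of the four-step compression described in the
# abstract; on a GPU each step would run as a parallel kernel.
import numpy as np

BLOCK_SIZE = 4  # elements handled per GPU thread block (assumed value)

def compress_relu(data: np.ndarray):
    # Step 1: nonzero data bitmap region (one flag per element).
    bitmap = (data != 0).astype(np.uint8)

    # Step 2: nonzero data number table (per-block nonzero counts); on a GPU
    # this would be a parallel reduction within each block.
    counts = np.add.reduceat(bitmap, np.arange(0, data.size, BLOCK_SIZE))

    # Step 3: nonzero data array index per block, via an exclusive prefix sum
    # (a parallel prefix sum scan on the GPU).
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))

    # Step 4: allocate the compressed buffer and copy the nonzero values;
    # each block writes its elements starting at offsets[b], in parallel.
    compressed = np.empty(int(bitmap.sum()), dtype=data.dtype)
    compressed[:] = data[bitmap.astype(bool)]
    return bitmap, counts, offsets, compressed

x = np.array([0.0, 1.5, 0.0, 2.0, 0.0, 0.0, 3.0, 0.5])
bitmap, counts, offsets, compressed = compress_relu(x)
# counts -> [2, 2]; offsets -> [0, 2]; compressed -> [1.5, 2.0, 3.0, 0.5]
```

The prefix-sum offsets give every block a disjoint slice of the output buffer to write into, which is what allows the final copy to proceed fully in parallel on the GPU.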
-
Publication number: US20210174190A1
Publication date: 2021-06-10
Application number: US16704240
Filing date: 2019-12-05
Inventors: Gradus Janssen, Vladimir Zolotov, Tung D. Le
Abstract: A neural network data flow graph having a set of nodes and a set of edges is processed. An insertion point is determined for a memory reduction or memory restoration operation. The determination is based on: computing tensor timing slacks (TTS) for a set of input tensors; compiling a candidate list (SI) of input tensors, from the set of input tensors, using input tensors whose corresponding TTS values are larger than a threshold value (thTTS); filtering the SI to retain input tensors whose size meets a threshold value (thS); and determining an insertion point for the operation using the SI based on the filtering. A new data flow graph is generated, or an existing one is modified, using this process.
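The candidate-selection logic in this abstract reduces to two threshold filters over tensor metadata. The sketch below is a hedged reading of it: `Tensor`, `tts`, `thTTS`, and `thS` are illustrative stand-ins, and the abstract does not specify here how the slack values themselves are computed.

```python
# Hedged sketch of the candidate selection described in the abstract.
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_bytes: int
    tts: float  # tensor timing slack: spare time before the tensor is needed

def select_insertion_candidates(tensors, thTTS, thS):
    # Keep tensors whose timing slack exceeds thTTS: swapping them out and
    # back in can hide behind computation that runs in the meantime.
    SI = [t for t in tensors if t.tts > thTTS]
    # Retain only tensors large enough (>= thS) for the memory saving to
    # justify the added transfer.
    SI = [t for t in SI if t.size_bytes >= thS]
    # A memory-reduction op (e.g. swap-out) would be inserted after the
    # producer of each surviving tensor, and a restoration op before its
    # consumer.
    return SI

tensors = [Tensor("conv1_out", 64 << 20, tts=12.0),
           Tensor("fc_out", 1 << 20, tts=30.0),
           Tensor("conv2_out", 128 << 20, tts=0.5)]
print([t.name for t in select_insertion_candidates(tensors, thTTS=5.0, thS=32 << 20)])
# -> ['conv1_out']
```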
-
Publication number: US20210019613A1
Publication date: 2021-01-21
Application number: US16514528
Filing date: 2019-07-17
Inventors: Tung D. Le
IPC classification: G06N3/08
Abstract: Methods and systems for generating a program include parameterizing a high-order function to replace data with primitive functions. A neural programmer interpreter (NPI) model is trained for the high-order function. Respective neural network models are trained for each primitive function. The neural network models generate data for the NPI model when called.
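A rough intuition for the parameterization, as a toy sketch: a high-order function such as `map` takes primitive functions as arguments rather than baking in data. Here ordinary Python callables stand in for the trained NPI model and the per-primitive neural networks; all names are illustrative, not the patent's.

```python
# Illustrative sketch only: in the patented scheme, npi_map would be realized
# by a trained NPI model and each primitive by a neural network model that
# generates data for the NPI model when called.
def npi_map(primitive, xs):
    # High-order "map", parameterized by a primitive function instead of data.
    return [primitive(x) for x in xs]

# Stand-ins for per-primitive neural network models (hypothetical).
double = lambda x: 2 * x
square = lambda x: x * x

print(npi_map(double, [1, 2, 3]))  # -> [2, 4, 6]
print(npi_map(square, [1, 2, 3]))  # -> [1, 4, 9]
```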
-
Publication number: US11521062B2
Publication date: 2022-12-06
Application number: US16704240
Filing date: 2019-12-05
Inventors: Gradus Janssen, Vladimir Zolotov, Tung D. Le
Abstract: A neural network data flow graph having a set of nodes and a set of edges is processed. An insertion point is determined for a memory reduction or memory restoration operation. The determination is based on: computing tensor timing slacks (TTS) for a set of input tensors; compiling a candidate list (SI) of input tensors, from the set of input tensors, using input tensors whose corresponding TTS values are larger than a threshold value (thTTS); filtering the SI to retain input tensors whose size meets a threshold value (thS); and determining an insertion point for the operation using the SI based on the filtering. A new data flow graph is generated, or an existing one is modified, using this process.
-
Publication number: US11362670B2
Publication date: 2022-06-14
Application number: US17085196
Filing date: 2020-10-30
Inventors: Yasushi Negishi, Tung D. Le, Haruki Imai, Kiyokuni Kawachiya
Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphics processing unit (GPU) employed in a learning process of a deep neural network. The method includes converting an initial data structure including nonzero data and zero data into a compressed data structure including only the nonzero data of the initial data structure as compressed data by generating a nonzero data bitmap region, generating a nonzero data number table region by employing a parallel reduction algorithm, calculating a nonzero data array index per block region of all blocks from the nonzero data number table region by employing a parallel prefix sum scan algorithm, allocating a buffer for the compressed data, and copying the nonzero data from the initial data structure into a nonzero data array region in a compressed data format in parallel.
-
Publication number: US10558914B2
Publication date: 2020-02-11
Application number: US16384985
Filing date: 2019-04-16
Inventors: Taro Sekiyama, Kiyokuni Kawachiya, Tung D. Le, Yasushi Negishi
Abstract: A generated algorithm used by a neural network is captured during execution of an iteration of the neural network. A candidate algorithm is identified based on the generated algorithm. A determination is made that the candidate algorithm utilizes less memory than the generated algorithm. Based on the determination, the neural network is updated by replacing the generated algorithm with the candidate algorithm.
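A minimal sketch of the replacement policy, assuming hypothetical helpers: `profile_memory` and the candidate list are stand-ins that the abstract leaves to the implementation.

```python
# Hedged sketch: replace the captured algorithm only when a candidate is
# determined to use less memory. profile_memory is an assumed helper.
def maybe_replace(generated_algo, candidates, profile_memory):
    best = generated_algo
    for cand in candidates:
        if profile_memory(cand) < profile_memory(best):
            best = cand
    return best

mem_mib = {"fft_conv": 512, "winograd": 384, "direct": 128}  # made-up figures
print(maybe_replace("fft_conv", ["winograd", "direct"], mem_mib.get))
# -> 'direct': the network would be updated to use the cheaper algorithm
```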
-
Publication number: US11836613B2
Publication date: 2023-12-05
Application number: US16514528
Filing date: 2019-07-17
Inventors: Tung D. Le
CPC classification: G06N3/08
Abstract: Methods and systems for generating a program include parameterizing a high-order function to replace data with primitive functions. A neural programmer interpreter (NPI) model is trained for the high-order function. Respective neural network models are trained for each primitive function. The neural network models generate data for the NPI model when called.
-
Publication number: US20220138580A1
Publication date: 2022-05-05
Application number: US17089245
Filing date: 2020-11-04
Inventors: Haruki Imai, Tung D. Le, Yasushi Negishi, Kiyokuni Kawachiya
IPC classification: G06N3/08
Abstract: Methods and systems for training a neural network include identifying units within a neural network, including a first unit for memory swapping and a second unit for re-computation, to balance memory efficiency with computational efficiency. Each unit includes at least one layer of the neural network, and each unit has a first layer that is a checkpoint operation. During a feed-forward training stage, feature maps output by the at least one layer of the first unit are stored in a first memory and then swapped from the first memory to a second memory. During a backpropagation stage, the feature maps for the first unit are swapped from the second memory back to the first memory, while feature maps for the second unit are re-computed.
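A minimal sketch of the swap/recompute split, assuming a `Unit` abstraction whose first layer serves as the checkpoint; the class, dictionary stores, and hook names are illustrative, not the patent's implementation.

```python
# Hedged sketch: "swap" units move feature maps to host memory after forward
# and restore them in backward; "recompute" units re-run forward from their
# checkpoint instead of storing feature maps.
class Unit:
    def __init__(self, layers, policy):
        self.layers = layers          # at least one layer per unit
        self.policy = policy          # "swap" or "recompute"
        self.checkpoint = layers[0]   # first layer acts as the checkpoint op

def forward(units, x, gpu_store, host_store):
    for u in units:
        for layer in u.layers:
            x = layer(x)
        if u.policy == "swap":
            gpu_store[u] = x                     # stored in first (GPU) memory
            host_store[u] = gpu_store.pop(u)     # swapped to second (host) memory
        # "recompute" units keep nothing beyond the checkpoint input.
    return x

def backward(units, gpu_store, host_store, recompute):
    for u in reversed(units):
        if u.policy == "swap":
            gpu_store[u] = host_store.pop(u)     # swap back to GPU memory
        else:
            gpu_store[u] = recompute(u)          # re-run forward from checkpoint
        # ... gradient computation for u's layers would follow here ...

units = [Unit([lambda x: x + 1], "swap"), Unit([lambda x: x * 2], "recompute")]
gpu, host = {}, {}
y = forward(units, 3, gpu, host)                 # y == 8; unit 0's maps on host
backward(units, gpu, host, recompute=lambda u: None)
```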
-
Publication number: US11164079B2
Publication date: 2021-11-02
Application number: US15843244
Filing date: 2017-12-15
Inventors: Tung D. Le, Haruki Imai, Taro Sekiyama, Yasushi Negishi
Abstract: A computer-implemented method, computer program product, and computer processing system are provided for accelerating neural network data-parallel training on multiple graphics processing units (GPUs) using at least one central processing unit (CPU). The method includes forming a set of chunks, each of which includes a respective group of neural network layers other than the last layer. The method further includes performing one or more chunk-wise synchronization operations during the backward phase of the neural network data-parallel training, by each of the multiple GPUs and the at least one CPU.
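The chunking and synchronization order can be sketched as follows; `make_chunks`, `backprop_layer`, and `cpu_all_reduce` are assumed names for illustration. The point of chunk-wise synchronization is overlap: gradients for later chunks are reduced across GPUs while earlier chunks are still being backpropagated.

```python
# Hedged sketch of chunk-wise gradient synchronization during the backward
# phase of data-parallel training.
def make_chunks(layers, chunk_size):
    # Group all layers except the last one into chunks, per the abstract.
    body = layers[:-1]
    return [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]

def backward_with_chunk_sync(layers, chunk_size, backprop_layer, cpu_all_reduce):
    chunks = make_chunks(layers, chunk_size)
    # Backward proceeds from the last chunk toward the first; as soon as a
    # chunk's gradients are ready they are handed to the CPU for reduction,
    # while the GPU keeps backpropagating earlier chunks.
    for chunk in reversed(chunks):
        grads = [backprop_layer(layer) for layer in reversed(chunk)]
        cpu_all_reduce(grads)  # chunk-wise synchronization across GPUs

layers = ["L1", "L2", "L3", "L4", "out"]
backward_with_chunk_sync(layers, chunk_size=2,
                         backprop_layer=lambda l: f"grad({l})",
                         cpu_all_reduce=print)
# Reduces the grads for chunk ["L3", "L4"] first, then for ["L1", "L2"].
```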
-
Publication number: US11106970B2
Publication date: 2021-08-31
Application number: US15815771
Filing date: 2017-11-17
Inventors: Tung D. Le, Taro Sekiyama
Abstract: In an approach to localizing tree-based convolutional neural networks, a method includes creating a first tree-based convolution layer (TBCL) corresponding to a tree, where the tree includes a first plurality of nodes and a node that has been indicated to be a first pivotal node. The first TBCL includes a second plurality of nodes and a second pivotal node having a feature vector based on node data from the first pivotal node. The method also includes creating a second TBCL corresponding to the tree. The second TBCL may include a third plurality of nodes. The method further includes determining a feature vector for a third pivotal node in the third plurality of nodes based on the feature vectors from: (i) the second pivotal node, (ii) a parent node of the second pivotal node, and (iii) a child node of the second pivotal node.
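One plausible reading of the layer-to-layer pivotal-node update, as a toy sketch: the next TBCL's pivotal feature vector is a function of the current pivotal node, its parent, and a child. The weight matrices and the `tanh` nonlinearity are assumptions for illustration, not taken from the patent.

```python
# Toy sketch: convolution over the tree neighborhood of the pivotal node.
import numpy as np

rng = np.random.default_rng(0)
d = 4  # feature dimension (assumed)
W_self, W_parent, W_child = (rng.standard_normal((d, d)) for _ in range(3))

def next_pivotal_feature(h_pivot, h_parent, h_child):
    # Combine the pivotal node's feature vector with those of its parent and
    # child, then apply a nonlinearity (tanh chosen for illustration).
    return np.tanh(W_self @ h_pivot + W_parent @ h_parent + W_child @ h_child)

h_next = next_pivotal_feature(rng.standard_normal(d),
                              rng.standard_normal(d),
                              rng.standard_normal(d))
print(h_next.shape)  # -> (4,): the third pivotal node's feature vector
```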