ReLU COMPRESSION TO REDUCE GPU MEMORY

    Publication No.: US20220140841A1

    Publication Date: 2022-05-05

    Application No.: US17085196

    Filing Date: 2020-10-30

    Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphics processing unit (GPU) employed in a learning process of a deep neural network. The method converts an initial data structure containing both nonzero and zero data into a compressed data structure containing only the nonzero data by: generating a nonzero data bitmap region; generating a nonzero data number table region using a parallel reduction algorithm; calculating a nonzero data array index per block, for all blocks, from the number table region using a parallel prefix sum scan algorithm; allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region, in a compressed data format, in parallel.
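    The four steps the abstract enumerates can be sketched on the CPU, with NumPy standing in for the GPU kernels (the function names, default block size, and array layout here are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def compress_relu(data, block_size=4):
    """Sketch of the claimed pipeline: bitmap, per-block counts,
    prefix-sum offsets, then a compacting copy. On a GPU, steps 2-4
    would run as parallel reduction, prefix-sum scan, and parallel
    copy kernels; NumPy stands in for them here."""
    # Step 1: nonzero data bitmap region (1 where data != 0).
    bitmap = (data != 0).astype(np.uint8)
    # Step 2: nonzero data number table (one count per block).
    n_blocks = -(-data.size // block_size)  # ceiling division
    padded = np.zeros(n_blocks * block_size, dtype=np.uint8)
    padded[:data.size] = bitmap
    counts = padded.reshape(n_blocks, block_size).sum(axis=1)
    # Step 3: per-block start index via an exclusive prefix sum.
    offsets = np.concatenate(([0], np.cumsum(counts)[:-1]))
    # Step 4: allocate the compressed buffer and copy nonzero values.
    values = data[data != 0].copy()
    return bitmap, counts, offsets, values

def decompress_relu(bitmap, values, shape):
    """Inverse: scatter the nonzero values back using the bitmap."""
    out = np.zeros(bitmap.size, dtype=values.dtype)
    out[bitmap.astype(bool)] = values
    return out.reshape(shape)
```

    Since ReLU activations are often mostly zero, the bitmap plus the compacted value array can be much smaller than the original tensor, and the bitmap alone suffices to reconstruct it.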

    NEURAL NETWORK TRAINING USING A DATA FLOW GRAPH AND DYNAMIC MEMORY MANAGEMENT

    Publication No.: US20210174190A1

    Publication Date: 2021-06-10

    Application No.: US16704240

    Filing Date: 2019-12-05

    Abstract: Processing a neural network data flow graph having a set of nodes and a set of edges. An insertion point is determined for a memory-reduction or memory-restoration operation. The determination is based on computing tensor timing slacks (TTS) for a set of input tensors; compiling a candidate list (SI) of input tensors whose TTS values exceed a threshold value (thTTS); filtering SI to retain input tensors whose size meets a threshold value (thS); and determining an insertion point for the operation from the filtered SI. A new data flow graph is generated, or an existing one is modified, using this process.
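    The two-threshold selection can be illustrated with a small sketch (the `Tensor` record, its field names, and the largest-first ordering are assumptions for illustration; the abstract specifies only the TTS and size thresholds):

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size: int         # bytes
    produced_at: int  # topological timestamp of the producing node
    consumed_at: int  # timestamp of the next consuming node

def select_candidates(tensors, th_tts, th_s):
    """Sketch of the claimed selection: compute each tensor's timing
    slack (the gap between production and next use), keep tensors
    whose slack exceeds th_tts, then filter by the size threshold
    th_s. The survivors (SI) mark where a memory-reduction operation,
    e.g. a swap-out, could be inserted after the producer."""
    si = []
    for t in tensors:
        tts = t.consumed_at - t.produced_at  # tensor timing slack
        if tts > th_tts and t.size >= th_s:
            si.append(t)
    # Handle the largest candidates first (an illustrative choice).
    return sorted(si, key=lambda t: t.size, reverse=True)
```

    Intuitively, a large tensor that sits unused for a long stretch of the graph is the cheapest to evict and restore, which is what the two thresholds jointly capture.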

    NEURAL PROGRAMMER INTERPRETERS WITH MODELED PRIMITIVES

    Publication No.: US20210019613A1

    Publication Date: 2021-01-21

    Application No.: US16514528

    Filing Date: 2019-07-17

    Inventor: Tung D. Le

    IPC Classification: G06N3/08

    Abstract: Methods and systems for generating a program include parameterizing a high-order function to replace data with primitive functions. A neural programmer interpreter (NPI) model is trained for the high-order function. Respective neural network models are trained for each primitive function, and these models generate data for the NPI model when called.
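    As a rough illustration of the parameterization step, plain Python callables can stand in for both the NPI model and the primitive models (`parameterize` and `repeat_apply` are hypothetical names invented for this sketch, not from the patent):

```python
def parameterize(high_order, primitive):
    """Sketch of the claimed decomposition: a high-order function is
    parameterized by a primitive function instead of by concrete
    data. In the patent's scheme, the NPI model is trained to execute
    the high-order control flow, while a separate neural network is
    trained per primitive and generates data whenever it is called;
    ordinary callables stand in for both learned models here."""
    def program(n):
        return high_order(primitive, n)
    return program

def repeat_apply(primitive, n):
    """A toy high-order function: call the modeled primitive n times,
    collecting the data it generates."""
    return [primitive(i) for i in range(n)]
```

    Separating control flow (the high-order function) from data generation (the primitives) is what lets each part be modeled and trained independently.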

    ReLU compression to reduce GPU memory

    Publication No.: US11362670B2

    Publication Date: 2022-06-14

    Application No.: US17085196

    Filing Date: 2020-10-30

    Abstract: A method is presented for compressing data of a Rectified Linear Unit (ReLU) function on a graphics processing unit (GPU) employed in a learning process of a deep neural network. The method converts an initial data structure containing both nonzero and zero data into a compressed data structure containing only the nonzero data by: generating a nonzero data bitmap region; generating a nonzero data number table region using a parallel reduction algorithm; calculating a nonzero data array index per block, for all blocks, from the number table region using a parallel prefix sum scan algorithm; allocating a buffer for the compressed data; and copying the nonzero data from the initial data structure into a nonzero data array region, in a compressed data format, in parallel.

    DATA SWAPPING FOR NEURAL NETWORK MEMORY CONSERVATION

    Publication No.: US20220138580A1

    Publication Date: 2022-05-05

    Application No.: US17089245

    Filing Date: 2020-11-04

    IPC Classification: G06N3/08

    Abstract: Methods and systems for training a neural network include identifying units within the network, including a first unit designated for memory swapping and a second unit designated for re-computation, to balance memory efficiency with computational efficiency. Each unit includes at least one layer of the neural network and has a first layer that is a checkpoint operation. During the feed-forward training stage, the feature maps output by the layers of the first unit are stored in a first memory and then swapped from the first memory to a second memory. During the backpropagation stage, the feature maps for the first unit are swapped back from the second memory to the first memory, while the feature maps for the second unit are re-computed.
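    A minimal sketch of the swap/re-compute split, with a dict standing in for the second (e.g. CPU) memory and plain functions for layers (the unit names, the `policy` field, and the checkpoint-as-unit-input convention are illustrative assumptions):

```python
def train_step(units, x, cpu):
    """Sketch of the claimed scheme: each unit begins at a checkpoint
    and is marked either for swapping or for re-computation. 'Swap'
    units move their output feature maps to second memory during the
    forward pass and swap them back during backpropagation; 're-compute'
    units keep only their checkpoint input and replay their layers."""
    checkpoints = {}
    # Feed-forward stage: run each unit, then apply its memory policy.
    for u in units:
        checkpoints[u["name"]] = x            # checkpoint = unit input
        for layer in u["layers"]:
            x = layer(x)
        if u["policy"] == "swap":
            cpu[u["name"]] = x                # swap out to second memory
        # (re-compute units store nothing beyond the checkpoint)
    # Backpropagation stage: recover feature maps in reverse order.
    restored = {}
    for u in reversed(units):
        if u["policy"] == "swap":
            restored[u["name"]] = cpu.pop(u["name"])  # swap back in
        else:
            y = checkpoints[u["name"]]
            for layer in u["layers"]:
                y = layer(y)                  # re-compute from checkpoint
            restored[u["name"]] = y
    return restored
```

    Swapping trades PCIe bandwidth for memory, while re-computation trades FLOPs for memory; assigning a policy per unit lets the trainer balance the two.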

    Multi-GPU deep learning using CPUs

    Publication No.: US11164079B2

    Publication Date: 2021-11-02

    Application No.: US15843244

    Filing Date: 2017-12-15

    IPC Classification: G06N3/08 G06T1/20 G06N3/04

    Abstract: A computer-implemented method, computer program product, and computer processing system are provided for accelerating data-parallel neural network training across multiple graphics processing units (GPUs) using at least one central processing unit (CPU). The method includes forming a set of chunks, each of which includes a respective group of neural network layers other than the last layer. The method further includes performing one or more chunk-wise synchronization operations during the backward phase of the training, by each of the multiple GPUs and the at least one CPU.
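    The chunk-wise synchronization can be simulated sequentially (the averaging all-reduce, the chunk layout, and the scalar gradients are assumptions for illustration; a real implementation would overlap the CPU-side reduction with ongoing GPU backpropagation):

```python
def backward_with_chunk_sync(layer_grads_per_gpu, chunk_size):
    """Sketch: group all layers except the last into chunks. As the
    backward pass finishes each chunk, its gradients are reduced
    (averaged here) across GPUs with CPU help, so communication
    overlaps the remaining backward computation. Lists of per-GPU
    per-layer gradients stand in for the devices; the last layer is
    synchronized on its own, outside the chunks."""
    n_layers = len(layer_grads_per_gpu[0])
    n_gpus = len(layer_grads_per_gpu)
    # Chunks cover layers 0..n-2; the last layer stays outside them.
    chunks = [range(s, min(s + chunk_size, n_layers - 1))
              for s in range(0, n_layers - 1, chunk_size)]
    averaged = [None] * n_layers
    for chunk in chunks:  # one chunk-wise synchronization each
        for i in chunk:
            averaged[i] = sum(g[i] for g in layer_grads_per_gpu) / n_gpus
    last = n_layers - 1
    averaged[last] = sum(g[last] for g in layer_grads_per_gpu) / n_gpus
    return averaged
```

    Synchronizing per chunk rather than per layer amortizes communication overhead, while still starting the reduction before the whole backward pass has finished.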

    Localizing tree-based convolutional neural networks

    Publication No.: US11106970B2

    Publication Date: 2021-08-31

    Application No.: US15815771

    Filing Date: 2017-11-17

    IPC Classification: G06N3/04 G06F8/41 G06F9/451

    Abstract: In an approach to localizing tree-based convolutional neural networks, a method includes creating a first tree-based convolution layer (TBCL) corresponding to a tree, where the tree includes a first plurality of nodes, one of which has been indicated to be a first pivotal node. The first TBCL includes a second plurality of nodes and a second pivotal node whose feature vector is based on node data from the first pivotal node. The method also includes creating a second TBCL corresponding to the tree, which may include a third plurality of nodes. The method further includes determining a feature vector for a third pivotal node in the third plurality of nodes based on the feature vectors of: (i) the second pivotal node, (ii) a parent node of the second pivotal node, and (iii) a child node of the second pivotal node.
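    The layer-to-layer update of the pivotal node's feature vector can be sketched as follows (the `tree` mapping, the single-child assumption, and `combine` as a stand-in for the learned convolution weights are illustrative assumptions, not from the patent):

```python
def next_pivot_feature(prev_features, tree, pivot, combine):
    """Sketch: in each successive tree-based convolution layer, the
    pivotal node's feature vector is computed from the previous
    layer's features of (i) the pivotal node itself, (ii) its parent,
    and (iii) its child. `tree` maps node -> (parent, child);
    `combine` stands in for the learned convolution weights."""
    parent, child = tree[pivot]
    return combine(prev_features[pivot],
                   prev_features[parent],
                   prev_features[child])
```

    Restricting each layer's computation to the pivotal node and its immediate neighbors is what localizes the convolution around the indicated node.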