SYSTEMS AND METHODS FOR DATA-AWARE STORAGE TIERING FOR DEEP LEARNING

    Publication Number: US20220327376A1

    Publication Date: 2022-10-13

    Application Number: US17226917

    Filing Date: 2021-04-09

    Abstract: Systems and methods are configured to split an epoch associated with a training dataset into a plurality of mini-epochs. A machine learning model can be trained with a mini-epoch of the plurality of mini-epochs. During the training, the mini-epoch can be iterated a number of times. One or more metrics reflective of at least one of a training loss, a training accuracy, or a validation accuracy of the machine learning model associated with the mini-epoch can be received. Based on the one or more metrics, it can be determined whether to terminate iterations of the mini-epoch early, before the number of iterations of the mini-epoch reaches the number of times. The number of iterations can be a non-zero number.
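
    The abstract above outlines a concrete training loop, sketched minimally below in Python. The helper names (train_one_pass, validate, split_into_mini_epochs), the contiguous split, and the loss-plateau criterion are illustrative assumptions rather than details from the patent, which may base the early-termination decision on training loss, training accuracy, validation accuracy, or a combination.

        import random

        # Hypothetical stand-ins for the real training and validation steps;
        # neither name comes from the patent.
        def train_one_pass(model, data):
            model["loss"] *= 0.9 + 0.05 * random.random()  # pretend loss decays
            return model["loss"], 1.0 - model["loss"]

        def validate(model):
            return 1.0 - model["loss"]

        def split_into_mini_epochs(dataset, num_mini_epochs):
            # Split one epoch's dataset into contiguous mini-epochs (assumed split).
            size = len(dataset) // num_mini_epochs
            return [dataset[i * size:(i + 1) * size] for i in range(num_mini_epochs)]

        def train_with_mini_epochs(model, dataset, num_mini_epochs=4,
                                   max_iterations=5, loss_plateau=1e-3):
            for mini_epoch in split_into_mini_epochs(dataset, num_mini_epochs):
                prev_loss = float("inf")
                for iteration in range(max_iterations):
                    loss, train_acc = train_one_pass(model, mini_epoch)
                    val_acc = validate(model)  # could also feed the criterion
                    # Terminate the mini-epoch early, before max_iterations is
                    # reached, when the monitored metric has plateaued.
                    if prev_loss - loss < loss_plateau:
                        break
                    prev_loss = loss

        model = {"loss": 1.0}
        train_with_mini_epochs(model, dataset=list(range(1000)))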

    SPARSIFYING NEURAL NETWORK MODELS

    Publication Number: US11645529B2

    Publication Date: 2023-05-09

    Application Number: US15967835

    Filing Date: 2018-05-01

    CPC classification number: G06N3/082 G06N3/04 G06N3/063

    Abstract: A technique includes modifying a neural network model to sparsify the model. The model includes a plurality of kernel element weights, which are parameterized according to a plurality of dimensions. Modifying the model occurs over a plurality of iterations; a given iteration includes training the model based on a structure regularization, in which kernel element weights that share a dimension in common are removed as a group to create corresponding zero kernel elements in the model, and compressing the model to exclude the zero kernel element weights so that the model is prepared to be trained in another iteration.
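
    The iterate-sparsify-compress loop in the abstract can be illustrated with a short numpy sketch, assuming a convolution kernel tensor of shape (out_channels, in_channels, kH, kW) with groups taken along the output-channel dimension. Thresholding group norms stands in here for the effect of training with a structure regularization; the threshold, shapes, and function names are assumptions for illustration only.

        import numpy as np

        def group_norms(weights):
            # L2 norm of each output-channel group, i.e. of all kernel
            # element weights that share that dimension in common.
            return np.sqrt((weights ** 2).sum(axis=(1, 2, 3)))

        def zero_small_groups(weights, threshold):
            # Remove weights as a group, creating whole zero kernel elements
            # (a proxy for the regularization-driven removal during training).
            weights[group_norms(weights) < threshold] = 0.0
            return weights

        def compress(weights):
            # Exclude the zero kernel element weights so the next training
            # iteration operates on a smaller model.
            return weights[group_norms(weights) > 0.0]

        rng = np.random.default_rng(0)
        kernel = rng.normal(scale=0.1, size=(64, 32, 3, 3))  # (out, in, kH, kW)
        for _ in range(3):  # a few sparsify/compress iterations
            kernel = zero_small_groups(kernel, threshold=1.7)
            kernel = compress(kernel)
            # ...retraining would happen here before the next iteration...
        print(kernel.shape)  # fewer output channels remain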

    SYSTEMS AND METHODS FOR INTELLIGENT DATA SHUFFLING FOR HIGH-PERFORMANCE DISTRIBUTED MACHINE LEARNING TRAINING

    Publication Number: US20220067577A1

    Publication Date: 2022-03-03

    Application Number: US17010744

    Filing Date: 2020-09-02

    Abstract: Systems and methods are provided for data shuffling for distributed machine learning training, including each training node in the network receiving a shard of the training data set, wherein the training data set is divided into shards having data items. Each data item is assigned to a working set such that each working set includes data items from multiple shards. The training nodes perform training using the data items of a first working set that are in each node's shard. Upon completion of the training using the data items of the first working set, the training nodes perform training using the data items of a second working set that are in their shards; and while the training nodes are training on their respective portions of the second working set, the nodes randomly shuffle the data items in the first working set to create a shuffled first working set.
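
    A minimal single-process Python sketch of the shard and working-set pattern described above follows. The round-robin shard and working-set assignments are assumptions for illustration, and a background thread stands in for the cross-node exchange that lets the shuffle of the first working set overlap with training on the second.

        import random
        import threading

        NUM_NODES = 4
        dataset = list(range(32))  # toy training data set

        # Divide the training data set into one shard per training node.
        shards = [dataset[node::NUM_NODES] for node in range(NUM_NODES)]

        # Assign each data item to a working set so every working set
        # contains items from multiple shards.
        def working_set(shard, which, num_sets=2):
            return [item for i, item in enumerate(shard) if i % num_sets == which]

        first = [working_set(s, 0) for s in shards]
        second = [working_set(s, 1) for s in shards]

        def train(node, items):
            pass  # placeholder for the per-node training step

        # Each node trains on the items of the first working set in its shard.
        for node in range(NUM_NODES):
            train(node, first[node])

        # Nodes then train on the second working set while the first working
        # set is reshuffled concurrently, hiding shuffle latency behind training.
        flat_first = [item for items in first for item in items]
        shuffler = threading.Thread(target=random.shuffle, args=(flat_first,))
        shuffler.start()
        for node in range(NUM_NODES):
            train(node, second[node])
        shuffler.join()  # shuffled first working set is ready for reuse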
