DATA-AWARE STORAGE TIERING AND LIFETIME DATA VALUATION FOR DEEP LEARNING

    Publication No.: US20240135162A1

    Publication date: 2024-04-25

    Application No.: US17971410

    Filing date: 2022-10-20

    CPC classification number: G06N3/08

    Abstract: Systems and methods provide lifetime data valuations for a dataset that evolves across multiple machine learning training tasks by providing and updating path-dependent data valuations for the data points in the dataset during each training task. A current machine learning training task may include splitting the dataset into multiple random mini-epochs and training the current machine learning model using a first random mini-epoch and an accuracy mini-epoch, which consists of high-value data points selected from the path-dependent data valuations. The random and accuracy mini-epochs can be iterated a number of times during the training while a second random mini-epoch is prefetched. During the training, the path-dependent data valuations can be updated based on data valuations from the current training and a similarity between the current machine learning model and previously trained machine learning models.
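
    The training schedule described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: all function names (`split_into_mini_epochs`, `accuracy_mini_epoch`, `update_valuations`, `train_task`), the parameters, and in particular the linear similarity-weighted blend used to update valuations are assumptions made for illustration, since the abstract does not specify them. `fetch` and `train_step` are hypothetical callbacks standing in for the storage tier and the model update.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_into_mini_epochs(indices, num_mini_epochs, seed=0):
    """Shuffle the data-point indices and split them into random mini-epochs."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::num_mini_epochs] for i in range(num_mini_epochs)]


def accuracy_mini_epoch(valuations, size):
    """Select the `size` data points with the highest path-dependent value."""
    return sorted(valuations, key=valuations.get, reverse=True)[:size]


def update_valuations(prior, current, similarity):
    """Blend prior valuations with valuations from the current task.

    Assumption (not from the source): a linear blend in which `similarity`
    in [0, 1] is the weight given to the prior value. Points without a
    prior value simply take their current value.
    """
    updated = dict(prior)
    for idx, value in current.items():
        updated[idx] = similarity * prior.get(idx, value) + (1.0 - similarity) * value
    return updated


def train_task(indices, valuations, fetch, train_step,
               num_mini_epochs=4, repeats=2, acc_size=2, seed=0):
    """Iterate each random mini-epoch with the accuracy mini-epoch `repeats`
    times while the next random mini-epoch is prefetched in the background."""
    mini_epochs = split_into_mini_epochs(indices, num_mini_epochs, seed)
    acc_data = fetch(accuracy_mini_epoch(valuations, acc_size))
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, mini_epochs[0])
        for k in range(len(mini_epochs)):
            random_data = pending.result()
            if k + 1 < len(mini_epochs):
                # Start loading the next random mini-epoch before training.
                pending = pool.submit(fetch, mini_epochs[k + 1])
            for _ in range(repeats):
                train_step(random_data)
                train_step(acc_data)
```

    In this sketch a single background worker keeps one mini-epoch in flight, so the repeated passes over the current random and accuracy mini-epochs overlap with the load of the next one. A real system would replace `fetch` with reads from the slower storage tier.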

    DATA-AWARE STORAGE TIERING AND LIFETIME DATA VALUATION FOR DEEP LEARNING

    Publication No.: US20240232607A9

    Publication date: 2024-07-11

    Application No.: US17971410

    Filing date: 2022-10-21

    CPC classification number: G06N3/08

    Abstract: Systems and methods provide lifetime data valuations for a dataset that evolves across multiple machine learning training tasks by providing and updating path-dependent data valuations for the data points in the dataset during each training task. A current machine learning training task may include splitting the dataset into multiple random mini-epochs and training the current machine learning model using a first random mini-epoch and an accuracy mini-epoch, which consists of high-value data points selected from the path-dependent data valuations. The random and accuracy mini-epochs can be iterated a number of times during the training while a second random mini-epoch is prefetched. During the training, the path-dependent data valuations can be updated based on data valuations from the current training and a similarity between the current machine learning model and previously trained machine learning models.
