-
Publication No.: US20240135162A1
Publication date: 2024-04-25
Application No.: US17971410
Filing date: 2022-10-20
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: CONG XU , SUPARNA BHATTACHARYA , RYAN BEETHE , MARTIN FOLTIN
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Systems and methods are configured to provide lifetime data valuations for a dataset that evolves across multiple machine learning training tasks by providing and updating path-dependent data valuations for data points in the dataset during each training task. A current machine learning training task may include splitting the dataset into multiple random mini-epochs and training the current machine learning model using a first random mini-epoch and an accuracy mini-epoch, which consists of high-value data points from the path-dependent data valuations. During the training, the random and accuracy mini-epochs can be iterated a number of times while a second random mini-epoch is prefetched. During the training, the path-dependent data valuations can be updated based on data valuations during the current training and a similarity between the current machine learning model and prior trained machine learning models.
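The training flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: every function name is hypothetical, the similarity-weighted blend in `update_path_values` is an assumed rule (the abstract only says the update depends on current valuations and model similarity), and prefetching is extended to every mini-epoch rather than only the second.

```python
import random


def split_into_mini_epochs(indices, num_mini_epochs, seed=0):
    """Shuffle the dataset indices and deal them into random mini-epochs."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::num_mini_epochs] for i in range(num_mini_epochs)]


def accuracy_mini_epoch(path_values, size):
    """Pick the `size` data points with the highest path-dependent valuation."""
    return sorted(path_values, key=path_values.get, reverse=True)[:size]


def update_path_values(path_values, current_values, similarity):
    """Blend prior and current valuations.

    Assumed rule, not taken from the source: the more similar the current
    model is to prior models (similarity near 1), the more weight the prior
    path-dependent valuation keeps.
    """
    keys = set(path_values) | set(current_values)
    return {
        k: similarity * path_values.get(k, 0.0)
        + (1.0 - similarity) * current_values.get(k, 0.0)
        for k in keys
    }


def schedule(random_epochs, accuracy_epoch, repeats):
    """Yield (random mini-epoch, accuracy mini-epoch, prefetched next mini-epoch).

    Each random mini-epoch is paired with the accuracy mini-epoch and iterated
    `repeats` times; the next random mini-epoch stands in for the prefetch.
    """
    for k, current in enumerate(random_epochs):
        prefetched = random_epochs[k + 1] if k + 1 < len(random_epochs) else None
        for _ in range(repeats):
            yield current, accuracy_epoch, prefetched
```

In a real training loop, each yielded step would train on the random and accuracy mini-epochs while the prefetched one loads in the background; here the schedule only shows the ordering.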
-
Publication No.: US20240232607A9
Publication date: 2024-07-11
Application No.: US17971410
Filing date: 2022-10-21
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: CONG XU , SUPARNA BHATTACHARYA , RYAN BEETHE , MARTIN FOLTIN
IPC: G06N3/08
CPC classification number: G06N3/08
Abstract: Systems and methods are configured to provide lifetime data valuations for a dataset that evolves across multiple machine learning training tasks by providing and updating path-dependent data valuations for data points in the dataset during each training task. A current machine learning training task may include splitting the dataset into multiple random mini-epochs and training the current machine learning model using a first random mini-epoch and an accuracy mini-epoch, which consists of high-value data points from the path-dependent data valuations. During the training, the random and accuracy mini-epochs can be iterated a number of times while a second random mini-epoch is prefetched. During the training, the path-dependent data valuations can be updated based on data valuations during the current training and a similarity between the current machine learning model and prior trained machine learning models.
-
Publication No.: US20230409587A1
Publication date: 2023-12-21
Application No.: US18351355
Filing date: 2023-07-12
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: TED DUNNING , SUPARNA BHATTACHARYA , GLYN BOWDEN , LIN A. NEASE , JANICE M. ZDANKUS , SONU SUDHAKARAN
IPC: G06F16/248 , G06F16/2455 , G06F16/25 , G06F16/28
CPC classification number: G06F16/248 , G06F16/288 , G06F16/254 , G06F16/24556
Abstract: Systems and methods provide a system that gathers information about data as it progresses through data processing pipelines of data analysis projects. The data analytics system derives value indicators and implicit metadata from the data processing pipelines. For example, the data analytics system may derive value indicators and implicit metadata from data-related products themselves, semantic analysis of the code/processing steps used to process the data-related products, the structure of data processing pipelines, and human behavior related to production and usage of data-related products. Once a new data analysis project is initiated, the data analytics system gathers parameters and characteristics about the new data analysis project and references the value indicators and implicit metadata to recommend useful processing steps, datasets, and/or other data-related products for the new data analysis project.
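The recommendation step in this abstract can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `DataProduct` record, the use of tag sets as implicit metadata, a single numeric value indicator, and the overlap-times-value score are all hypothetical and are not described in the source.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    name: str
    tags: frozenset  # stand-in for implicit metadata derived from pipelines
    value: float     # stand-in for a derived value indicator (e.g. reuse count)


def recommend(products, project_tags, top_k=3):
    """Rank data products for a new project.

    Assumed scoring: (number of tags shared with the project) * value.
    Products that share no tags with the project are skipped.
    """
    scored = []
    for product in products:
        overlap = len(product.tags & project_tags)
        if overlap:
            scored.append((overlap * product.value, product.name))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:top_k]]


catalog = [
    DataProduct("clean_sales", frozenset({"sales", "csv"}), 2.0),
    DataProduct("raw_logs", frozenset({"logs"}), 5.0),
    DataProduct("sales_forecast", frozenset({"sales", "forecast"}), 1.5),
]
suggestions = recommend(catalog, frozenset({"sales", "forecast"}))
```

With this toy catalog, `raw_logs` is skipped because it shares no tags with the new project, even though its value indicator is the highest.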
-
Publication No.: US20230409585A1
Publication date: 2023-12-21
Application No.: US17843757
Filing date: 2022-06-17
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: TED DUNNING , SUPARNA BHATTACHARYA , GLYN BOWDEN , LIN A. NEASE , JANICE M. ZDANKUS , SONU SUDHAKARAN
IPC: G06F16/248 , G06F16/2455 , G06F16/28 , G06F16/25
CPC classification number: G06F16/248 , G06F16/24556 , G06F16/288 , G06F16/254
Abstract: Systems and methods provide a system that gathers information about data as it progresses through data processing pipelines of data analysis projects. The data analytics system derives value indicators and implicit metadata from the data processing pipelines. For example, the data analytics system may derive value indicators and implicit metadata from data-related products themselves, semantic analysis of the code/processing steps used to process the data-related products, the structure of data processing pipelines, and human behavior related to production and usage of data-related products. Once a new data analysis project is initiated, the data analytics system gathers parameters and characteristics about the new data analysis project and references the value indicators and implicit metadata to recommend useful processing steps, datasets, and/or other data-related products for the new data analysis project.