-
Publication No.: US12039415B2
Publication Date: 2024-07-16
Application No.: US16588913
Filing Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Andrea Olgiati, Lakshmi Naarayanan Ramakrishnan, Jeffrey John Geevarghese, Denis Davydenko, Vikas Kumar, Rahul Raghavendra Huilgol, Amol Ashok Lele, Stefano Stefani, Vladimir Zhukov
Abstract: Methods, systems, and computer-readable media for debugging and profiling of machine learning model training are disclosed. A machine learning analysis system receives data associated with training of a machine learning model. The data was collected by a machine learning training cluster. The machine learning analysis system performs analysis of the data associated with the training of the machine learning model. The machine learning analysis system detects one or more conditions associated with the training of the machine learning model based at least in part on the analysis. The machine learning analysis system generates one or more alarms describing the one or more conditions associated with the training of the machine learning model.
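Illustrative sketch (not part of the patent record): the abstract describes a flow of receiving training data, analyzing it, detecting conditions, and generating alarms. The Python below is one minimal way such a flow could look. The record fields (`step`, `loss`, `grad_norm`), the two conditions checked (non-finite loss, vanishing gradient), the `grad_floor` threshold, and the `Alarm` type are all hypothetical choices made for this example; the abstract does not specify them.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alarm:
    """Describes one detected training condition (hypothetical structure)."""
    condition: str
    step: int
    message: str


def analyze_training_data(records: List[Dict[str, float]],
                          grad_floor: float = 1e-7) -> List[Alarm]:
    """Analyze per-step training data and return alarms for detected conditions.

    `records` stands in for data collected by a training cluster. Each record
    is assumed to carry a step index, a loss value, and a gradient norm.
    """
    alarms: List[Alarm] = []
    for rec in records:
        step = int(rec["step"])
        loss = rec["loss"]
        grad_norm = rec["grad_norm"]

        # Assumed condition 1: the loss became NaN or infinite.
        if math.isnan(loss) or math.isinf(loss):
            alarms.append(Alarm("non_finite_loss", step,
                                f"loss is {loss} at step {step}"))

        # Assumed condition 2: the gradient norm fell below a floor.
        if grad_norm < grad_floor:
            alarms.append(Alarm("vanishing_gradient", step,
                                f"grad_norm {grad_norm} < {grad_floor} at step {step}"))
    return alarms


if __name__ == "__main__":
    collected = [
        {"step": 0, "loss": 0.9, "grad_norm": 0.5},
        {"step": 1, "loss": float("nan"), "grad_norm": 0.3},
        {"step": 2, "loss": 0.8, "grad_norm": 1e-9},
    ]
    for alarm in analyze_training_data(collected):
        print(alarm.condition, alarm.step, alarm.message)
```

Keeping collection (the `records` input) separate from analysis mirrors the abstract's split between the training cluster that collects the data and the analysis system that inspects it.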
-
Publication No.: US20210097431A1
Publication Date: 2021-04-01
Application No.: US16588913
Filing Date: 2019-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Andrea Olgiati, Lakshmi Naarayanan Ramakrishnan, Jeffrey John Geevarghese, Denis Davydenko, Vikas Kumar, Rahul Raghavendra Huilgol, Amol Ashok Lele, Stefano Stefani, Vladimir Zhukov
Abstract: Methods, systems, and computer-readable media for debugging and profiling of machine learning model training are disclosed. A machine learning analysis system receives data associated with training of a machine learning model. The data was collected by a machine learning training cluster. The machine learning analysis system performs analysis of the data associated with the training of the machine learning model. The machine learning analysis system detects one or more conditions associated with the training of the machine learning model based at least in part on the analysis. The machine learning analysis system generates one or more alarms describing the one or more conditions associated with the training of the machine learning model.
-