Publication No.: US11704566B2
Publication date: 2023-07-18
Application No.: US16446924
Filing date: 2019-06-20
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yiming Ma, Menglin L. Brown, Bee-Chung Chen, Sheng Wu, Jun Jia, Bo Long
IPC: G06N3/00, G06N3/082, G06N20/20, G06F11/34, G06F18/214
CPC classification number: G06N3/082, G06F11/3495, G06F18/214, G06N20/20
Abstract: The disclosed embodiments provide a system for processing data. During operation, the system obtains a training dataset containing a first set of records associated with a first set of identifier (ID) values and an evaluation dataset containing a second set of records associated with a second set of ID values. Next, the system selects a random subset of ID values from the second set of ID values. The system then generates a sampled evaluation dataset comprising a first subset of records associated with the random subset of ID values in the second set of records. The system also generates a sampled training dataset comprising a second subset of records associated with the random subset of ID values in the first set of records. Finally, the system outputs the sampled training dataset and the sampled evaluation dataset for use in training and evaluating a machine learning model.
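The sampling steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `sample_by_id`, the `member_id` field, and the `fraction` and `seed` parameters are hypothetical, and records are assumed to be plain dictionaries held in memory. The sketch draws a random subset of the ID values found in the evaluation dataset, then keeps only the records with those IDs in both datasets.

```python
import random
from typing import Dict, Hashable, List, Tuple

Record = Dict[str, Hashable]


def sample_by_id(
    train: List[Record],
    evaluation: List[Record],
    id_key: str = "member_id",  # hypothetical name of the ID field
    fraction: float = 0.5,      # hypothetical share of evaluation IDs to keep
    seed: int = 0,              # fixed seed so the sample is reproducible
) -> Tuple[List[Record], List[Record]]:
    """Return (sampled_train, sampled_eval) restricted to a random ID subset."""
    # Distinct ID values in the evaluation dataset, sorted for determinism.
    eval_ids = sorted({record[id_key] for record in evaluation})

    # Number of IDs to draw: at least one when any exist, never more than exist.
    k = min(len(eval_ids), max(1, round(len(eval_ids) * fraction)))

    # Random subset of the evaluation ID values.
    chosen = set(random.Random(seed).sample(eval_ids, k))

    # Keep only the records whose ID is in the random subset.
    sampled_eval = [r for r in evaluation if r[id_key] in chosen]
    sampled_train = [r for r in train if r[id_key] in chosen]
    return sampled_train, sampled_eval


# Toy data: 10 distinct IDs in training, 8 distinct IDs in evaluation.
train = [{"member_id": i % 10, "x": i} for i in range(40)]
evaluation = [{"member_id": i % 8, "x": i} for i in range(16)]

sampled_train, sampled_eval = sample_by_id(train, evaluation, fraction=0.5, seed=1)
print(len(sampled_train), len(sampled_eval))
```

Sampling at the level of ID values rather than individual records keeps every record for a chosen ID together, so the sampled training and evaluation datasets cover the same entities.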