-
Publication No.: US20230351190A1
Publication date: 2023-11-02
Application No.: US18219555
Filing date: 2023-07-07
Applicant: Google LLC
Inventor: Gaurav Mishra, Adam Joseph Roberts, Noam M. Shazeer, JR., Maarten Paul Bosma
IPC: G06N3/084
CPC classification number: G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model using a deterministic data pipeline. One of the methods may include receiving a first request to generate a deterministic training dataset: transforming raw training examples obtained from the raw data source into pre-processed training examples; assigning a unique index to each pre-processed training example; and caching the pre-processed training examples into the cache directory specified in the received first request; receiving a second request to use the deterministic training dataset to train a machine learning model, the second request specifying a start index; and in response to receiving the second request: reading, from the cache directory, the pre-processed training examples that have indices beginning from the start index; and providing the read training examples in an order of the assigned indices for use in training the machine learning model.
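The two-request flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `build_cache` and `read_from`, the single JSON Lines cache file `examples.jsonl`, and the lowercase-and-split pre-processing step are all assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator


def build_cache(raw_examples: Iterable[str],
                preprocess: Callable[[str], dict],
                cache_dir: str) -> int:
    """First request: pre-process raw examples, index them, cache them.

    The on-disk layout (one JSON Lines file) is a hypothetical choice.
    Returns the number of examples written.
    """
    out = Path(cache_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(out / "examples.jsonl", "w", encoding="utf-8") as f:
        for index, raw in enumerate(raw_examples):
            # Each pre-processed example gets a unique, consecutive index.
            record = {"index": index, "example": preprocess(raw)}
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_from(cache_dir: str, start_index: int = 0) -> Iterator[dict]:
    """Second request: yield cached examples from start_index, in index order."""
    path = Path(cache_dir) / "examples.jsonl"
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    # Sorting by the assigned index makes the order independent of file layout.
    for record in sorted(records, key=lambda r: r["index"]):
        if record["index"] >= start_index:
            yield record


def demo() -> list:
    """Build a tiny cache, then resume reading from index 1."""
    with tempfile.TemporaryDirectory() as cache_dir:
        build_cache(["A b", "C d", "E f"],
                    lambda s: {"tokens": s.lower().split()},
                    cache_dir)
        return list(read_from(cache_dir, start_index=1))
```

Because every example carries a fixed index, a training run interrupted after consuming index 0 could call `read_from(cache_dir, start_index=1)` and see the remaining examples in the same order as an uninterrupted run would.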
-
Publication No.: US20230316082A1
Publication date: 2023-10-05
Application No.: US18130339
Filing date: 2023-04-03
Applicant: Google LLC
Inventor: Gaurav Mishra, Adam Joseph Roberts, Noam M. Shazeer, JR., Maarten Paul Bosma
IPC: G06N3/084
CPC classification number: G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model using a deterministic data pipeline. One of the methods may include receiving a first request to generate a deterministic training dataset: transforming raw training examples obtained from the raw data source into pre-processed training examples; assigning a unique index to each pre-processed training example; and caching the pre-processed training examples into the cache directory specified in the received first request; receiving a second request to use the deterministic training dataset to train a machine learning model, the second request specifying a start index; and in response to receiving the second request: reading, from the cache directory, the pre-processed training examples that have indices beginning from the start index; and providing the read training examples in an order of the assigned indices for use in training the machine learning model.
-