Transformation For Machine Learning Pre-Processing

    Publication No.: US20240202589A1

    Publication date: 2024-06-20

    Application No.: US18415212

    Filing date: 2024-01-17

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transformation for machine learning pre-processing. In some implementations, an instruction to create a model is obtained. A determination is made whether the instruction specifies a transform. In response to determining that the instruction specifies a transform, a determination is made as to whether the transform requires statistics on the training data. The training data is accessed. In response to determining that the transform requires statistics on the training data, transformed training data is generated from both the training data and the statistics. A model is generated with the transformed training data. A representation of the transform and the statistics is stored as metadata for the model.
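
The flow in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the instruction format, the transform names (`standardize`, `log1p`), the `NEEDS_STATS` table, and the one-feature least-squares "model" are all assumptions made for illustration. The sketch checks whether the instruction names a transform, computes statistics only when the transform needs them, trains on the transformed data, and stores the transform and statistics as model metadata so the same transform can be replayed at prediction time.

```python
import math
from statistics import fmean, pstdev

# Hypothetical transforms and whether each needs training-data statistics.
NEEDS_STATS = {"standardize": True, "log1p": False}


def apply_transform(kind, value, stats=None):
    """Apply one named transform to a single feature value."""
    if kind == "standardize":
        # Assumes the training feature is not constant (std > 0).
        return (value - stats["mean"]) / stats["std"]
    if kind == "log1p":
        return math.log1p(value)
    raise ValueError(f"unknown transform: {kind}")


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def create_model(instruction, rows):
    """Train a model, recording any transform and its statistics as metadata."""
    xs = [row["x"] for row in rows]
    ys = [row["y"] for row in rows]
    kind = instruction.get("transform")  # None when no transform is specified
    metadata = {"transform": None, "statistics": None}
    if kind is not None:
        stats = None
        if NEEDS_STATS[kind]:
            stats = {"mean": fmean(xs), "std": pstdev(xs)}
        xs = [apply_transform(kind, x, stats) for x in xs]
        metadata = {"transform": kind, "statistics": stats}
    return {"params": fit_line(xs, ys), "metadata": metadata}


def predict(model, x):
    """Replay the stored transform on raw input before applying the model."""
    meta = model["metadata"]
    if meta["transform"] is not None:
        x = apply_transform(meta["transform"], x, meta["statistics"])
    params = model["params"]
    return params["slope"] * x + params["intercept"]
```

Because the statistics are stored with the model, a caller can pass raw, untransformed values to `predict` without recomputing anything from the training data.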

Principal Component Analysis
    Document type: Invention application

    Publication No.: US20230045139A1

    Publication date: 2023-02-09

    Application No.: US17816288

    Filing date: 2022-07-29

    Applicant: Google LLC

    IPC classification: G06K9/62 G06F16/242

    Abstract: A method for principal component analysis includes receiving a principal component analysis (PCA) request from a user requesting data processing hardware to perform PCA on a dataset, the dataset including a plurality of input features. The method further includes training a PCA model on the plurality of input features of the dataset. The method includes determining, using the trained PCA model, one or more principal components of the dataset. The method also includes generating, based on the plurality of input features and the one or more principal components, one or more embedded features of the dataset. The method includes returning the one or more embedded features to the user.
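
As a rough illustration of the steps in this abstract (fit a PCA model, determine principal components, generate embedded features), the following Python sketch computes only the first principal component using power iteration on the sample covariance matrix. The choice of power iteration, the function names, and the restriction to a single component are assumptions for brevity; the abstract does not say how the PCA model is trained.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iters=200):
    """Power iteration for the leading eigenvector of the covariance matrix.

    Assumes the data has non-zero variance and that the start vector is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pca_embed(rows):
    """Return the first principal component and each row's embedded feature."""
    centered = center(rows)
    component = top_component(covariance(centered))
    embedded = [sum(c * x for c, x in zip(component, row)) for row in centered]
    return component, embedded
```

For a dataset whose rows all lie on the line `x1 = x2`, the returned component points along that diagonal and each embedded feature is the signed distance of the row from the dataset mean along it.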

    Machine Learning Super Large-Scale Time-series Forecasting

    Publication No.: US20230274180A1

    Publication date: 2023-08-31

    Application No.: US17652863

    Filing date: 2022-02-28

    Applicant: Google LLC

    IPC classification: G06N20/00 G06F16/248

    CPC classification: G06N20/00 G06F16/248

    Abstract: A method for forecasting time-series data, when executed by data processing hardware, causes the data processing hardware to perform operations including receiving a time-series forecasting query from a user requesting a time-series forecast of future data based on a set of current time-series data. The operations include obtaining, from the set of current time-series data, a set of training data. The operations include training, using a first portion of the set of training data, a first sub-model of a forecasting model and training, using a second portion of the set of training data, a second sub-model of the forecasting model. The second portion is different from the first portion. The operations include forecasting, using the forecasting model, the future data based on the set of current time-series data and returning, to the user, the forecasted future data for the time-series forecast.
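
The idea of training two sub-models on different portions of the training data can be sketched in Python as follows. Everything specific here is an assumption: the abstract does not say what the sub-models are, how the data is split, or how their outputs are combined. In this sketch each sub-model is a least-squares line fitted to its own contiguous portion of the series, and the forecasting model averages the two extrapolations.

```python
from statistics import fmean


def fit_line(ts, ys):
    """Least-squares line through (t, y) points; returns (slope, intercept)."""
    mt, my = fmean(ts), fmean(ys)
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum(
        (t - mt) ** 2 for t in ts
    )
    return slope, my - slope * mt


def train_forecasting_model(series, split):
    """Train one sub-model per portion; each portion needs at least 2 points."""
    points = list(enumerate(series))           # (time index, value) pairs
    first, second = points[:split], points[split:]
    sub_models = [fit_line(*zip(*first)), fit_line(*zip(*second))]
    return {"sub_models": sub_models, "n": len(series)}


def forecast(model, horizon):
    """Average the sub-model extrapolations for the next `horizon` steps."""
    forecasts = []
    for step in range(horizon):
        t = model["n"] + step
        predictions = [slope * t + intercept for slope, intercept in model["sub_models"]]
        forecasts.append(fmean(predictions))
    return forecasts
```

Averaging is only one way to combine sub-models; a weighted combination or a trend-plus-seasonality decomposition would fit the same abstract equally well.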