Feature sets using semi-structured data storage

    Publication number: US11775544B2

    Publication date: 2023-10-03

    Application number: US18162522

    Application date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/25 G06F16/2282 G06F16/24558 G06F16/86

    Abstract: The subject technology receives, by a database system, raw input data from a source table provided by an external environment, the source table comprising multiple rows and multiple columns, the raw input data comprising values in a first format, the values comprising input features corresponding to datasets included in the raw input data for machine learning models, the external environment comprising an external system from the database system and is accessed by different users. The subject technology generates cell data for a second table based on the values from the source table. The subject technology performs a database operation to generate the second table including table metadata, column metadata, and the generated cell data.

    Machine learning using secured shared data
    Invention publication

    Publication number: US20230186160A1

    Publication date: 2023-06-15

    Application number: US18055248

    Application date: 2022-11-14

    Applicant: Snowflake Inc.

    Abstract: Disclosed are systems, methods, and non-transitory computer-readable media for sharing, on a distributed database, a database application to a first user of the distributed database, the database application generated by a second user of the distributed database. The training dataset includes a first database training dataset from the first user of the distributed database and a second database training dataset from the second user of the distributed database, the first database training dataset and the second database training dataset including non-overlapping dataset features. The database application further identifies a query from the second user to train the machine learning model on the training dataset and generates a trained machine learning model by training the machine learning model on a joined dataset according to the query. The database application generates outputs from the trained machine learning model by applying the trained machine learning model on new data.

    Processing functionality to store sparse feature sets

    Publication number: US12204553B2

    Publication date: 2025-01-21

    Application number: US18458425

    Application date: 2023-08-30

    Applicant: Snowflake Inc.

    Abstract: The subject technology generates, by a database system, cell data for a particular table based on values from a source table, the values being based on raw input data, the source table comprising multiple rows and multiple columns, the raw input data comprising values in a first format, the values comprising input features corresponding to datasets included in the raw input data for machine learning models, the source table being provided by an external environment, the external environment comprising an external system from the database system. The subject technology performs a database operation to generate the particular table including table metadata, column metadata, and the generated cell data, the generated particular table comprising a second format that causes more efficient processing of data by the database system using a single query on the particular table compared to processing the raw input data from the source table.

    Feature sets using semi-structured data storage

    Publication number: US20230177063A1

    Publication date: 2023-06-08

    Application number: US18162522

    Application date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/25 G06F16/24558 G06F16/86 G06F16/2282

    Abstract: The subject technology receives, by a database system, raw input data from a source table provided by an external environment, the source table comprising multiple rows and multiple columns, the raw input data comprising values in a first format, the values comprising input features corresponding to datasets included in the raw input data for machine learning models, the external environment comprising an external system from the database system and is accessed by different users. The subject technology generates cell data for a second table based on the values from the source table. The subject technology performs a database operation to generate the second table including table metadata, column metadata, and the generated cell data.

    Storing feature sets using semi-structured data storage

    Publication number: US11609927B2

    Publication date: 2023-03-21

    Application number: US17899160

    Application date: 2022-08-30

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives, by a database system, raw input data from a source table provided by a machine learning development environment, the source table comprising multiple rows where each row includes multiple columns, the raw input data comprising values in a first format, the values comprising input features corresponding to datasets included in the raw input data for machine learning models, the machine learning development environment comprising an external system from the database system and is accessed by a plurality of different users that are external to the database system. The subject technology generates cell data for a feature store table based at least in part on the values from the source table. The subject technology performs at least one database operation to generate the feature store table including at least table metadata, column metadata, and the generated cell data.

    Machine learning using secured shared data

    Publication number: US11893462B2

    Publication date: 2024-02-06

    Application number: US18055248

    Application date: 2022-11-14

    Applicant: Snowflake Inc.

    Abstract: Disclosed are systems, methods, and non-transitory computer-readable media for sharing, on a distributed database, a database application to a first user of the distributed database, the database application generated by a second user of the distributed database. The training dataset includes a first database training dataset from the first user of the distributed database and a second database training dataset from the second user of the distributed database, the first database training dataset and the second database training dataset including non-overlapping dataset features. The database application further identifies a query from the second user to train the machine learning model on the training dataset and generates a trained machine learning model by training the machine learning model on a joined dataset according to the query. The database application generates outputs from the trained machine learning model by applying the trained machine learning model on new data.
