Multi-party machine learning using a database cleanroom

    Publication number: US12020128B2

    Publication date: 2024-06-25

    Application number: US18162695

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06N20/00

    Abstract: A method includes installing, in a consumer database account, a shared-instance database that includes a shared instance of a provider-account database that resides in a provider database account. The shared-instance database includes a first schema that includes provider-account training data, provider-account scoring data, a training function, and a scoring function. The method also includes invoking the training function from the consumer database account, which results in creation in the consumer database account of a second schema that includes a machine-learning-model instance of a machine learning model, and which also results in training the machine-learning model instance with at least the provider-account training data. Additionally, the method includes generating consumer-account scoring data by inputting, into the trained machine-learning-model instance, consumer-account input data that is stored in the consumer database account. The method also includes storing the consumer-account scoring data in the consumer database account.
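
The flow in this abstract (install a shared instance, invoke the training function from the consumer side, then score consumer data and store the results) can be illustrated with a minimal in-memory sketch. Everything here is an assumption for illustration: the `Account` class, the schema names, the function names, and the one-variable least-squares model are hypothetical and are not taken from the patent or from any Snowflake API.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical stand-in for a database account: a name plus named schemas."""
    name: str
    schemas: dict = field(default_factory=dict)


def fit_line(rows):
    """Least-squares fit of y = a*x + b over (x, y) pairs (placeholder model)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def training_function(consumer, shared_schema):
    """Create a second schema in the consumer account holding the trained model."""
    model = fit_line(shared_schema["training_data"])
    consumer.schemas["model_schema"] = {"model": model}


def scoring_function(consumer, inputs):
    """Score consumer-side inputs and store the scores in the consumer account."""
    a, b = consumer.schemas["model_schema"]["model"]
    scores = [a * x + b for x in inputs]
    consumer.schemas["results"] = {"scores": scores}
    return scores


def install_shared_instance(provider, consumer):
    """Expose the provider's first schema inside the consumer account."""
    consumer.schemas["shared_instance"] = provider.schemas["first_schema"]


# Illustrative data only.
provider = Account("provider", {"first_schema": {
    "training_data": [(1, 2), (2, 4), (3, 6)],
    "scoring_data": [(4, 8)],  # listed in the abstract; unused in this sketch
    "train": training_function,
    "score": scoring_function,
}})
consumer = Account("consumer", {"private": {"input_data": [4, 5]}})

install_shared_instance(provider, consumer)
shared = consumer.schemas["shared_instance"]
shared["train"](consumer, shared)
scores = shared["score"](consumer, consumer.schemas["private"]["input_data"])
```

The sketch keeps the trained model and the scores in the consumer account only, which mirrors the abstract's description; it does not model access control over the shared data.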

    Machine learning using secured shared data

    Publication number: US11893462B2

    Publication date: 2024-02-06

    Application number: US18055248

    Filing date: 2022-11-14

    Applicant: Snowflake Inc.

    Abstract: Disclosed are systems, methods, and non-transitory computer-readable media for sharing, on a distributed database, a database application to a first user of the distributed database, the database application generated by a second user of the distributed database. The training dataset includes a first database training dataset from the first user of the distributed database and a second database training dataset from the second user of the distributed database, the first database training dataset and the second database training dataset including non-overlapping dataset features. The database application further identifies a query from the second user to train the machine learning model on the training dataset and generates a trained machine learning model by training the machine learning model on a joined dataset according to the query. The database application generates outputs from the trained machine learning model by applying the trained machine learning model on new data.
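
As a rough illustration of training on a joined dataset whose two halves carry non-overlapping features, the sketch below joins two users' records on a shared key and fits a nearest-centroid classifier. The join key, the feature names, and the choice of classifier are all assumptions made for the example; the abstract does not specify them.

```python
def join_on_key(first, second):
    """Inner-join two {key: {feature: value}} datasets with disjoint features."""
    joined = {}
    for key in first.keys() & second.keys():
        clash = first[key].keys() & second[key].keys()
        if clash:
            raise ValueError(f"features overlap for {key!r}: {sorted(clash)}")
        joined[key] = {**first[key], **second[key]}
    return joined


def train_centroids(joined, label, features):
    """Compute the mean feature vector per label value (placeholder model)."""
    sums, counts = {}, {}
    for row in joined.values():
        y = row[label]
        acc = sums.setdefault(y, [0.0] * len(features))
        for i, name in enumerate(features):
            acc[i] += row[name]
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, features, row):
    """Return the label whose centroid is closest to the new row."""
    def sq_dist(center):
        return sum((row[name] - center[i]) ** 2 for i, name in enumerate(features))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Illustrative data only: each user contributes different columns.
first_user = {"u1": {"age": 20}, "u2": {"age": 60}, "u3": {"age": 25}}
second_user = {
    "u1": {"spend": 10, "churned": 0},
    "u2": {"spend": 90, "churned": 1},
    "u4": {"spend": 50, "churned": 0},
}

features = ["age", "spend"]
joined = join_on_key(first_user, second_user)
model = train_centroids(joined, "churned", features)
prediction = predict(model, features, {"age": 55, "spend": 80})
```

Only keys present in both datasets survive the join, so `u3` and `u4` do not contribute to training here.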

    MULTI-PARTY MACHINE LEARNING USING A DATABASE CLEANROOM

    Publication number: US20230409968A1

    Publication date: 2023-12-21

    Application number: US18162695

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06N20/00

    Abstract: A method includes installing, in a consumer database account, a shared-instance database that includes a shared instance of a provider-account database that resides in a provider database account. The shared-instance database includes a first schema that includes provider-account training data, provider-account scoring data, a training function, and a scoring function. The method also includes invoking the training function from the consumer database account, which results in creation in the consumer database account of a second schema that includes a machine-learning-model instance of a machine learning model, and which also results in training the machine-learning model instance with at least the provider-account training data. Additionally, the method includes generating consumer-account scoring data by inputting, into the trained machine-learning-model instance, consumer-account input data that is stored in the consumer database account. The method also includes storing the consumer-account scoring data in the consumer database account.

    OVERLAP RESULTS DATA GENERATION ON A CLOUD DATA PLATFORM

    Publication number: US20230385286A1

    Publication date: 2023-11-30

    Application number: US18162688

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24568 G06F16/24564 G06F16/244 G06F16/2456

    Abstract: A system for generating similarity data for different datasets in a cloud data platform. A first dataset of a plurality of datasets on the cloud data platform is identified, where the first dataset is associated with a first user of the cloud data platform. A semantic type for each feature of the first dataset is identified, and each semantic type for the first dataset is compared with existing data of the first user. Semantic types for each feature of each dataset are identified, and each semantic type for the first dataset is compared to each semantic type of each dataset. Overlap requests are generated to output overlap datasets between the first dataset and each of the plurality of datasets. A results dataset is generated by applying the overlap requests to a joined dataset comprising data from the first dataset and data from each of the plurality of datasets.
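
One way to picture the semantic-type matching and overlap requests described above is the sketch below. It is only an assumption-laden illustration: the regex-based type detector, the two semantic types, the request tuple layout, and the overlap count as the result metric are all hypothetical and not drawn from the patent.

```python
import re

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ZIP = re.compile(r"\d{5}")


def semantic_type(values):
    """Guess a column's semantic type from its values (toy detector)."""
    if values and all(EMAIL.fullmatch(v) for v in values):
        return "email"
    if values and all(ZIP.fullmatch(v) for v in values):
        return "zip"
    return "unknown"


def overlap_requests(first, others):
    """Pair each column of `first` with same-typed columns in other datasets."""
    first_types = {col: semantic_type(vals) for col, vals in first.items()}
    requests = []
    for name, dataset in others.items():
        for col, vals in dataset.items():
            col_type = semantic_type(vals)
            if col_type == "unknown":
                continue
            for first_col, first_type in first_types.items():
                if first_type == col_type:
                    requests.append((first_col, name, col))
    return requests


def run_requests(first, others, requests):
    """Apply each request and report how many distinct values overlap."""
    return [
        {
            "first_column": first_col,
            "dataset": name,
            "column": col,
            "overlap": len(set(first[first_col]) & set(others[name][col])),
        }
        for first_col, name, col in requests
    ]


# Illustrative data only.
first_dataset = {
    "contact": ["a@x.com", "b@x.com"],
    "postcode": ["94105", "10001"],
}
other_datasets = {
    "ds2": {"email": ["b@x.com", "c@y.org"]},
    "ds3": {"zip": ["10001", "94105", "60601"], "note": ["hello", "world"]},
}

requests = overlap_requests(first_dataset, other_datasets)
results = run_requests(first_dataset, other_datasets, requests)
```

Columns are matched by detected type rather than by name, which is why `contact` pairs with `email` and `postcode` pairs with `zip` in this example.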

    QUERY PROCESSING USING DATA CLEAN ROOMS
    Document type: Published patent application

    Publication number: US20230169213A1

    Publication date: 2023-06-01

    Application number: US18162701

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06F21/6254 G06F16/245 G06F16/27

    Abstract: A distributed database generates a cross reference table that cross references a first dataset from a first database account and a second dataset from a second account. The distributed database receives a query directed to a combination of the first and second datasets, and generates an interim table in the first database account by applying the query to the cross reference table and the first dataset. The distributed database generates results data in the second database account by applying the query to the interim table and the second dataset, and stores the results data in the first database account.
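
The two-stage query described above (cross reference table, then an interim table in the first account, then results computed in the second account and stored back in the first) might look roughly like the following. This is a sketch under stated assumptions: the join key, the row layout, and the split of the query into a first-side and a second-side predicate are hypothetical simplifications, and no privacy mechanism such as hashing is modeled.

```python
def build_cross_reference(first, second, key):
    """Return (first_row_index, second_row_index) pairs for rows sharing `key`."""
    index = {row[key]: j for j, row in enumerate(second)}
    return [(i, index[row[key]]) for i, row in enumerate(first) if row[key] in index]


def build_interim(xref, first, first_predicate):
    """Stage 1, first account: apply the query to the cross reference and first dataset."""
    return [(j, first[i]) for i, j in xref if first_predicate(first[i])]


def build_results(interim, second, second_predicate):
    """Stage 2, second account: apply the query to the interim table and second dataset."""
    return [
        {**first_row, **second[j]}
        for j, first_row in interim
        if second_predicate(second[j])
    ]


# Illustrative data only.
first_dataset = [
    {"email": "a@x.com", "segment": "gold"},
    {"email": "b@x.com", "segment": "silver"},
    {"email": "c@x.com", "segment": "gold"},
]
second_dataset = [
    {"email": "c@x.com", "purchases": 5},
    {"email": "a@x.com", "purchases": 0},
    {"email": "d@x.com", "purchases": 9},
]

xref = build_cross_reference(first_dataset, second_dataset, "email")
interim = build_interim(xref, first_dataset, lambda r: r["segment"] == "gold")
results = build_results(interim, second_dataset, lambda r: r["purchases"] > 0)

# The abstract stores the results data in the first database account.
first_account = {"results": results}
```

Rows that have no counterpart in the other dataset never enter the cross reference, so they cannot appear in the interim table or the results.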
