Data-driven task-execution scheduling using machine learning

    Publication No.: US11755576B1

    Publication date: 2023-09-12

    Application No.: US18104256

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24542 G06F16/27

    Abstract: A system for improving task scheduling on a cloud data platform is provided. A task is received, from a user of the cloud data platform, for execution on a dataset of the cloud data platform using a plurality of resources. A task graph is generated, and metadata related to the dataset is accessed for use in execution of the task. A predicted resource profile is generated by applying a first machine learning scheme to the task graph and the metadata of the dataset. Assignment data is generated to execute processes of the task on the plurality of resources. The assignment data is generated by applying a second machine learning scheme to current state data of a current computational state of the plurality of resources and the predicted resource profile generated by the first machine learning scheme.
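
The two-stage flow in this abstract can be illustrated with a minimal sketch. It is not the patented method: both machine learning schemes are replaced by simple heuristic stand-ins, and every name here (`Resource`, `predict_resource_profile`, `assign`, the node names, the cost formula) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    free_cpu: float  # current state data: CPU units available right now


def predict_resource_profile(task_graph, dataset_rows):
    """Stage 1 stand-in for the first ML scheme (assumed heuristic).

    Estimated demand grows with dataset size (metadata) and with the
    number of upstream dependencies of each process in the task graph.
    """
    return {
        proc: 1.0 + 0.5 * len(deps) + dataset_rows / 1_000_000
        for proc, deps in task_graph.items()
    }


def assign(profile, resources):
    """Stage 2 stand-in for the second ML scheme (assumed greedy rule).

    Places the most demanding process first, always on the resource
    that currently has the most free CPU.
    """
    free = {r.name: r.free_cpu for r in resources}
    assignment = {}
    for proc, demand in sorted(profile.items(), key=lambda kv: -kv[1]):
        target = max(free, key=free.get)
        assignment[proc] = target
        free[target] -= demand
    return assignment


# Task graph as process -> list of upstream processes (hypothetical task).
task_graph = {"scan": [], "join": ["scan"], "aggregate": ["join", "scan"]}
profile = predict_resource_profile(task_graph, dataset_rows=2_000_000)
plan = assign(profile, [Resource("node-a", 8.0), Resource("node-b", 5.0)])
```

The point of the split is that stage 1 depends only on the task and the dataset, while stage 2 also sees the live state of the resources, so the same predicted profile can yield different assignments as load changes.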

    Privacy-preserving multi-party machine learning using a database cleanroom

    Publication No.: US11651287B1

    Publication date: 2023-05-16

    Application No.: US17816421

    Filing date: 2022-07-31

    Applicant: Snowflake Inc.

    CPC classification number: G06N20/00

    Abstract: Embodiments of the present disclosure may provide a data sharing system implemented as a local application in a consumer database of a distributed database. The local application can include a training function and a scoring function to train a machine learning model on provider and consumer data, and generate output data by applying the trained machine learning model on input data. The input data can include data portions from a consumer database and a provider database that are joined to create a joined dataset for scoring.

    Systems and methods for rapid detection of software bugs in SQL-like platforms

    Publication No.: US11188528B1

    Publication date: 2021-11-30

    Application No.: US17241745

    Filing date: 2021-04-27

    Applicant: Snowflake Inc.

    Inventor: Orestis Kostakis

    Abstract: Disclosed herein are systems and methods for rapid detection of software bugs in data platforms. One embodiment takes the form of a method that includes a comment-analysis system of a data platform receiving query comments associated with a query that was submitted to the data platform. The data platform determines that the query comments include a reference to a software bug of the data platform, and responsively causes one or more software-bug alerts pertaining to the software bug to be transmitted to one or more endpoints.
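
As a rough illustration of the comment-analysis step, the sketch below pulls comments out of a query, looks for bug references, and builds one alert per endpoint. The bug-reference pattern (`bug #123`, `ISSUE-77`), the endpoint names, and all function names are assumptions made for this example; the abstract does not say how references are recognized. The regex-based comment extraction is also a simplification that does not account for `--` inside string literals.

```python
import re

# SQL line comments (-- ...) and block comments (/* ... */).
COMMENT_RE = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
# Hypothetical reference format: "bug" or "issue" followed by a number.
BUG_REF_RE = re.compile(r"\b(?:bug|issue)[\s#:-]*(\d+)\b", re.IGNORECASE)


def extract_comments(query):
    """Return the comment fragments found in a query string."""
    return COMMENT_RE.findall(query)


def find_bug_references(query):
    """Return the bug IDs mentioned in the query's comments, in order."""
    refs = []
    for comment in extract_comments(query):
        refs.extend(BUG_REF_RE.findall(comment))
    return refs


def build_alerts(query, endpoints):
    """Create one alert record per (bug reference, endpoint) pair."""
    return [
        {"endpoint": endpoint, "bug_id": bug_id}
        for bug_id in find_bug_references(query)
        for endpoint in endpoints
    ]


query = "SELECT 1 -- workaround for bug #4821\nFROM t /* see ISSUE-77 */"
alerts = build_alerts(query, ["oncall@example.com"])
```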

    Search in a data marketplace
    Granted invention patent

    Publication No.: US12222950B2

    Publication date: 2025-02-11

    Application No.: US18085452

    Filing date: 2022-12-20

    Applicant: Snowflake Inc.

    Abstract: A search engine of a data exchange may receive, from a user, a query comprising a set of search terms, and retrieve a set of data listings based on the search terms of the query. A data ranking module of the search engine may analyze each of the set of retrieved data listings to determine, for each of the set of retrieved data listings, a set of listing-specific signals and a set of external signals. Listing-specific signals may correspond to attributes or characteristics of data/content within a data listing, while external signals may correspond to a measure of activity in the data exchange that involves a data listing. Based on the listing-specific signals and the external signals analyzed for each retrieved data listing, the set of retrieved data listings may be ordered and presented to the user.
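
One simple way to combine the two signal families is a weighted sum, sketched below. The abstract does not state how signals are combined, so the weights, the term-coverage score, the request-count activity measure, and all names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    description: str
    recent_requests: int  # activity in the exchange (external signal)


def listing_signal(listing, terms):
    """Listing-specific signal: fraction of search terms found in the text."""
    if not terms:
        return 0.0
    words = f"{listing.title} {listing.description}".lower().split()
    return sum(1 for t in terms if t.lower() in words) / len(terms)


def external_signal(listing, max_requests):
    """External signal: request count normalized to the busiest listing."""
    return listing.recent_requests / max_requests if max_requests else 0.0


def rank(listings, terms, w_listing=0.7, w_external=0.3):
    """Order listings by a weighted sum of both signals (assumed weights)."""
    max_requests = max((l.recent_requests for l in listings), default=0)

    def score(listing):
        return (w_listing * listing_signal(listing, terms)
                + w_external * external_signal(listing, max_requests))

    return sorted(listings, key=score, reverse=True)


listings = [
    Listing("Stock prices", "daily equities", 50),
    Listing("Retail sales", "weekly sales and weather impact", 100),
    Listing("Weather data", "daily weather history", 10),
]
ranked = rank(listings, ["weather", "daily"])
```

In this example the weather listing ranks first because it matches both terms, even though it has the least activity; raising `w_external` would shift the ordering toward the busiest listing.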

    Multi-party machine learning using a database cleanroom

    Publication No.: US12020128B2

    Publication date: 2024-06-25

    Application No.: US18162695

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06N20/00

    Abstract: A method includes installing, in a consumer database account, a shared-instance database that includes a shared instance of a provider-account database that resides in a provider database account. The shared-instance database includes a first schema that includes provider-account training data, provider-account scoring data, a training function, and a scoring function. The method also includes invoking the training function from the consumer database account, which results in creation in the consumer database account of a second schema that includes a machine-learning-model instance of a machine learning model, and which also results in training the machine-learning-model instance with at least the provider-account training data. Additionally, the method includes generating consumer-account scoring data by inputting, into the trained machine-learning-model instance, consumer-account input data that is stored in the consumer database account. The method also includes storing the consumer-account scoring data in the consumer database account.
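
The sequence of steps in this abstract can be mimicked with an in-memory toy model. Plain dictionaries stand in for database accounts and schemas, and a one-variable least-squares line stands in for the machine learning model; these choices and all names (`train`, `score`, `model_schema`, and so on) are hypothetical and say nothing about how the actual system is built.

```python
def train(shared_db, consumer_account):
    """Training function: fit a line on provider training data and store
    the model instance in a new schema inside the consumer account."""
    xs, ys = zip(*shared_db["provider_schema"]["training_data"])
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    consumer_account["model_schema"] = {
        "model": {"slope": slope, "intercept": mean_y - slope * mean_x}
    }


def score(consumer_account):
    """Scoring function: apply the trained model to consumer input data
    and store the resulting scoring data in the consumer account."""
    model = consumer_account["model_schema"]["model"]
    scores = [model["slope"] * x + model["intercept"]
              for x in consumer_account["input_data"]]
    consumer_account["scoring_data"] = scores
    return scores


# Shared-instance database: first schema holds the provider's training data.
shared_db = {"provider_schema": {"training_data": [(1, 2), (2, 4), (3, 6)]}}
consumer_account = {"input_data": [4, 5]}

train(shared_db, consumer_account)  # creates the second schema
score(consumer_account)             # writes consumer-account scoring data
```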

    SEARCH IN A DATA MARKETPLACE
    Published patent application

    Publication No.: US20240202203A1

    Publication date: 2024-06-20

    Application No.: US18085452

    Filing date: 2022-12-20

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24578 G06F16/24542 G06F16/24564

    Abstract: A search engine of a data exchange may receive, from a user, a query comprising a set of search terms, and retrieve a set of data listings based on the search terms of the query. A data ranking module of the search engine may analyze each of the set of retrieved data listings to determine, for each of the set of retrieved data listings, a set of listing-specific signals and a set of external signals. Listing-specific signals may correspond to attributes or characteristics of data/content within a data listing, while external signals may correspond to a measure of activity in the data exchange that involves a data listing. Based on the listing-specific signals and the external signals analyzed for each retrieved data listing, the set of retrieved data listings may be ordered and presented to the user.

    MULTI-PARTY MACHINE LEARNING USING A DATABASE CLEANROOM

    Publication No.: US20230409968A1

    Publication date: 2023-12-21

    Application No.: US18162695

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06N20/00

    Abstract: A method includes installing, in a consumer database account, a shared-instance database that includes a shared instance of a provider-account database that resides in a provider database account. The shared-instance database includes a first schema that includes provider-account training data, provider-account scoring data, a training function, and a scoring function. The method also includes invoking the training function from the consumer database account, which results in creation in the consumer database account of a second schema that includes a machine-learning-model instance of a machine learning model, and which also results in training the machine-learning-model instance with at least the provider-account training data. Additionally, the method includes generating consumer-account scoring data by inputting, into the trained machine-learning-model instance, consumer-account input data that is stored in the consumer database account. The method also includes storing the consumer-account scoring data in the consumer database account.

    OVERLAP RESULTS DATA GENERATION ON A CLOUD DATA PLATFORM

    Publication No.: US20230385286A1

    Publication date: 2023-11-30

    Application No.: US18162688

    Filing date: 2023-01-31

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24568 G06F16/24564 G06F16/244 G06F16/2456

    Abstract: A system for generating similarity data for different datasets in a cloud data platform is provided. A first dataset of a plurality of datasets on the cloud data platform is identified, where the first dataset is associated with a first user of the cloud data platform. A semantic type for each feature of the first dataset is identified, and each semantic type for the first dataset is compared with existing data of the first user. Semantic types for each feature of each dataset are identified, and each semantic type for the first dataset is compared to each semantic type of each dataset. Overlap requests are generated to output overlap datasets between the first dataset and each of the plurality of datasets. A results dataset is generated by applying the overlap requests to a joined dataset comprising data from the first dataset and data from each of the plurality of datasets.
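
A minimal sketch of the semantic-type matching idea follows. Semantic types are detected here with two hand-written regular expressions, and the overlap is computed as a plain set intersection rather than over a joined dataset; both are simplifying assumptions, as are all names and sample values.

```python
import re

# Hypothetical semantic-type detectors; a real system would use richer rules.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?\d[\d\- ]{6,}\d"),
}


def semantic_type(values):
    """Return the first type whose pattern matches every value, else None."""
    for name, pattern in PATTERNS.items():
        if values and all(pattern.fullmatch(v) for v in values):
            return name
    return None


def overlap_requests(first, others):
    """Pair each feature of the first dataset with same-typed features
    in the other datasets. Datasets map feature name -> list of values."""
    first_types = {col: semantic_type(vals) for col, vals in first.items()}
    requests = []
    for ds_name, dataset in others.items():
        for col, vals in dataset.items():
            col_type = semantic_type(vals)
            for first_col, first_type in first_types.items():
                if col_type is not None and col_type == first_type:
                    requests.append((first_col, ds_name, col))
    return requests


def run_overlaps(first, others, requests):
    """Count shared values for each request (set intersection stand-in)."""
    return {
        (first_col, ds_name, col):
            len(set(first[first_col]) & set(others[ds_name][col]))
        for first_col, ds_name, col in requests
    }


first = {"contact": ["a@x.com", "b@y.org"], "tel": ["+1 555-0100"]}
others = {"crm": {"email_addr": ["b@y.org", "c@z.net"], "age": ["41", "29"]}}
requests = overlap_requests(first, others)
results = run_overlaps(first, others, requests)
```

Matching on semantic type rather than on column name is what lets `contact` and `email_addr` be compared even though their names differ.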
