-
Publication No.: US20250021558A1
Publication date: 2025-01-16
Application No.: US18902195
Filing date: 2024-09-30
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, John Reumann
IPC: G06F16/2453, G06F16/27
Abstract: A method for improving query scheduling on a computing cluster using reinforcement learning is provided. A series of queries to be executed using resources of the computing cluster is received. For each query, a query execution plan is generated and a resource profile for executing the query is predicted. Current state data of the cluster resources is received and assignment data to execute the query on the cluster resources is generated by applying the reinforcement learning technique. The query is executed on the computing cluster based on the generated assignment data, and query results are stored.
-
Publication No.: US11947533B2
Publication date: 2024-04-02
Application No.: US18318293
Filing date: 2023-05-16
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis
IPC: G06F16/24, G06F16/245, G06F11/36
CPC classification number: G06F16/245, G06F11/362
Abstract: A method includes parsing, by at least one hardware processor, a query to determine query comments and query code associated with the query. A query execution plan is generated based on the query code. Query execution using the query code is performed at a first computing node associated with a query processing pipeline. A detection is made that the query comments are indicative of a software bug in the query code based on analysis of the query comments. The detection is performed at a second computing node associated with a query analysis pipeline. A notification of the software bug and a result of the query execution is output.
-
Publication No.: US20240062098A1
Publication date: 2024-02-22
Application No.: US17821587
Filing date: 2022-08-23
Applicant: Snowflake Inc.
Inventor: Rachel Frances Blum, Nancy Dou, Matthew J. Glickman, Boxin Jiang, Orestis Kostakis, Justin Langseth, Michael Earle Rainey, Haoran Yu
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: The subject technology receives first party training data provided by an end-user of a baseline machine learning model. The subject technology determines a first set of common features based on the first party training data. The subject technology receives, from at least one data source, a set of datasets. The subject technology determines a second set of common features based on the set of datasets. The subject technology trains, using the first set of common features and the second set of common features, a second machine learning model, the second machine learning model incorporating additional training data from the external data supplier during training compared to the baseline machine learning model. The subject technology generates a boosted machine learning model based at least in part on the training, the boosted machine learning model comprising the trained second machine learning model.
-
Publication No.: US11687506B1
Publication date: 2023-06-27
Application No.: US17872463
Filing date: 2022-07-25
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis, Prasanna V. Krishnan, Subramanian Muralidhar, Shakhina Pulatova, Megan Marie Schoendorf
IPC: G06F16/00, G06F16/215, G06F16/176, G06F16/25, G06F16/2457
CPC classification number: G06F16/215, G06F16/176, G06F16/24578, G06F16/256
Abstract: Affinity-based listing recommendations are created and used in a public data exchange. Listings can be evaluated against one another for affinity or similarity such that users working with a particular dataset can be presented with other datasets that share an affinity. Affinity can be determined from both the dataset metadata as well as information from the dataset content. Calculation of affinity scores can be pre-computed and stored, in advance of use, or determined on-the-fly. Presentation of most-similar listings can be deterministic, can contain randomization, can employ time-decay, can be weighted, and can make use of a tiered-sum approach.
-
Publication No.: US11568320B2
Publication date: 2023-01-31
Application No.: US17154928
Filing date: 2021-01-21
Applicant: SNOWFLAKE INC.
Inventor: Orestis Kostakis, Qiming Jiang, Boxin Jiang
Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises a ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
-
Publication No.: US11372679B1
Publication date: 2022-06-28
Application No.: US17647635
Filing date: 2022-01-11
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, Abdul Munir, Prayag Chandran Nirmala, Jeffrey Rosen
IPC: G06F9/46, G06F9/50, G06F16/2455, G06N5/04, G06N20/00
Abstract: The subject technology requests information related to usage history metadata from a metadata database. The subject technology receives the requested information from the metadata database, the requested information comprising information related to user demand. The subject technology predicts a size value indicating an amount of computing resources to request for executing a set of queries based on the usage history metadata. The subject technology determines, during a prefetch window of time within a first period of time, a current size of freepool of computing resources. The subject technology, in response to the current size of the freepool of computing resources being smaller than the predicted size value, sends a request for additional computing resources to include in the freepool of computing resources.
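The freepool comparison described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function name `additional_resources_needed`, the use of plain integer resource counts, and the choice to request exactly the shortfall are all assumptions made for illustration.

```python
def additional_resources_needed(predicted_size: int, current_freepool_size: int) -> int:
    """Return how many extra computing resources to request for the freepool.

    Hypothetical helper. The abstract only states that a request is sent when
    the current freepool is smaller than the predicted size; requesting exactly
    the difference is an assumption.
    """
    if predicted_size < 0 or current_freepool_size < 0:
        raise ValueError("sizes must be non-negative")
    if current_freepool_size < predicted_size:
        # Freepool is too small for the predicted demand: request the shortfall.
        return predicted_size - current_freepool_size
    # Freepool already covers the predicted demand: request nothing.
    return 0
```

In a real system the predicted size would come from a model over usage history metadata, and the check would run during the prefetch window; both are outside the scope of this sketch.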
-
Publication No.: US11294895B1
Publication date: 2022-04-05
Application No.: US17533932
Filing date: 2021-11-23
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis
IPC: G06F16/24, G06F16/245, G06F11/36
Abstract: Disclosed herein are systems and methods for generating anonymized software-bug alerts from query comments. In an embodiment, a data platform obtains query comments associated with a query, and determines that the query comments include a reference to a software bug of the data platform. In response to making that determination, the data platform generates an anonymized software-bug alert that includes at least part of the query comments, and transmits the anonymized software-bug alert to an endpoint such as a queue of software-bug tickets.
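One way to picture the comment-based detection described in this abstract is the sketch below. It is an illustration under stated assumptions, not the patented approach: the keyword list in `BUG_RE`, the alert dictionary layout, and the function names are hypothetical, and the regular expression naively ignores comment markers that appear inside string literals.

```python
import re
from typing import List, Optional

# Matches SQL line comments (-- ...) and block comments (/* ... */).
# Naive: does not handle comment markers inside string literals.
COMMENT_RE = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)

# Assumed keyword heuristic for "refers to a software bug".
BUG_RE = re.compile(r"\b(bug|workaround|regression)\b", re.IGNORECASE)


def extract_comments(query: str) -> List[str]:
    """Return all comments found in the query text, in order."""
    return [m.group(0) for m in COMMENT_RE.finditer(query)]


def bug_alert(query: str) -> Optional[dict]:
    """Build an alert holding only the matching comment text, or None.

    The alert carries no user, account, or query-code fields; treating that
    as "anonymized" is an assumption of this sketch.
    """
    hits = [c for c in extract_comments(query) if BUG_RE.search(c)]
    if not hits:
        return None
    return {"type": "software-bug-alert", "comments": hits}
```

A production system would more likely use a SQL tokenizer for comment extraction and a trained classifier rather than keywords; the alert would then be sent to an endpoint such as a ticket queue.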
-
Publication No.: US20250156429A1
Publication date: 2025-05-15
Application No.: US19021001
Filing date: 2025-01-14
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis, Timur Misirpashaev
IPC: G06F16/2457, G06F16/2453, G06F16/2455
Abstract: A search engine of a data exchange may receive from a user, a query comprising a set of search terms, and retrieve a set of data listings based on the search terms of the query. A data ranking module of the search engine may analyze each of the set of retrieved data listings to determine, for each of the set of retrieved data listings, a set of listing-specific signals and a set of external signals. Listing-specific signals may correspond to attributes or characteristics of data/content within a data listing, while external signals may correspond to a measure of activity in the data exchange that involves a data listing. Based on the listing-specific signals and the external signals analyzed for each retrieved data listing, the set of retrieved data listings may be ordered and presented to the user.
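The ordering step in this abstract can be sketched as a score that combines the two kinds of signal. The abstract does not say how the signals are combined; the weighted sum, the 0..1 signal scale, and the names `Listing` and `rank_listings` below are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Listing:
    title: str
    listing_signal: float   # assumed 0..1 score derived from the listing's own content
    external_signal: float  # assumed 0..1 score for exchange activity involving the listing


def rank_listings(listings: List[Listing],
                  w_listing: float = 0.5,
                  w_external: float = 0.5) -> List[Listing]:
    """Order retrieved listings by a weighted sum of both signals, best first.

    The weighted sum is a stand-in; the abstract leaves the combination open.
    """
    def score(item: Listing) -> float:
        return w_listing * item.listing_signal + w_external * item.external_signal

    return sorted(listings, key=score, reverse=True)
```

For example, with equal weights a listing with signals (0.4, 0.8) ranks above one with (0.9, 0.1), since heavy exchange activity outweighs the content score in this toy setup.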
-
Publication No.: US12242550B1
Publication date: 2025-03-04
Application No.: US18238986
Filing date: 2023-08-28
Applicant: Snowflake Inc.
Inventor: Shuodong Dang, Orestis Kostakis
IPC: G06F16/9532, G06F9/445, G06F16/9538
Abstract: A data access event may be recognized, using a browser plug-in, wherein the data access event constitutes a reference to previously obtained data. As a result of recognizing the event, the plug-in may send, to a search engine of a data exchange, a set of extracted terms. The plug-in may receive a set of related data listings related to the set of extracted terms. Upon a selection of a data listing from the set of related data listings, the plug-in may install the data listing to an account.
-
Publication No.: US12008001B2
Publication date: 2024-06-11
Application No.: US17804434
Filing date: 2022-05-27
Applicant: Snowflake Inc.
Inventor: Matthew J. Glickman, Orestis Kostakis, Justin Langseth
IPC: G06F16/245, G06F16/24, G06F16/242, G06F16/2455
CPC classification number: G06F16/24568, G06F16/244, G06F16/2456, G06F16/24564
Abstract: Systems, methods, and machine-readable storage devices provide for identifying a user dataset on a distributed database. The system includes generating a similarity score dataset that indicates a similarity between the user dataset and a plurality of datasets of other users of the distributed database. The system generates a plurality of overlap queries that are configured to output overlap datasets between the user dataset and one or more of the plurality of datasets. The system further generates a results dataset by applying one or more of the plurality of overlap queries to a joined dataset comprising data from the user dataset and one of the plurality of datasets of other users on the distributed database.