-
Publication No.: US11687506B1
Publication Date: 2023-06-27
Application No.: US17872463
Filing Date: 2022-07-25
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis, Prasanna V. Krishnan, Subramanian Muralidhar, Shakhina Pulatova, Megan Marie Schoendorf
IPC: G06F16/00, G06F16/215, G06F16/176, G06F16/25, G06F16/2457
CPC classification number: G06F16/215, G06F16/176, G06F16/24578, G06F16/256
Abstract: Affinity-based listing recommendations are created and used in a public data exchange. Listings can be evaluated against one another for affinity or similarity such that users working with a particular dataset can be presented with other datasets that share an affinity. Affinity can be determined from both the dataset metadata as well as information from the dataset content. Calculation of affinity scores can be pre-computed and stored, in advance of use, or determined on-the-fly. Presentation of most-similar listings can be deterministic, can contain randomization, can employ time-decay, can be weighted, and can make use of a tiered-sum approach.
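The abstract does not give the affinity formula. The sketch below is one plausible reading, not the patented method. It assumes Jaccard similarity over a hypothetical `tags` set (standing in for listing metadata) and a hypothetical `columns` set (standing in for dataset content), combined with assumed weights and discounted by an exponential time decay with an assumed 30-day half-life. Randomization and the tiered-sum approach are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Listing:
    name: str
    tags: set[str] = field(default_factory=set)     # hypothetical metadata attribute
    columns: set[str] = field(default_factory=set)  # hypothetical content-derived attribute
    age_days: float = 0.0                           # hypothetical age, used for time decay


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity(x: Listing, y: Listing, w_meta: float = 0.6, w_content: float = 0.4) -> float:
    """Weighted sum of metadata similarity and content similarity (weights are assumed)."""
    return w_meta * jaccard(x.tags, y.tags) + w_content * jaccard(x.columns, y.columns)


def recommend(target: Listing, others: list[Listing], k: int = 3,
              half_life_days: float = 30.0) -> list[str]:
    """Return the names of the k listings with the highest time-decayed affinity to target."""
    scored = []
    for other in others:
        if other.name == target.name:
            continue  # never recommend the listing the user is already viewing
        decay = 0.5 ** (other.age_days / half_life_days)  # halves every half_life_days
        scored.append((affinity(target, other) * decay, other.name))
    # Highest score first; ties broken by name so the ranking is deterministic.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]
```

Pre-computing, as the abstract mentions, would correspond to evaluating `affinity` for every pair of listings ahead of time and storing the scores; calling `recommend` directly corresponds to the on-the-fly option.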
-
Publication No.: US11568320B2
Publication Date: 2023-01-31
Application No.: US17154928
Filing Date: 2021-01-21
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis, Qiming Jiang, Boxin Jiang
Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises an ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
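As an illustration only, the error model can be pictured as a linear correction fitted on the outputs that the same test queries produce on the two versions. The sketch assumes that the previous version's outputs are the reference, that the ML output is a single number, and that a hypothetical count of applied corrections (`retrain_threshold`) stands in for "sufficient data". None of these details come from the abstract.

```python
class OutputErrorModel:
    """Linear correction fitted on paired test-query outputs from two system versions.

    Assumption: the previous version's outputs are the reference, and the new
    version's ML output is mapped toward them with prev ~= slope * new + intercept.
    """

    def __init__(self, retrain_threshold: int = 1000):
        self.slope = 1.0
        self.intercept = 0.0
        self.corrections_made = 0
        self.retrain_threshold = retrain_threshold  # hypothetical "sufficient data" cutoff

    def fit(self, new_outputs: list, prev_outputs: list) -> None:
        """Ordinary least squares over outputs of the same test queries on both versions."""
        n = len(new_outputs)
        if n == 0 or n != len(prev_outputs):
            raise ValueError("need equally sized, non-empty output lists")
        mean_x = sum(new_outputs) / n
        mean_y = sum(prev_outputs) / n
        sxx = sum((x - mean_x) ** 2 for x in new_outputs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(new_outputs, prev_outputs))
        # With no spread in the new outputs, fall back to a constant offset.
        self.slope = sxy / sxx if sxx else 1.0
        self.intercept = mean_y - self.slope * mean_x

    def correct(self, ml_output: float) -> float:
        """Apply the fitted correction and count it toward the retraining threshold."""
        self.corrections_made += 1
        return self.slope * ml_output + self.intercept

    def ready_to_retrain(self) -> bool:
        """True once enough corrections have been made to justify retraining the ML model."""
        return self.corrections_made >= self.retrain_threshold
```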
-
Publication No.: US11372679B1
Publication Date: 2022-06-28
Application No.: US17647635
Filing Date: 2022-01-11
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, Abdul Munir, Prayag Chandran Nirmala, Jeffrey Rosen
IPC: G06F9/46, G06F9/50, G06F16/2455, G06N5/04, G06N20/00
Abstract: The subject technology requests information related to usage history metadata from a metadata database. The subject technology receives the requested information from the metadata database, the requested information comprising information related to user demand. The subject technology predicts a size value indicating an amount of computing resources to request for executing a set of queries based on the usage history metadata. The subject technology determines, during a prefetch window of time within a first period of time, a current size of a freepool of computing resources. The subject technology, in response to the current size of the freepool of computing resources being smaller than the predicted size value, sends a request for additional computing resources to include in the freepool of computing resources.
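A minimal sketch of the prefetch check, assuming resources are counted in whole units and that the prediction is simply the peak of recent usage plus a headroom percentage. The abstract leaves the prediction method open, so `predict_size` and `headroom_pct` are hypothetical.

```python
def predict_size(usage_history: list, headroom_pct: int = 120) -> int:
    """Predict demand as peak recent usage scaled by a headroom percentage (assumed policy)."""
    if not usage_history:
        return 0
    peak = max(usage_history)
    return -(-peak * headroom_pct // 100)  # integer ceiling of peak * headroom_pct / 100


def resources_to_request(current_freepool: int, usage_history: list,
                         in_prefetch_window: bool, headroom_pct: int = 120) -> int:
    """Additional resources to request; 0 outside the prefetch window or when the pool suffices."""
    if not in_prefetch_window:
        return 0
    predicted = predict_size(usage_history, headroom_pct)
    return max(predicted - current_freepool, 0)
```

Integer arithmetic is used for the ceiling so that a headroom factor such as 1.2 cannot round up one unit too many through floating-point error.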
-
Publication No.: US11294895B1
Publication Date: 2022-04-05
Application No.: US17533932
Filing Date: 2021-11-23
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis
IPC: G06F16/24, G06F16/245, G06F11/36
Abstract: Disclosed herein are systems and methods for generating anonymized software-bug alerts from query comments. In an embodiment, a data platform obtains query comments associated with a query, and determines that the query comments include a reference to a software bug of the data platform. In response to making that determination, the data platform generates an anonymized software-bug alert that includes at least part of the query comments, and transmits the anonymized software-bug alert to an endpoint such as a queue of software-bug tickets.
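The abstract does not describe the detection or anonymization rules. This sketch assumes keyword matching for bug references, masks e-mail addresses and quoted literals as the anonymization step, and uses an in-memory `deque` as a stand-in for the queue of software-bug tickets. The regular expressions and the `maybe_raise_bug_alert` name are hypothetical.

```python
from __future__ import annotations

import re
from collections import deque

# Hypothetical detection and redaction rules; the abstract does not say how
# bug references are recognized or which details are removed.
BUG_REFERENCE = re.compile(r"\b(bug|workaround|regression)\b", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
QUOTED_LITERAL = re.compile(r"'[^']*'")


def anonymize(comment: str) -> str:
    """Mask e-mail addresses and quoted string literals, which may carry customer data."""
    comment = EMAIL.sub("<email>", comment)
    return QUOTED_LITERAL.sub("'<redacted>'", comment)


def maybe_raise_bug_alert(query_comments: list[str], ticket_queue: deque) -> bool:
    """Queue an anonymized alert if any query comment references a software bug."""
    relevant = [c for c in query_comments if BUG_REFERENCE.search(c)]
    if not relevant:
        return False
    ticket_queue.append({"kind": "software-bug", "excerpts": [anonymize(c) for c in relevant]})
    return True
```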
-
Publication No.: US12130811B2
Publication Date: 2024-10-29
Application No.: US18362869
Filing Date: 2023-07-31
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, John Reumann
IPC: G06F16/00, G06F16/2453, G06F16/27
CPC classification number: G06F16/24542, G06F16/27
Abstract: A system for improving task scheduling on a cloud data platform is provided. A task to be executed using resources of a computing cluster is received. A task execution plan is generated and information about data to be used for the task is accessed. Resource requirements for executing the task are predicted by applying machine learning to the task execution plan and the information about the data. Assignment data is generated to execute the task on the resources by applying machine learning to information about a current state of the resources and the predicted resource requirements.
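The abstract applies machine learning both to predict resource requirements and to produce the assignment. To keep the example self-contained, the sketch replaces both with simple stand-ins: a hypothetical linear formula for memory demand and a best-fit placement rule. It shows the data flow from plan to prediction to assignment, not the patented models.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_memory_gb: float


def predict_memory_gb(plan_operator_count: int, input_gb: float) -> float:
    """Stand-in for the learned predictor: a hypothetical hand-set linear formula
    over execution-plan size and input data volume."""
    return 0.5 * plan_operator_count + 1.5 * input_gb


def assign(required_gb: float, nodes: list[Node]) -> str | None:
    """Best-fit placement: the node that satisfies the requirement with the least spare memory."""
    candidates = [n for n in nodes if n.free_memory_gb >= required_gb]
    if not candidates:
        return None  # no node can host the task right now
    best = min(candidates, key=lambda n: (n.free_memory_gb - required_gb, n.name))
    return best.name
```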
-
Publication No.: US20240273417A1
Publication Date: 2024-08-15
Application No.: US18643787
Filing Date: 2024-04-23
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis, Justin Langseth
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: Embodiments of the present disclosure may provide a data sharing system implemented as a local application in a consumer database of a distributed database. The local application can include a training function and a scoring function to train a machine learning model on provider and consumer data, and generate output data by applying the trained machine learning model on input data. The input data can include data portions from a consumer database and a provider database that are joined to create a joined dataset for scoring.
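The join, train, then score flow can be sketched as below. The key-based inner join, the `segment` and `spend` columns, and the mean-per-category "model" are all assumptions chosen to keep the example small; the abstract does not name a model type.

```python
from __future__ import annotations

from collections import defaultdict


def join(consumer: dict[str, dict], provider: dict[str, dict]) -> list[dict]:
    """Inner join on a shared key, merging consumer and provider columns into one row."""
    return [{"key": k, **consumer[k], **provider[k]}
            for k in sorted(consumer.keys() & provider.keys())]


def train(rows: list[dict], feature: str, label: str) -> dict:
    """Toy model (assumption): mean label per value of a categorical feature."""
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for row in rows:
        totals[row[feature]] += row[label]
        counts[row[feature]] += 1
    return {value: totals[value] / counts[value] for value in totals}


def score(model: dict, rows: list[dict], feature: str) -> dict:
    """Predict for each joined row; None when the feature value was never seen in training."""
    return {row["key"]: model.get(row[feature]) for row in rows}
```

Here the provider contributes the feature (`segment`) and the consumer contributes the label (`spend`), so neither party's data alone is enough to train or score.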
-
Publication No.: US12050890B2
Publication Date: 2024-07-30
Application No.: US18362114
Filing Date: 2023-07-31
Applicant: Snowflake Inc.
Inventor: Jianzhun Du, Orestis Kostakis, Kristopher Wagner, Yijun Xie
Abstract: The subject technology identifies a set of functions included in a set of files corresponding to a library. The subject technology, for each function in the set of functions, registers the function as a user defined function (UDF). The subject technology generates a name for the function based at least in part on a predetermined prefix, wherein the predetermined prefix comprises an alphanumeric string. The subject technology generates, using at least a particular set of input parameters utilized by the function and a particular type of parameter of each input parameter of the particular set of input parameters, a particular set of source code. The subject technology stores information corresponding to the function in a metadata database. The subject technology provides access to the function in a different application.
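A rough Python analogue of the registration steps, assuming the library functions are Python callables inspected with the standard `inspect` module. The `lib_` prefix, the shape of the generated wrapper, and the dictionary used as the metadata database are hypothetical; on a real data platform, registration would typically go through that platform's own function-creation statements.

```python
import inspect

PREFIX = "lib_"  # hypothetical alphanumeric prefix


def udf_name(func) -> str:
    """Prefix the function name so registered UDFs cannot collide with existing names."""
    return f"{PREFIX}{func.__name__}"


def generate_source(func) -> str:
    """Emit wrapper source whose parameter list mirrors func's parameters and annotated types."""
    params, args = [], []
    for p in inspect.signature(func).parameters.values():
        if p.kind != inspect.Parameter.POSITIONAL_OR_KEYWORD:
            raise ValueError(f"unsupported parameter kind for {p.name}")
        ann = p.annotation
        type_name = "object" if ann is inspect.Parameter.empty else getattr(ann, "__name__", str(ann))
        params.append(f"{p.name}: {type_name}")
        args.append(p.name)
    # The wrapper assumes the library module is importable under func.__module__.
    return (f"def {udf_name(func)}({', '.join(params)}):\n"
            f"    return {func.__module__}.{func.__name__}({', '.join(args)})\n")


def register_library(funcs, metadata_db: dict) -> None:
    """Register each function as a UDF by recording its generated source in a metadata store."""
    for func in funcs:
        metadata_db[udf_name(func)] = {
            "source": generate_source(func),
            "params": list(inspect.signature(func).parameters),
        }
```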
-
Publication No.: US20240232722A1
Publication Date: 2024-07-11
Application No.: US18582560
Filing Date: 2024-02-20
Applicant: Snowflake Inc.
Inventor: Orestis Kostakis, Qiming Jiang, Boxin Jiang
Abstract: Techniques for managing input and output error of a machine learning (ML) model in a database system are presented herein. Test data is generated from successive versions of a database system, the database system comprising a machine learning (ML) model to generate an output corresponding to a function of the database system. The test data is used to train an error model to determine an error associated with the output of or an input to the ML model between the successive versions of the database system. In response to the ML model generating a first output based on a first input, the error model adjusts the first output when the error is associated with the output of the ML model and adjusts the first input when the error is associated with the input to the ML model.
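The distinctive step here is that the correction targets either the input or the output, depending on where the error was attributed. A minimal sketch, assuming a scalar model and an additive offset that has already been learned from the test data (the learning step is omitted); `LocatedError` and `corrected_call` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocatedError:
    location: str  # "input" or "output": where the version-to-version error was attributed
    offset: float  # hypothetical additive correction learned from the test data

    def __post_init__(self) -> None:
        if self.location not in ("input", "output"):
            raise ValueError("location must be 'input' or 'output'")


def corrected_call(ml_model: Callable[[float], float], x: float, err: LocatedError) -> float:
    """Run the ML model, correcting whichever side the error model attributes the error to."""
    if err.location == "input":
        return ml_model(x + err.offset)  # adjust the input before the model sees it
    return ml_model(x) + err.offset      # adjust the output after the model produces it
```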
-
Publication No.: US20240119051A1
Publication Date: 2024-04-11
Application No.: US18545889
Filing Date: 2023-12-19
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis
IPC: G06F16/2453, G06F16/2455, G06N20/00
CPC classification number: G06F16/24542, G06F16/2455, G06N20/00
Abstract: The subject technology receives a query directed to a set of source tables, each source table organized into a set of micro-partitions. The subject technology determines a set of metadata, the set of metadata comprising table metadata, query metadata, and historical data related to the query. The subject technology predicts, using a machine learning model, an indicator of an amount of computing resources for executing the query based at least in part on the set of metadata. The subject technology generates a query plan for executing the query based at least in part on the predicted indicator of the amount of computing resources. The subject technology executes the query based at least in part on the query plan.
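The abstract does not name the model. As one possible stand-in, the sketch uses k-nearest-neighbour regression over past queries with hypothetical features (micro-partition count and gigabytes scanned), then maps the predicted indicator to a worker count for the plan using an assumed per-worker capacity.

```python
from __future__ import annotations

import math


def knn_predict(features: tuple, history: list[tuple[tuple, float]], k: int = 2) -> float:
    """k-nearest-neighbour regression: average resource usage of the k past queries
    whose metadata features are closest (Euclidean distance) to this query's."""
    if not history:
        raise ValueError("no historical queries to learn from")
    nearest = sorted(history, key=lambda h: math.dist(features, h[0]))[:k]
    return sum(usage for _, usage in nearest) / len(nearest)


def plan_parallelism(predicted_units: float, units_per_worker: float = 4.0,
                     max_workers: int = 16) -> int:
    """Translate the predicted indicator into a worker count for the query plan (assumed rule)."""
    return max(1, min(max_workers, math.ceil(predicted_units / units_per_worker)))
```

In practice the features would need scaling before computing distances, since raw micro-partition counts and gigabytes live on very different ranges.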
-
Publication No.: US20240078235A1
Publication Date: 2024-03-07
Application No.: US18362869
Filing Date: 2023-07-31
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, John Reumann
IPC: G06F16/2453, G06F16/27
CPC classification number: G06F16/24542, G06F16/27
Abstract: A system for improving task scheduling on a cloud data platform is provided. A task to be executed using resources of a computing cluster is received. A task execution plan is generated and information about data to be used for the task is accessed. Resource requirements for executing the task are predicted by applying machine learning to the task execution plan and the information about the data. Assignment data is generated to execute the task on the resources by applying machine learning to information about a current state of the resources and the predicted resource requirements.
-