-
Publication No.: US20220237192A1
Publication date: 2022-07-28
Application No.: US17157233
Application date: 2021-01-25
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis
IPC: G06F16/2453, G06N20/00, G06F16/2455
Abstract: The subject technology receives a query directed to a set of source tables, each source table organized into a set of micro-partitions. The subject technology determines a set of metadata, the set of metadata comprising table metadata, query metadata, and historical data related to the query. The subject technology predicts, using a machine learning model, an indicator of an amount of computing resources for executing the query based at least in part on the set of metadata. The subject technology generates a query plan for executing the query based at least in part on the predicted indicator of the amount of computing resources. The subject technology executes the query based at least in part on the query plan.
-
Publication No.: US11243811B1
Publication date: 2022-02-08
Application No.: US17390265
Application date: 2021-07-30
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, Abdul Munir, Prayag Chandran Nirmala, Jeffrey Rosen
IPC: G06F9/46, G06F9/50, G06F16/2455, G06N5/04, G06N20/00
Abstract: The subject technology requests information related to usage history metadata from a metadata database. The subject technology receives the requested information from the metadata database, the requested information comprising information related to user demand. The subject technology predicts a size value indicating an amount of computing resources to request for executing a set of queries based on the usage history metadata. The subject technology determines, during a prefetch window of time within a first period of time, a current size of freepool of computing resources. The subject technology, in response to the current size of the freepool of computing resources being smaller than the predicted size value, sends a request for additional computing resources to include in the freepool of computing resources.
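The freepool check described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: the function names, the `headroom` parameter, and the use of a plain average in place of the prediction step are all assumptions made for the example.

```python
import math
from statistics import mean


def predict_pool_size(usage_history, headroom=1.2):
    """Hypothetical predictor: mean historical demand times a headroom factor, rounded up.

    The abstract only says a size value is predicted from usage history metadata;
    the averaging rule here is a stand-in for that prediction.
    """
    return math.ceil(mean(usage_history) * headroom)


def additional_resources_needed(current_free, predicted_size):
    """Return how many resources to request during the prefetch window.

    A request is only needed when the current freepool is smaller than the
    predicted size; otherwise the result is zero.
    """
    return max(0, predicted_size - current_free)


if __name__ == "__main__":
    predicted = predict_pool_size([10, 12, 14])
    print(predicted, additional_resources_needed(9, predicted))
```

The comparison itself is the only part taken directly from the abstract: additional resources are requested only when the freepool is smaller than the predicted size.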
-
Publication No.: US20250021558A1
Publication date: 2025-01-16
Application No.: US18902195
Application date: 2024-09-30
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, John Reumann
IPC: G06F16/2453, G06F16/27
Abstract: A method for improving query scheduling on a computing cluster using reinforcement learning is provided. A series of queries to be executed using resources of the computing cluster is received. For each query, a query execution plan is generated and a resource profile for executing the query is predicted. Current state data of the cluster resources is received and assignment data to execute the query on the cluster resources is generated by applying the reinforcement learning technique. The query is executed on the computing cluster based on the generated assignment data, and query results are stored.
-
Publication No.: US12026221B2
Publication date: 2024-07-02
Application No.: US18112944
Application date: 2023-02-22
Applicant: Snowflake Inc.
Inventor: Michel Adar, Boxin Jiang, Qiming Jiang, John Reumann, Boyu Wang, Jiaxun Wu
IPC: G06F17/18
CPC classification number: G06F17/18
Abstract: Using an attributes model of a time series forecasting model, determine a set of features based on time series data, the set of features including periodic components. The time series data may be divided into a set of segments. For each segment of the set of segments, a weight may be assigned using an age of the segment, resulting in a set of weighted segments of time series data. Using a trend detection model of the time series forecasting model, trend data from the set of weighted segments of time series data may be determined. A time series forecast may be generated by combining the set of features and the trend data.
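The age-weighted segmentation step in this abstract can be illustrated with a small sketch. The geometric `decay` weighting, the per-segment slope as the trend estimate, and all function names are assumptions chosen for the example; the abstract does not specify how weights or trends are computed, and the periodic-feature step is omitted here.

```python
def split_segments(series, size):
    """Divide the time series into consecutive segments of at most `size` points."""
    return [series[i:i + size] for i in range(0, len(series), size)]


def age_weights(n_segments, decay=0.5):
    """Assign each segment a weight from its age (assumed geometric decay).

    The newest segment has age 0 and the largest weight; weights sum to 1.
    """
    raw = [decay ** (n_segments - 1 - i) for i in range(n_segments)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_trend(series, size, decay=0.5):
    """Estimate a trend as the age-weighted average of per-segment slopes."""
    segments = [seg for seg in split_segments(series, size) if len(seg) > 1]
    slopes = [(seg[-1] - seg[0]) / (len(seg) - 1) for seg in segments]
    weights = age_weights(len(slopes), decay)
    return sum(w * s for w, s in zip(weights, slopes))


if __name__ == "__main__":
    # Older segment rises by 1 per step, newer segment by 2 per step.
    print(weighted_trend([0, 1, 2, 3, 3, 5, 7, 9], size=4))
```

With this weighting the more recent segment dominates the estimate, so the result lies closer to the newer slope than a plain average would.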
-
Publication No.: US11868326B2
Publication date: 2024-01-09
Application No.: US18074830
Application date: 2022-12-05
Applicant: SNOWFLAKE INC.
Inventor: Boxin Jiang, Qiming Jiang
IPC: G06F16/00, G06F16/21, G06F16/2458, G06F16/242, G06N3/08
CPC classification number: G06F16/217, G06F16/2433, G06F16/2474, G06N3/08
Abstract: An example method of tuning a machine learning operation can include receiving a data query comprising a reference to an input data set of a database, generating a plurality of unique sets of hyperparameters by varying a hyperparameter value of each set of hyperparameters of the plurality of unique sets of hyperparameters based on the input data set, in response to receiving the data query, training a plurality of machine learning models using the input data set of the data query, each of the plurality of machine learning models configured according to a respective one of a plurality of unique sets of hyperparameters, selecting a first machine learning model of the plurality of machine learning models based on an accuracy of an output of the first machine learning model, and returning the output of the first machine learning model in response to the data query.
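The tune-and-select loop in this abstract can be sketched with a toy model. Everything below is an assumption made for illustration: a one-dimensional k-nearest-neighbour classifier stands in for the machine learning models, `k` is the only hyperparameter, and the candidate values are derived from the size of the input data set.

```python
from collections import Counter


def candidate_ks(n_rows):
    """Generate unique hyperparameter values from the input size (odd k up to n_rows)."""
    return list(range(1, n_rows + 1, 2))


def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, holdout, k):
    """Fraction of held-out points that the model configured with k labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in holdout)
    return hits / len(holdout)


def select_best(train, holdout):
    """Evaluate one model per candidate k and return the most accurate one with all scores."""
    scores = {k: accuracy(train, holdout, k) for k in candidate_ks(len(train))}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    train = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (10.0, "b"), (11.0, "b")]
    holdout = [(0.5, "a"), (10.5, "b")]
    print(select_best(train, holdout))
```

In the abstract the output of the selected model is returned in response to the data query; here `select_best` returns the chosen hyperparameter and the scores so the selection can be inspected.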
-
Publication No.: US11568320B2
Publication date: 2023-01-31
Application No.: US17154928
Application date: 2021-01-21
Applicant: SNOWFLAKE INC.
Inventor: Orestis Kostakis, Qiming Jiang, Boxin Jiang
Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises a ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
-
Publication No.: US11372679B1
Publication date: 2022-06-28
Application No.: US17647635
Application date: 2022-01-11
Applicant: Snowflake Inc.
Inventor: Qiming Jiang, Orestis Kostakis, Abdul Munir, Prayag Chandran Nirmala, Jeffrey Rosen
IPC: G06F9/46, G06F9/50, G06F16/2455, G06N5/04, G06N20/00
Abstract: The subject technology requests information related to usage history metadata from a metadata database. The subject technology receives the requested information from the metadata database, the requested information comprising information related to user demand. The subject technology predicts a size value indicating an amount of computing resources to request for executing a set of queries based on the usage history metadata. The subject technology determines, during a prefetch window of time within a first period of time, a current size of freepool of computing resources. The subject technology, in response to the current size of the freepool of computing resources being smaller than the predicted size value, sends a request for additional computing resources to include in the freepool of computing resources.
-
Publication No.: US12287898B2
Publication date: 2025-04-29
Application No.: US18155293
Application date: 2023-01-17
Applicant: SNOWFLAKE INC.
Inventor: Boxin Jiang, Qiming Jiang
IPC: G06F16/2455, G06F21/62
Abstract: Embodiments of the present disclosure describe systems, methods, and computer program products for redacting sensitive data within a database. An example method can include receiving a data query referencing unredacted data of a database, wherein the data query that is received comprises a value identifying a type of sensitive data to be redacted from the unredacted data, responsive to the data query, executing, by a processing device, a redaction operation to identify sensitive data that matches the type within the unredacted data of the database, and returning a redacted data set in which the sensitive data that matches the type is replaced or removed to the data query.
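The type-driven redaction in this abstract can be sketched as below. The regular expressions, the type names `EMAIL` and `US_SSN`, and the `[REDACTED]` mask are hypothetical; the abstract does not say how sensitive data is detected, so pattern matching is only a stand-in.

```python
import re

# Hypothetical mapping from a sensitive-data type value to a detection pattern.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(rows, sensitive_type, mask="[REDACTED]"):
    """Return a copy of rows with every match of the requested type replaced by mask.

    Only the type named in the query is redacted; other sensitive types are left as-is.
    """
    pattern = PATTERNS[sensitive_type]
    return [pattern.sub(mask, row) for row in rows]


if __name__ == "__main__":
    rows = ["contact bob@example.com now", "ssn 123-45-6789"]
    print(redact(rows, "EMAIL"))
```

Passing the type as an argument mirrors the abstract's query, which carries a value identifying which kind of sensitive data to redact.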
-
Publication No.: US20240362196A1
Publication date: 2024-10-31
Application No.: US18490586
Application date: 2023-10-19
Applicant: Snowflake Inc.
Inventor: Sandeep Narendra Gupta, Qiming Jiang
IPC: G06F16/22, G06F16/242, G06F16/2455
CPC classification number: G06F16/2282, G06F16/2448, G06F16/24568
Abstract: Provided herein are systems and methods for real-time feature store configuration. The method includes decoding raw data received from a data source to obtain decoded raw data. The decoded raw data includes streaming data and batch data. An incremental computation of features associated with the decoded raw data is performed using at least one dynamic table object. The features are pushed to a feature store using at least one triggered task. Optionally, training of a machine learning model is performed using the features in the feature store.
-
Publication No.: US11934927B2
Publication date: 2024-03-19
Application No.: US18087518
Application date: 2022-12-22
Applicant: SNOWFLAKE INC.
Inventor: Orestis Kostakis, Qiming Jiang, Boxin Jiang
Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises a ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
-