Stage replication in a cloud data lake

    Publication No.: US11620307B2

    Publication date: 2023-04-04

    Application No.: US17396576

    Filing date: 2021-08-06

    Applicant: Snowflake Inc.

    Abstract: Described herein are techniques for replicating external stages between deployments of, for example, a cloud data lake using a modified storage integration. The modified storage integration may be defined with multiple storage locations that it can point to, as well as a designation of an active storage location. The storage integration may also be defined with base file paths for each storage location as well as a relative file path, which together may serve to synchronize data loading operations between deployments when, for example, a fail-over occurs from one deployment to another. The storage integration may be replicated from a first deployment to a second deployment, and when database replication occurs, an external stage may be replicated to the second deployment and bound to the replicated storage integration. Thus, a fail-over to the second deployment may result in a seamless transition of data loading processes to the second deployment.
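
The path-resolution idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the class names `StorageIntegration` and `ExternalStage`, the method names, and the example bucket URLs are all hypothetical. The sketch shows how one base path per storage location plus a shared relative path might let a stage resolve to the equivalent location after a fail-over.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StorageIntegration:
    """Hypothetical model: one base path per storage location, one active."""
    base_paths: Dict[str, str]   # location name -> base file path
    active_location: str         # designation of the active storage location

    def resolve(self, relative_path: str) -> str:
        # Join the active location's base path with the shared relative path.
        base = self.base_paths[self.active_location].rstrip("/")
        return f"{base}/{relative_path.lstrip('/')}"

    def fail_over(self, new_location: str) -> None:
        # Switch the active location; stages bound to this object follow.
        if new_location not in self.base_paths:
            raise KeyError(f"unknown storage location: {new_location}")
        self.active_location = new_location


@dataclass
class ExternalStage:
    """Hypothetical stage bound to an integration by a relative path only."""
    integration: StorageIntegration
    relative_path: str

    def url(self) -> str:
        return self.integration.resolve(self.relative_path)
```

Because the stage stores only the relative path, calling `fail_over` on the integration is enough to redirect every bound stage, which is one plausible reading of the "seamless transition" the abstract describes.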

    Framework for providing intermediate aggregation operators in a query plan

    Publication No.: US11620287B2

    Publication date: 2023-04-04

    Application No.: US16939750

    Filing date: 2020-07-27

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation. The subject technology analyzes the at least one aggregation to generate a modified query plan, the modified query plan including at least a top aggregation operator, an intermediate aggregation operator, and a bottom aggregation operator. The subject technology performs, with respect to the intermediate aggregation operator, at least one operation comprising: the subject technology receives an input intermediate data type; the subject technology performs an internalize operation on the input intermediate data type to generate an internal state; the subject technology performs an accumulate operation on the internal state to generate intermediate data; and the subject technology performs an externalize operation on the intermediate data to generate an output data type.
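
The internalize, accumulate, and externalize steps can be illustrated with an average split into bottom, intermediate, and top operators. This is a minimal sketch under assumptions: the choice of AVG, the `(sum, count)` intermediate data type, and every function and class name here are hypothetical and do not come from the patent text.

```python
from typing import Dict, Iterable, Optional, Tuple

Partial = Tuple[float, int]  # assumed intermediate data type: (sum, count)


def bottom_aggregate(values: Iterable[float]) -> Partial:
    """Bottom operator: reduce the raw rows of one partition to a partial."""
    total, count = 0.0, 0
    for value in values:
        total += value
        count += 1
    return (total, count)


class IntermediateAvg:
    """Intermediate operator: merges partials without finalizing them."""

    def internalize(self, partial: Partial) -> Dict[str, float]:
        # Convert the input intermediate data type into an internal state.
        return {"sum": partial[0], "count": partial[1]}

    def accumulate(self, state, incoming):
        # Fold one internalized partial into the running state.
        if state is None:
            return dict(incoming)
        return {"sum": state["sum"] + incoming["sum"],
                "count": state["count"] + incoming["count"]}

    def externalize(self, state) -> Partial:
        # Convert the internal state back into the output data type.
        return (state["sum"], int(state["count"]))

    def run(self, partials: Iterable[Partial]) -> Partial:
        state = None
        for partial in partials:
            state = self.accumulate(state, self.internalize(partial))
        return (0.0, 0) if state is None else self.externalize(state)


def top_aggregate(partial: Partial) -> Optional[float]:
    """Top operator: finalize the merged partial into the query result."""
    total, count = partial
    return total / count if count else None
```

Since the intermediate operator consumes and produces the same `Partial` type, several of them can be stacked between the bottom and top operators, which is the property that makes an intermediate stage useful in a distributed plan.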

    Registration of multiple user defined functions

    Publication No.: US11620110B1

    Publication date: 2023-04-04

    Application No.: US17834668

    Filing date: 2022-06-07

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a set of files corresponding to a library, the library comprising a set of functions included in the set of files. The subject technology parses the set of files. The subject technology identifies a set of functions in the set of files based on the parsing. The subject technology, for each function, registers the function as a user defined function (UDF) based on a set of input parameters utilized by the function and a type of parameter of each of the input parameters. The subject technology provides access to each registered function in a different application.
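
The parse, identify, and register flow might look like the sketch below, which uses Python's standard `ast` module to find functions and their parameter types. The registry layout, the function names `discover_functions` and `register_udfs`, and the rule of skipping names that start with an underscore are assumptions made for illustration; the abstract does not specify them.

```python
import ast
from typing import Dict, List, Tuple

Params = List[Tuple[str, str]]  # (parameter name, declared type)


def discover_functions(source: str) -> Dict[str, Params]:
    """Parse one file and return its top-level functions with parameter types."""
    found: Dict[str, Params] = {}
    for node in ast.parse(source).body:
        # Assumption: names with a leading underscore are private helpers.
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            found[node.name] = [
                (arg.arg, ast.unparse(arg.annotation) if arg.annotation else "any")
                for arg in node.args.args
            ]
    return found


def register_udfs(files: Dict[str, str]) -> Dict[str, dict]:
    """Register every discovered function in a hypothetical UDF registry."""
    registry: Dict[str, dict] = {}
    for filename, source in files.items():
        for name, params in discover_functions(source).items():
            registry[name] = {"file": filename, "params": params}
    return registry
```

A real system would also need to handle name collisions across files and map source-language types to database types; both are left out here.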

    Multi-cluster warehouse
    Granted invention patent

    Publication No.: US11615118B2

    Publication date: 2023-03-28

    Application No.: US16862140

    Filing date: 2020-04-29

    Applicant: Snowflake Inc.

    Abstract: A method for a multi-cluster warehouse includes allocating processing units as part of a data warehouse. The processing units access data within one or more databases in cloud storage resources. The method also includes providing one or more queries to each processing unit within the data warehouse. In response to the queries, each processing unit performs database operations on a particular portion of a database table. The method also includes monitoring a workload of the processing units to determine that a processing capacity of the processing units has reached a threshold processing capacity. The method also includes dynamically adding additional processing units to and removing processing units from the data warehouse as needed based on a workload of the processing units.
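
The monitor-then-scale loop described here can be sketched as a simple threshold rule. All names, default values, and the utilization formula below are hypothetical; the abstract states only that processing units are added or removed based on workload once a threshold processing capacity is reached.

```python
from dataclasses import dataclass


@dataclass
class Warehouse:
    """Hypothetical warehouse that scales processing units by utilization."""
    units: int = 1
    min_units: int = 1
    max_units: int = 4
    capacity_per_unit: int = 8          # assumed concurrent queries per unit
    scale_up_threshold: float = 0.9     # assumed utilization that triggers growth
    scale_down_threshold: float = 0.3   # assumed utilization that triggers shrink

    def utilization(self, running_queries: int) -> float:
        return running_queries / (self.units * self.capacity_per_unit)

    def rebalance(self, running_queries: int) -> int:
        """Add or remove at most one unit per monitoring tick."""
        load = self.utilization(running_queries)
        if load >= self.scale_up_threshold and self.units < self.max_units:
            self.units += 1
        elif load <= self.scale_down_threshold and self.units > self.min_units:
            self.units -= 1
        return self.units
```

Keeping a gap between the two thresholds avoids oscillating between sizes when the load sits near a single cutoff.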

    Automatic pruning cutoff in a database system

    Publication No.: US11615095B2

    Publication date: 2023-03-28

    Application No.: US17162979

    Filing date: 2021-01-29

    Applicant: Snowflake Inc.

    Abstract: During a query compilation process, a query is received that is directed to a set of source tables, each source table from the set of source tables being organized into at least one micro-partition and the query including at least one pruning operation. During the query compilation process, a modification of the query is performed for adjusting the at least one pruning operation, the modification being based on a set of statistics collected for previous pruning operations on at least a portion of the set of source tables and a set of heuristics, the set of statistics indicating at least an amount of execution time for each previous query associated with each of the previous pruning operations. The query is compiled including the modification of the query. The compiled query is provided to an execution node of a database system for execution.
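
One way to picture a statistics-driven pruning cutoff is the sketch below. The abstract does not say which heuristics are used, so everything here is an assumption: the `PruningStat` fields, the two thresholds, and the rule that pruning is dropped only when it has been both expensive and ineffective on earlier queries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PruningStat:
    """Hypothetical record collected for one previous pruning operation."""
    pruning_ms: float        # time spent pruning during compilation
    execution_ms: float      # execution time of the associated query
    pruned_fraction: float   # share of micro-partitions eliminated (0..1)


def keep_pruning(history: List[PruningStat],
                 max_overhead: float = 0.5,
                 min_benefit: float = 0.05) -> bool:
    """Decide at compile time whether the pruning operation stays in the query."""
    if not history:
        return True  # no statistics yet: keep the default behaviour
    total_pruning = sum(stat.pruning_ms for stat in history)
    total_execution = sum(stat.execution_ms for stat in history)
    overhead = total_pruning / total_execution if total_execution else float("inf")
    benefit = sum(stat.pruned_fraction for stat in history) / len(history)
    # Cut pruning off only when it costs a lot and removes almost nothing.
    return not (overhead > max_overhead and benefit < min_benefit)
```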

    ACCESSING EXTERNAL RESOURCES USING REMOTELY STORED CREDENTIALS

    Publication No.: US20230076680A1

    Publication date: 2023-03-09

    Application No.: US18050909

    Filing date: 2022-10-28

    Applicant: Snowflake Inc.

    Abstract: A credential store definition identifying a remote credential store is received. The credential store definition includes access information to enable access to the remote credential store. A credentials object is created in an internal database based on a credentials object definition. The credentials object identifies a security credential to retrieve from the remote credential store to access an external resource. At runtime, a request to access the external resource is received, and based on receiving the request, the security credential identified by the credentials object is retrieved from the remote credential store using the access information. The retrieved security credential is provided to a processing component to access the external resource.
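
The separation between a stored reference and a secret fetched at runtime can be sketched as follows. The class names, fields, and the in-memory `FakeRemoteStore` are hypothetical stand-ins; a real deployment would call an actual secrets service, whose API is not described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CredentialStoreDefinition:
    """Hypothetical definition: where the remote store is and how to reach it."""
    name: str
    endpoint: str
    access_token: str   # the "access information" for the remote store


@dataclass(frozen=True)
class CredentialsObject:
    """Hypothetical internal-database object: a reference, never the secret."""
    store: CredentialStoreDefinition
    secret_id: str


class FakeRemoteStore:
    """In-memory stand-in for a remote credential store (illustration only)."""

    def __init__(self, token: str, secrets: Dict[str, str]) -> None:
        self._token = token
        self._secrets = secrets

    def fetch(self, token: str, secret_id: str) -> str:
        if token != self._token:
            raise PermissionError("invalid access information")
        return self._secrets[secret_id]


def access_external_resource(cred: CredentialsObject,
                             remote: FakeRemoteStore,
                             use: Callable[[str], str]) -> str:
    # Retrieve the secret only when the request arrives, then hand it
    # to the processing component (`use`) that talks to the resource.
    secret = remote.fetch(cred.store.access_token, cred.secret_id)
    return use(secret)
```

Because the credentials object holds only `secret_id`, rotating the secret in the remote store takes effect on the next request without touching the internal database.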

    Multi-cluster warehouse
    Granted invention patent

    Publication No.: US11593404B2

    Publication date: 2023-02-28

    Application No.: US16863758

    Filing date: 2020-04-30

    Applicant: Snowflake Inc.

    Abstract: A method for a multi-cluster warehouse includes allocating processing units as part of a data warehouse. The processing units access data within one or more databases in cloud storage resources. The method also includes providing one or more queries to each processing unit within the data warehouse. In response to the queries, each processing unit performs database operations on a particular portion of a database table. The method also includes monitoring a workload of the processing units to determine that a processing capacity of the processing units has reached a threshold processing capacity. The method also includes dynamically adding additional processing units to and removing processing units from the data warehouse as needed based on a workload of the processing units.
