MULTI-CLUSTER WAREHOUSE
    31.
    Invention application

    Publication No.: US20210089559A1

    Publication date: 2021-03-25

    Application No.: US17116625

    Filing date: 2020-12-09

    Applicant: SNOWFLAKE INC.

    Abstract: A method for a multi-cluster warehouse includes allocating a plurality of compute clusters as part of a virtual warehouse. The compute clusters are used to access and perform queries against one or more databases in one or more cloud storage resources. The method includes providing queries for the virtual warehouse to each of the plurality of compute clusters. Each of the plurality of compute clusters of the virtual warehouse receives a plurality of queries so that the computing load is spread across the different clusters. The method also includes dynamically adding compute clusters to and removing compute clusters from the virtual warehouse as needed based on a workload of the plurality of compute clusters.
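The routing and scaling behaviour described in this abstract can be sketched as a toy model. This is only an illustration under stated assumptions: the class names, the least-loaded routing rule, and the `min_clusters`, `max_clusters`, and `per_cluster_limit` knobs are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    running: list = field(default_factory=list)  # queries currently assigned


class VirtualWarehouse:
    """Toy model: one virtual warehouse, several clusters, load-based scaling."""

    def __init__(self, min_clusters=1, max_clusters=4, per_cluster_limit=2):
        # All three limits are hypothetical tuning knobs, not from the abstract.
        self.min_clusters = min_clusters
        self.max_clusters = max_clusters
        self.per_cluster_limit = per_cluster_limit
        self.clusters = [Cluster(f"c{i}") for i in range(min_clusters)]
        self._next_id = min_clusters

    def submit(self, query):
        """Route a query to the least-loaded cluster; add a cluster if all are full."""
        target = min(self.clusters, key=lambda c: len(c.running))
        if (len(target.running) >= self.per_cluster_limit
                and len(self.clusters) < self.max_clusters):
            target = Cluster(f"c{self._next_id}")
            self._next_id += 1
            self.clusters.append(target)
        target.running.append(query)
        return target.name

    def finish(self, query):
        """Mark a query as done and remove idle clusters above the minimum."""
        for cluster in self.clusters:
            if query in cluster.running:
                cluster.running.remove(query)
                break
        idle = [c for c in self.clusters if not c.running]
        while idle and len(self.clusters) > self.min_clusters:
            self.clusters.remove(idle.pop())
```

With `per_cluster_limit=2`, the third concurrent query triggers a second cluster, and that cluster is removed again once it becomes idle.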

    PLACEMENT OF ADAPTIVE AGGREGATION OPERATORS AND PROPERTIES IN A QUERY PLAN

    Publication No.: US20210089533A1

    Publication date: 2021-03-25

    Application No.: US16857790

    Filing date: 2020-04-24

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and at least one join operation. The subject technology analyzes the query plan to identify an aggregation that is redundant. The subject technology removes the aggregation based at least in part on the analyzing. The subject technology determines at least one aggregation property corresponding to at least one query operation of the query plan. The subject technology inserts at least one adaptive aggregation operator in the query plan based at least in part on the at least one aggregation property. The subject technology provides a modified query plan based at least in part on the inserted at least one adaptive aggregation operator in the query plan.
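The two plan rewrites named in this abstract (removing a redundant aggregation, then inserting adaptive aggregation operators) can be sketched on a tiny plan tree. Everything here is an assumption made for illustration: the `Op` class, the `unique_on` and `can_preaggregate` properties, and the redundancy rule (a DISTINCT-style group-by over input that is already unique on the grouping keys). The abstract does not define how redundancy or aggregation properties are actually determined.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    kind: str                       # "scan", "join", "aggregate", "adaptive_aggregate"
    keys: tuple = ()                # grouping or join keys
    children: list = field(default_factory=list)
    unique_on: tuple = ()           # hypothetical: output rows are unique on these columns
    can_preaggregate: bool = False  # hypothetical aggregation property


def remove_redundant(op):
    """Drop an aggregate whose input is already unique on a subset of its keys."""
    op.children = [remove_redundant(c) for c in op.children]
    if op.kind == "aggregate" and op.children:
        child = op.children[0]
        if child.unique_on and set(child.unique_on) <= set(op.keys):
            return child
    return op


def insert_adaptive(op):
    """Wrap every child flagged can_preaggregate in an adaptive aggregate."""
    new_children = []
    for child in op.children:
        child = insert_adaptive(child)
        if child.can_preaggregate:
            # Assumption: the pre-aggregation groups on the parent's keys.
            child = Op("adaptive_aggregate", keys=op.keys, children=[child])
        new_children.append(child)
    op.children = new_children
    return op


def optimize(plan):
    """Return the modified plan: redundant aggregates removed, adaptive ones added."""
    return insert_adaptive(remove_redundant(plan))
```

The sketch only rewrites the tree shape. Whether an adaptive operator actually aggregates at run time is outside its scope.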

    BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION

    Publication No.: US20240419663A1

    Publication date: 2024-12-19

    Application No.: US18819649

    Filing date: 2024-08-29

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data into multiple sets corresponding to hash-join-build (HJB) instances based on hash values. The skew manager detects skew in a build-side row set associated with a first HJB instance by analyzing the number of rows. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The method involves configuring skew caches, generating histograms, and detecting frequent hash values to identify skew. It also includes communicating skew notifications, broadcasting probe-side row data, and adjusting partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, enhancing efficiency in distributed computing environments.
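A minimal sketch of the row-count-based skew handling described in this abstract follows. It assumes a single coordinating function rather than a distributed skew manager; the function names, the `max_rows_per_instance` threshold, and the choice to redirect to the least-loaded other instance are all hypothetical. Skew caches, histograms, and stream buffering are not modelled.

```python
def partition_build_side(rows, key, num_instances, max_rows_per_instance):
    """Hash-partition build-side rows across HJB instances.

    Once more than max_rows_per_instance rows have hashed to one instance,
    that instance is marked skewed and its further rows are redirected.
    """
    partitions = [[] for _ in range(num_instances)]
    home_counts = [0] * num_instances  # rows hashed to each instance
    skewed = set()
    for row in rows:
        home = hash(row[key]) % num_instances
        home_counts[home] += 1
        target = home
        if home_counts[home] > max_rows_per_instance and num_instances > 1:
            skewed.add(home)
            others = [i for i in range(num_instances) if i != home]
            # Assumption: redirect to whichever other instance holds the fewest rows.
            target = min(others, key=lambda i: len(partitions[i]))
        partitions[target].append(row)
    return partitions, skewed


def route_probe_row(row, key, num_instances, skewed):
    """Return the instances that must see a probe-side row.

    Build rows of a skewed partition may sit on any instance, so matching
    probe rows are broadcast; all other probe rows go to their home instance.
    """
    home = hash(row[key]) % num_instances
    return list(range(num_instances)) if home in skewed else [home]
```

Because each build-side row is stored on exactly one instance, broadcasting the probe rows for a skewed partition still produces each join match once.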

    BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS

    Publication No.: US20240232189A9

    Publication date: 2024-07-11

    Application No.: US18047872

    Filing date: 2022-10-19

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24537 G06F16/2255

    Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
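The sampling step in this abstract can be illustrated as follows. This is a sketch under assumptions: the `sample_size` and `min_fraction` parameters, the fixed random seed, and the round-robin spreading of rows with a frequent hash value are hypothetical choices, not details taken from the patent.

```python
import random
from collections import Counter


def detect_frequent_hashes(hash_values, sample_size, min_fraction, seed=0):
    """Sample hash values and return those that dominate the sample."""
    rng = random.Random(seed)
    if len(hash_values) <= sample_size:
        sample = list(hash_values)
    else:
        sample = rng.sample(hash_values, sample_size)
    counts = Counter(sample)
    return {h for h, n in counts.items() if n / len(sample) >= min_fraction}


def partition_build_rows(rows, key, num_instances, frequent):
    """Distribute build-side rows to HJB instances.

    Rows with a frequent hash value are spread round-robin (an assumption);
    every other row goes to the instance selected by its hash value.
    """
    partitions = [[] for _ in range(num_instances)]
    spread = 0
    for row in rows:
        h = hash(row[key])
        if h in frequent:
            target = spread % num_instances
            spread += 1
        else:
            target = h % num_instances
        partitions[target].append(row)
    return partitions
```

Probe-side rows that carry a frequent hash value would then have to reach every instance holding part of that key, which this sketch leaves out.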

    EFFICIENT DATABASE QUERY EVALUATION
    36.
    Invention publication

    Publication No.: US20240220456A1

    Publication date: 2024-07-04

    Application No.: US18607857

    Filing date: 2024-03-18

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/1744 G06F16/221 G06F16/27

    Abstract: Data in a micro-partition of a table is stored in a compressed form. In response to a database query on the table comprising a filter, the portion of the data on which the filter operates is decompressed, without decompressing other portions of the data. Using the filter on the decompressed portion of the data, the portions of the data that are responsive to the filter are determined and decompressed. The responsive data is returned in response to the database query. When a query is run on a table that is compressed using dictionary compression, the uncompressed data may be returned along with the dictionary look-up values. The recipient of the data may use the dictionary look-up values for memoization, reducing the amount of computation required to process the returned data.
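The selective decompression described in this abstract can be approximated with a small columnar model. This is a sketch, assuming each column of a micro-partition is compressed independently with `zlib` over a JSON encoding; the `MicroPartition` class and its `query` method are hypothetical, and the dictionary-compression and memoization part of the abstract is not modelled.

```python
import json
import zlib


class MicroPartition:
    """Toy columnar micro-partition with one compressed blob per column."""

    def __init__(self, columns):
        self._compressed = {
            name: zlib.compress(json.dumps(values).encode())
            for name, values in columns.items()
        }
        self.decompressed = []  # names of columns decompressed so far

    def _read(self, name):
        self.decompressed.append(name)
        return json.loads(zlib.decompress(self._compressed[name]).decode())

    def query(self, filter_column, predicate, select):
        """Decompress the filter column first, then only the selected columns."""
        filter_values = self._read(filter_column)
        hits = [i for i, v in enumerate(filter_values) if predicate(v)]
        if not hits:
            return []  # nothing responsive: no other column is touched
        picked = []
        for name in select:
            values = filter_values if name == filter_column else self._read(name)
            picked.append([values[i] for i in hits])
        return [dict(zip(select, row)) for row in zip(*picked)]
```

Columns that are neither filtered on nor selected stay compressed, which is the saving the abstract points at. A real implementation would presumably work at a finer granularity than whole columns.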

    SYSTEMS AND METHODS FOR SPILLING DATA FOR HASH JOINS

    Publication No.: US20230334050A1

    Publication date: 2023-10-19

    Application No.: US18073464

    Filing date: 2022-12-01

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/2456 G06F16/24554

    Abstract: A method includes determining that an amount of available space in a first memory used by a set of relational queries is insufficient for a query, wherein the query comprises a join operation. A first partition of a set of partitions is identified, wherein the first partition possesses a smallest available probe memory size of the set of partitions and a build memory size greater than or equal to a threshold memory size, wherein the threshold memory size is a percentage of a maximum build memory size, and the largest partition of the set of partitions has the maximum build memory size. The first partition is copied from the first memory to a second memory.
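The partition-selection rule in this abstract can be written out directly. The sketch below assumes partitions are described by a build size and a probe size in bytes; the `Partition` class, the function names, and the dictionaries standing in for the first and second memory are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    pid: int
    build_bytes: int  # build memory size
    probe_bytes: int  # probe memory size


def choose_partition_to_spill(partitions, threshold_pct):
    """Pick the partition with the smallest probe size among large-build partitions.

    A partition qualifies when its build size is at least threshold_pct of the
    largest build size in the set.
    """
    max_build = max(p.build_bytes for p in partitions)
    threshold = threshold_pct * max_build
    candidates = [p for p in partitions if p.build_bytes >= threshold]
    return min(candidates, key=lambda p: p.probe_bytes)


def spill(first_memory, second_memory, partitions, threshold_pct):
    """Copy the chosen partition from the first memory to the second."""
    victim = choose_partition_to_spill(partitions, threshold_pct)
    second_memory[victim.pid] = first_memory[victim.pid]
    # Assumption: the copy in the first memory is released afterwards.
    del first_memory[victim.pid]
    return victim.pid
```

For build sizes 100, 80, and 20 with a 50% threshold, only the first two partitions qualify, and the one with the smaller probe size is spilled.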

    Multi-cluster warehouse
    39.
    Granted invention patent

    Publication No.: US11615117B2

    Publication date: 2023-03-28

    Application No.: US15582071

    Filing date: 2017-04-28

    Applicant: Snowflake Inc.

    Abstract: A method for a multi-cluster warehouse includes allocating a plurality of compute clusters as part of a virtual warehouse. The compute clusters are used to access and perform queries against one or more databases in one or more cloud storage resources. The method includes providing queries for the virtual warehouse to each of the plurality of compute clusters. Each of the plurality of compute clusters of the virtual warehouse receives a plurality of queries so that the computing load is spread across the different clusters. The method also includes dynamically adding compute clusters to and removing compute clusters from the virtual warehouse as needed based on a workload of the plurality of compute clusters.

    Multi-cluster warehouse
    40.
    Granted invention patent

    Publication No.: US11593403B2

    Publication date: 2023-02-28

    Application No.: US16823124

    Filing date: 2020-03-18

    Applicant: Snowflake Inc.

    Abstract: A method for a multi-cluster warehouse includes allocating processing units as part of a data warehouse. The processing units access data within one or more databases in cloud storage resources. The method also includes providing one or more queries to each processing unit within the data warehouse. In response to the queries, each processing unit performs database operations on a particular portion of a database table. The method also includes monitoring a workload of the processing units to determine that a processing capacity of the processing units has reached a threshold processing capacity. The method also includes dynamically adding additional processing units to and removing processing units from the data warehouse as needed based on a workload of the processing units.
