-
Publication No.: US11010378B1
Publication Date: 2021-05-18
Application No.: US16818485
Application Date: 2020-03-13
Applicant: Snowflake Inc.
Inventor: Thierry Cruanes, Florian Andreas Funke, Guangyan Hu, Jiaqi Yan
IPC: G06F7/00, G06F16/00, G06F16/2453, G06F16/2455, G06F16/22
Abstract: Joining data with a disjunctive operator by using a lookup table is described. An example computer-implemented method can include receiving a query with a set of conjunctive predicates and a set of disjunctive predicates. The method may also include generating a lookup table for each predicate in the sets of conjunctive predicates and disjunctive predicates. For each row in a probe-side table, the method may further include looking up a value associated with that row in each of the lookup tables and adding the row to a results set when there is a match. Additionally, the method may include returning the results set.
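The abstract leaves the predicate shape and the combination rule unstated. The following is a minimal sketch, assuming every predicate is an equality test between a probe column and a build column, and that a probe row joins a build row when all conjunctive predicates and at least one disjunctive predicate match. The names `build_lookup`, `disjunctive_join`, and the sample columns are hypothetical.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]
# Hypothetical predicate shape: probe_row[probe_col] == build_row[build_col].
Predicate = tuple[str, str]


def build_lookup(build: list[Row], build_col: str) -> dict[Any, set[int]]:
    """One lookup table per predicate: build-side key value -> matching build row indices."""
    table: dict[Any, set[int]] = defaultdict(set)
    for i, row in enumerate(build):
        table[row[build_col]].add(i)
    return table


def disjunctive_join(probe: list[Row], build: list[Row],
                     conjuncts: list[Predicate],
                     disjuncts: list[Predicate]) -> list[tuple[int, int]]:
    """Return (probe_index, build_index) pairs satisfying AND(conjuncts) AND OR(disjuncts)."""
    conj_tables = [(p, build_lookup(build, b)) for p, b in conjuncts]
    disj_tables = [(p, build_lookup(build, b)) for p, b in disjuncts]
    results = []
    for pi, row in enumerate(probe):
        # Start from every build row, then narrow with each conjunctive lookup.
        candidates = set(range(len(build)))
        for col, table in conj_tables:
            candidates &= table.get(row[col], set())
        # A build row must also satisfy at least one disjunctive predicate.
        if disj_tables:
            any_match: set[int] = set()
            for col, table in disj_tables:
                any_match |= table.get(row[col], set())
            candidates &= any_match
        results.extend((pi, bi) for bi in sorted(candidates))
    return results


probe = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "z"}]
build = [{"k": 1, "c": "q"}, {"k": 2, "c": "z"}, {"k": 9, "c": "z"}]
# probe.a = build.k OR probe.b = build.c
print(disjunctive_join(probe, build, [], [("a", "k"), ("b", "c")]))
# -> [(0, 0), (1, 1), (2, 1), (2, 2)]
```

Each lookup table is built once per predicate, so a probe row costs one dictionary lookup per predicate instead of a scan of the build side for every `OR` branch.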
-
Publication No.: US20210089535A1
Publication Date: 2021-03-25
Application No.: US16857817
Application Date: 2020-04-24
Applicant: Snowflake Inc.
Inventor: Bowei Chen, Thierry Cruanes, Florian Andreas Funke, Allison Waingold Lee, Jiaqi Yan
IPC: G06F16/2453, G06F16/2455, G06F16/22
Abstract: The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and a join operation, the join operation including a build side and a probe side. The subject technology inserts an aggregation operator below the probe side of the join operation. The subject technology causes the build side of the join operation to generate a hash table. The subject technology causes the build side of the join operation to generate a bloom filter based at least in part on the hash table and to provide information, corresponding to properties of the build side, to the bloom filter. Based at least in part on the information, the subject technology determines at least one property of the join operation to determine whether to switch the aggregation operator to a pass through mode.
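The abstract does not say how the Bloom filter is sized or which build-side properties are passed along. The following is a minimal sketch of the build-side step, assuming a key-to-rows hash table, a 1024-bit filter with three SHA-256-derived bit positions per key, and a hypothetical property set of row count and distinct-key count. `BloomFilter` and `run_build_side` are illustrative names, not Snowflake APIs.

```python
import hashlib
from collections import defaultdict
from typing import Any


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: Any):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: Any) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: Any) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def run_build_side(rows: list[dict], key_col: str):
    """Build the join hash table, derive a Bloom filter from it, report build-side properties."""
    hash_table: dict[Any, list[dict]] = defaultdict(list)
    for row in rows:
        hash_table[row[key_col]].append(row)
    bloom = BloomFilter()
    for key in hash_table:  # each distinct join key is inserted once
        bloom.add(key)
    # Hypothetical property set handed on to the pass-through decision.
    properties = {"row_count": len(rows), "distinct_keys": len(hash_table)}
    return hash_table, bloom, properties
```

Deriving the filter from the finished hash table means each distinct key is hashed into the filter once, however many build rows share it.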
-
Publication No.: US20240273096A1
Publication Date: 2024-08-15
Application No.: US18644323
Application Date: 2024-04-24
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24537, G06F16/2255
Abstract: A method includes generating, by at least one hardware processor of a first computing node, a plurality of hash values using build-side row data. A frequent hash value of the plurality of hash values is detected based on row size associated with a plurality of build-side row sets including the build-side row data. A plurality of hash partitions of the build-side row data is generated using a build-side row set of the plurality of build-side row sets that includes the frequent hash value. The plurality of hash partitions of the build-side row data is distributed to a corresponding plurality of hash-join-build (HJB) instances associated with a plurality of join operations.
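The abstract describes splitting the build-side row set that contains a frequent hash value into several partitions for different hash-join-build (HJB) instances, but not the assignment rule. The following is a minimal sketch, assuming rows with a frequent hash are spread round-robin across instances while all other rows are hash-partitioned. `stable_hash` and `partition_build_rows` are hypothetical. A common companion step, not described in the abstract, is to send probe rows carrying the frequent value to every instance that holds a share of it.

```python
import zlib
from typing import Any


def stable_hash(value: Any) -> int:
    """Deterministic hash (Python's built-in hash() is salted per process for str)."""
    return zlib.crc32(repr(value).encode())


def partition_build_rows(rows: list[dict], key_col: str,
                         frequent_hashes: set[int],
                         num_instances: int) -> list[list[dict]]:
    """Assign build rows to HJB instances, spreading rows whose hash is frequent."""
    partitions: list[list[dict]] = [[] for _ in range(num_instances)]
    next_slot = 0
    for row in rows:
        h = stable_hash(row[key_col])
        if h in frequent_hashes:
            # Skewed value: round-robin so no single instance holds all of its rows.
            partitions[next_slot % num_instances].append(row)
            next_slot += 1
        else:
            # Ordinary value: plain hash partitioning keeps equal keys together.
            partitions[h % num_instances].append(row)
    return partitions
```

With six rows of a frequent key and three instances, each instance receives two of them, while every non-frequent key still lands on exactly one instance.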
-
Publication No.: US20240232226A1
Publication Date: 2024-07-11
Application No.: US18617083
Application Date: 2024-03-26
Applicant: Snowflake Inc.
Inventor: Thierry Cruanes, Benoit Dageville, Florian Andreas Funke, Peter Povinec
IPC: G06F16/28, G06F9/50, G06F16/2455, H04L41/0896, H04L41/5025, H04L43/0817, H04L67/1008, H04L67/1097
CPC classification number: G06F16/283, G06F9/5072, G06F16/2455, H04L41/0896, H04L41/5025, H04L67/1008, H04L67/1097, H04L43/0817
Abstract: A method implementing a fault-tolerant data warehouse including allocating a plurality of processing units to a data warehouse, the processing units located in different availability zones, an availability zone comprising one or more data centers. The method further includes, as a result of monitoring a number of queries running at an input degree of parallelism on the plurality of processing units of the data warehouse, determining that the number of queries is serviceable by one fewer processing unit. The method further includes routing a query from a first processing unit to a second processing unit within the data warehouse, the query having a common session identifier with another query previously provided to the second processing unit, the second processing unit determined to be caching a data segment associated with a cloud storage resource, usable by the query, and removing the first processing unit from the data warehouse.
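The abstract does not define how "serviceable by one fewer processing unit" is measured. The following is a minimal scale-in sketch, assuming a hypothetical capacity model in which each query at degree of parallelism `dop` occupies `dop` slots and each unit offers `slots_per_unit` slots. Queued queries on the unit being removed are rerouted first to a unit that shares the session and caches the needed segment. `ProcessingUnit`, `route`, and `scale_in` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessingUnit:
    name: str
    zone: str  # availability zone
    cached_segments: set[str] = field(default_factory=set)
    sessions: set[str] = field(default_factory=set)


def serviceable_by_one_fewer(num_queries: int, dop: int,
                             num_units: int, slots_per_unit: int) -> bool:
    # Hypothetical capacity model: a query at degree of parallelism `dop` uses `dop` slots.
    return num_units > 1 and num_queries * dop <= (num_units - 1) * slots_per_unit


def route(session_id: str, segment: str, units: list[ProcessingUnit],
          exclude: Optional[ProcessingUnit] = None) -> ProcessingUnit:
    candidates = [u for u in units if u is not exclude]
    # First choice: a unit that served this session and caches the needed segment.
    for u in candidates:
        if session_id in u.sessions and segment in u.cached_segments:
            return u
    # Second choice: any unit caching the segment; otherwise the first remaining unit.
    for u in candidates:
        if segment in u.cached_segments:
            return u
    return candidates[0]


def scale_in(units, victim, pending, num_queries, dop, slots_per_unit):
    """Reroute `pending` (session_id, segment) queries off `victim`, then drop it."""
    if not serviceable_by_one_fewer(num_queries, dop, len(units), slots_per_unit):
        return units, {}
    placement = {}
    for session_id, segment in pending:
        target = route(session_id, segment, units, exclude=victim)
        target.sessions.add(session_id)
        placement[session_id] = target.name
    return [u for u in units if u is not victim], placement
```

Preferring a unit that already caches the segment avoids re-reading it from cloud storage after the removed unit's work moves elsewhere.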
-
Publication No.: US20240134851A1
Publication Date: 2024-04-25
Application No.: US18047872
Application Date: 2022-10-18
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24537, G06F16/2255
Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
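The abstract says hash values are sampled to detect a frequent one but gives no sample size or frequency cutoff. The following is a minimal sketch, assuming a uniform random sample and a hypothetical threshold: a hash value is frequent if it makes up at least 10% of the sample. `detect_frequent_hashes` and its defaults are illustrative.

```python
import random
from collections import Counter


def detect_frequent_hashes(hash_values: list[int], sample_size: int = 1000,
                           threshold: float = 0.1, seed: int = 0) -> set[int]:
    """Return hash values whose share of a random sample is at least `threshold`."""
    if len(hash_values) > sample_size:
        # Seeded RNG keeps the sketch reproducible.
        sample = random.Random(seed).sample(hash_values, sample_size)
    else:
        sample = hash_values
    if not sample:
        return set()
    counts = Counter(sample)
    return {h for h, c in counts.items() if c / len(sample) >= threshold}
```

Sampling keeps detection cheap on large build sides. A value that dominates the input will almost always dominate the sample, while rare values rarely reach the cutoff.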
-
Publication No.: US20230259530A1
Publication Date: 2023-08-17
Application No.: US18139809
Application Date: 2023-04-26
Applicant: Snowflake Inc.
Inventor: Thierry Cruanes, Benoit Dageville, Florian Andreas Funke, Peter Povinec
IPC: G06F16/28, H04L67/1008, H04L41/5025, G06F16/2455, G06F9/50, H04L67/1097, H04L41/0896
CPC classification number: G06F16/283, H04L67/1008, H04L41/5025, G06F16/2455, G06F9/5072, H04L67/1097, H04L41/0896, H04L43/0817
Abstract: A method implementing a fault-tolerant data warehouse using availability zones includes allocating a plurality of processing units to a data warehouse, the processing units located in different availability zones, an availability zone comprising one or more data centers. The method further includes routing a query to a processing unit within the data warehouse, the query having a common session identifier with a query previously provided to the processing unit, the processing unit determined to be caching a data segment associated with a cloud storage resource independent of the plurality of processing units. The method further includes, as a result of monitoring a number of queries running at an input degree of parallelism, determining that the processing capacity of the processing units has reached a threshold; and changing a total number of processing units using the input degree of parallelism and the number of queries.
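The abstract says the total number of processing units is changed using the degree of parallelism and the number of queries once capacity reaches a threshold, but gives no formula. The following is a minimal sketch, assuming units are placed round-robin across availability zones and a hypothetical sizing rule: grow until demand (queries times degree of parallelism) is at most 80% of slot capacity. `allocate_units` and `grow_if_saturated` are illustrative names.

```python
import math
from itertools import cycle


def allocate_units(count: int, zones: list[str]) -> list[tuple[str, str]]:
    """Place `count` units round-robin across availability zones."""
    zone_iter = cycle(zones)
    return [(f"unit-{i}", next(zone_iter)) for i in range(count)]


def grow_if_saturated(units: list[tuple[str, str]], zones: list[str],
                      num_queries: int, dop: int, slots_per_unit: int,
                      threshold: float = 0.8) -> list[tuple[str, str]]:
    """Add units when demand reaches `threshold` of capacity."""
    demand = num_queries * dop  # hypothetical: each query needs `dop` slots
    capacity = len(units) * slots_per_unit
    if demand < threshold * capacity:
        return units
    target = math.ceil(demand / (threshold * slots_per_unit))
    # Continue the same round-robin layout so new units keep spanning zones.
    return units + allocate_units(target, zones)[len(units):]
```

Spreading units across zones means losing one availability zone removes only part of the warehouse's capacity rather than all of it.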
-
Publication No.: US20230161765A1
Publication Date: 2023-05-25
Application No.: US18099866
Application Date: 2023-01-20
Applicant: Snowflake Inc.
Inventor: Thierry Cruanes, Florian Andreas Funke, Guangyan Hu, Jiaqi Yan
IPC: G06F16/2453, G06F16/2455, G06F16/22
CPC classification number: G06F16/24537, G06F16/24556, G06F16/2255
Abstract: Joining data with a disjunctive operator by using a lookup table is described. An example computer-implemented method can include receiving a query with a set of conjunctive predicates and a set of disjunctive predicates. The method may also include generating a lookup table for each predicate in the sets of conjunctive predicates and disjunctive predicates. For each row in a probe-side table, the method may further include looking up a value associated with that row in each of the lookup tables and adding the row to a results set when there is a match. Additionally, the method may include returning the results set.
-
Publication No.: US11550793B1
Publication Date: 2023-01-10
Application No.: US17721599
Application Date: 2022-04-15
Applicant: Snowflake Inc.
Inventor: Florian Andreas Funke, Megha Thakkar
IPC: G06F16/24, G06F16/2455
Abstract: Systems and methods for spilling data for hash joins are described. An example method includes determining that an amount of available space in a first memory used by a set of relational queries is insufficient for a first relational join query. The first relational join query comprises a join operation. The method also includes determining a set of build memory sizes and a set of probe memory sizes for a set of partitions for the set of relational queries. The method further includes identifying a first partition of the set of partitions based on the set of probe memory sizes and the set of build memory sizes. The method further includes copying the first partition from the first memory to a second memory, wherein the first partition comprises a first build portion and a first probe portion.
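The abstract says a partition is chosen from its build and probe memory sizes but not by what rule. The following is a minimal sketch, assuming a hypothetical policy that spills whichever partition frees the most combined memory, repeated until the new request fits. A plain dict stands in for the second memory, and the partition is moved (copied, then released from the first memory). `Partition`, `choose_victim`, and `spill_until_fits` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    pid: int
    build_rows: list = field(default_factory=list)
    probe_rows: list = field(default_factory=list)
    build_bytes: int = 0  # memory held by the build portion
    probe_bytes: int = 0  # memory held by the buffered probe portion

    @property
    def total_bytes(self) -> int:
        return self.build_bytes + self.probe_bytes


def choose_victim(partitions: list[Partition]) -> Partition:
    # Hypothetical policy: spill whichever partition frees the most memory.
    return max(partitions, key=lambda p: p.total_bytes)


def spill_until_fits(memory: dict[int, Partition], spill_store: dict[int, Partition],
                     needed_bytes: int, budget_bytes: int) -> list[int]:
    """Move partitions (build and probe portions together) until `needed_bytes` fit."""
    used = sum(p.total_bytes for p in memory.values())
    spilled: list[int] = []
    while memory and budget_bytes - used < needed_bytes:
        victim = choose_victim(list(memory.values()))
        spill_store[victim.pid] = memory.pop(victim.pid)
        used -= victim.total_bytes
        spilled.append(victim.pid)
    return spilled
```

Moving the build and probe portions together keeps each partition's join self-contained, so a spilled partition can later be joined on its own.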
-
Publication No.: US11468063B2
Publication Date: 2022-10-11
Application No.: US17232821
Application Date: 2021-04-16
Applicant: Snowflake Inc.
Inventor: Bowei Chen, Thierry Cruanes, Florian Andreas Funke, Allison Waingold Lee, Jiaqi Yan
IPC: G06F16/2453, G06F16/2455, G06F16/22
Abstract: The subject technology provides information, corresponding to properties of a build side of a join operation, to a bloom filter. Based at least in part on the information from the bloom filter, the subject technology determines, during execution of a query plan, at least one property of the join operation to determine whether to switch an aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate. In response to the reduction rate being below a threshold value, the subject technology switches the aggregation operator to the pass through mode during runtime of the query plan. While the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed, and the input stream of data matches an output stream of data flowing out of the aggregation operator.
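The abstract says the reduction rate is a property of the join derived from Bloom-filter information, but does not define it. The following is a minimal sketch, assuming the reduction rate is the share of sampled probe keys the build-side filter rejects, with a hypothetical threshold of 0.5. A plain set's membership test stands in for the Bloom filter. `PassThroughAggregate`, `estimate_reduction_rate`, and `maybe_switch` are illustrative names.

```python
from collections.abc import Callable, Hashable, Iterable


class PassThroughAggregate:
    """SUM-by-key aggregation that can be switched to pass-through mode at runtime."""

    def __init__(self) -> None:
        self.pass_through = False

    def process(self, rows: Iterable[tuple[Hashable, int]]) -> list[tuple[Hashable, int]]:
        if self.pass_through:
            # Rows are not analyzed: the output stream equals the input stream.
            return list(rows)
        groups: dict[Hashable, int] = {}
        for key, value in rows:
            groups[key] = groups.get(key, 0) + value
        return list(groups.items())


def estimate_reduction_rate(probe_sample: list,
                            might_contain: Callable[[object], bool]) -> float:
    """Hypothetical: share of sampled probe keys the build-side filter would reject."""
    if not probe_sample:
        return 0.0
    rejected = sum(1 for key in probe_sample if not might_contain(key))
    return rejected / len(probe_sample)


def maybe_switch(agg: PassThroughAggregate, probe_sample: list,
                 might_contain: Callable[[object], bool],
                 threshold: float = 0.5) -> float:
    rate = estimate_reduction_rate(probe_sample, might_contain)
    if rate < threshold:  # rule stated in the abstract: below threshold -> pass through
        agg.pass_through = True
    return rate
```

For example, with build keys `{1, 2, 3}` and probe sample `[1, 2, 3, 4]`, the estimated rate is 0.25. That is below 0.5, so the operator stops grouping and forwards rows unchanged.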
-
Publication No.: US20210286817A1
Publication Date: 2021-09-16
Application No.: US17235826
Application Date: 2021-04-20
Applicant: Snowflake Inc.
Inventor: Thierry Cruanes, Florian Andreas Funke, Guangyan Hu, Jiaqi Yan
IPC: G06F16/2453, G06F16/22, G06F16/2455
Abstract: Joining data with a disjunctive operator by using a lookup table is described. An example computer-implemented method can include receiving a query with a set of conjunctive predicates and a set of disjunctive predicates. The method may also include generating a lookup table for each predicate in the sets of conjunctive predicates and disjunctive predicates. For each row in a probe-side table, the method may further include looking up a value associated with that row in each of the lookup tables and adding the row to a results set when there is a match. Additionally, the method may include returning the results set.