-
Publication No.: US12001428B2
Publication date: 2024-06-04
Application No.: US18047872
Filing date: 2022-10-19
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/24, G06F16/22, G06F16/2453
CPC classification number: G06F16/24537, G06F16/2255
Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
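The steps named in this abstract (sample the hash values, detect a frequent one, partition the build side across HJB instances) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration and is not taken from the patent: the function names `hash_key`, `detect_frequent_hashes` and `partition_build_side`, the CRC32 hash, the threshold-based detection rule, and the round-robin spreading of rows that carry a frequent hash value.

```python
import random
import zlib
from collections import Counter


def hash_key(value):
    """Deterministic stand-in for the join hash function (assumption: CRC32)."""
    return zlib.crc32(repr(value).encode("utf-8"))


def detect_frequent_hashes(hashes, sample_size, threshold, seed=0):
    """Sample the hash values and return those whose share of the sample
    is at least `threshold` (hypothetical detection rule)."""
    if not hashes:
        return set()
    rng = random.Random(seed)
    sample = rng.sample(hashes, min(sample_size, len(hashes)))
    counts = Counter(sample)
    return {h for h, c in counts.items() if c / len(sample) >= threshold}


def partition_build_side(rows, key, num_instances, frequent):
    """Assign build-side rows to HJB instances.

    Rows with a frequent hash value are spread round-robin over all
    instances; all other rows go to instance `hash % num_instances`.
    """
    partitions = [[] for _ in range(num_instances)]
    next_slot = 0
    for row in rows:
        h = hash_key(row[key])
        if h in frequent:
            partitions[next_slot].append(row)
            next_slot = (next_slot + 1) % num_instances
        else:
            partitions[h % num_instances].append(row)
    return partitions


# Usage: 90 rows share join key 7, 10 rows have distinct keys.
rows = [{"k": 7} for _ in range(90)] + [{"k": 100 + i} for i in range(10)]
hashes = [hash_key(r["k"]) for r in rows]
frequent = detect_frequent_hashes(hashes, sample_size=100, threshold=0.5)
parts = partition_build_side(rows, "k", num_instances=4, frequent=frequent)
```

With plain hash partitioning, all 90 rows for key 7 would land on a single instance; in this sketch they are spread over the four instances instead. A real system would also have to make sure the matching probe-side rows reach every instance that holds part of the hot key, which this sketch does not model.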
-
Publication No.: US20240419663A1
Publication date: 2024-12-19
Application No.: US18819649
Filing date: 2024-08-29
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Bowei Chen, Bjoern Daase, Moritz Eyssen, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
Abstract: Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data into multiple sets corresponding to hash-join-build (HJB) instances based on hash values. The skew manager detects skew in a build-side row set associated with a first HJB instance by analyzing the number of rows. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The method involves configuring skew caches, generating histograms, and detecting frequent hash values to identify skew. It also includes communicating skew notifications, broadcasting probe-side row data, and adjusting partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, enhancing efficiency in distributed computing environments.
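The detect-and-redirect behaviour described in this abstract can be sketched roughly as follows. This is an illustrative assumption, not the patented implementation: the class name `SkewManager`, the fixed `row_limit` threshold, and the "least-loaded other instance" redirect policy are all hypothetical, and skew caches, histograms, notifications and probe-side handling are left out.

```python
class SkewManager:
    """Toy skew manager: counts build-side rows per HJB instance and
    redirects rows away from an instance once it reaches `row_limit`."""

    def __init__(self, num_instances, row_limit):
        if num_instances < 2:
            raise ValueError("need at least two HJB instances to redirect")
        self.row_counts = [0] * num_instances
        self.row_limit = row_limit
        self.skewed = set()  # instances for which skew was detected

    def route(self, hash_value):
        """Return the index of the HJB instance that receives this row."""
        n = len(self.row_counts)
        target = hash_value % n
        if self.row_counts[target] >= self.row_limit:
            # Skew detected from the row count: redirect to the
            # least-loaded instance other than the skewed one.
            self.skewed.add(target)
            target = min(
                (i for i in range(n) if i != target),
                key=lambda i: self.row_counts[i],
            )
        self.row_counts[target] += 1
        return target


# Usage: twenty rows that all hash to instance 0.
mgr = SkewManager(num_instances=3, row_limit=5)
targets = [mgr.route(0) for _ in range(20)]
```

After the first five rows, instance 0 is marked as skewed and the remaining rows are sent to the other two instances.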
-
Publication No.: US20240232189A9
Publication date: 2024-07-11
Application No.: US18047872
Filing date: 2022-10-19
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24537, G06F16/2255
Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
-
Publication No.: US20240427790A1
Publication date: 2024-12-26
Application No.: US18497746
Filing date: 2023-10-30
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Bowei Chen, Prateek Gaur, Dmitry A. Lychagin, Muthunagappan Muthuraman, Zhuo Peng, Mengran Wang, Jiaqi Yan
IPC: G06F16/25, G06F16/835
Abstract: The subject technology receives a query, the query referencing a unified representation for structured type data and semi-structured type data, the unified representation being provided in storage and in memory during query processing, the unified representation comprising a set of structured type fields that include a set of semi-structured typed fields that enables type safety and enforcement for the set of structured type fields, and flexibility for the set of semi-structured typed fields in a same column, the unified representation in storage including type information for the semi-structured type data as part of the semi-structured type data, the unified representation being utilized for structured type data and semi-structured type data. The subject technology processes the query using the unified representation stored in the memory, the unified representation providing performance parity between structured type data and semi-structured type data.
-
Publication No.: US20240273096A1
Publication date: 2024-08-15
Application No.: US18644323
Filing date: 2024-04-24
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24537, G06F16/2255
Abstract: A method includes generating, by at least one hardware processor of a first computing node, a plurality of hash values using build-side row data. A frequent hash value of the plurality of hash values is detected based on row size associated with a plurality of build-side row sets including the build-side row data. A plurality of hash partitions of the build-side row data is generated using a build-side row set of the plurality of build-side row sets that includes the frequent hash value. The plurality of hash partitions of the build-side row data is distributed to a corresponding plurality of hash-join-build (HJB) instances associated with a plurality of join operations.
-
Publication No.: US20240134851A1
Publication date: 2024-04-25
Application No.: US18047872
Filing date: 2022-10-18
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24537, G06F16/2255
Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.