Publication Number: US20240419663A1
Publication Date: 2024-12-19
Application Number: US18819649
Filing Date: 2024-08-29
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Bowei Chen, Bjoern Daase, Moritz Eyssen, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
Abstract: Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data into multiple sets corresponding to hash-join-build (HJB) instances based on hash values. The skew manager detects skew in a build-side row set associated with a first HJB instance by analyzing the number of rows. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The method involves configuring skew caches, generating histograms, and detecting frequent hash values to identify skew. It also includes communicating skew notifications, broadcasting probe-side row data, and adjusting partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, enhancing efficiency in distributed computing environments.
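Illustrative sketch (not taken from the patent): the Python below shows one simple way the build-side steps named in the abstract could fit together, namely hash partitioning across HJB instances, row-count-based skew detection, and redirection of rows to other instances. The function name `partition_with_skew_redirect`, the `max_rows_per_instance` threshold, and the least-loaded redirect policy are all assumptions made for illustration; the abstract does not specify how the threshold or the redirect target is chosen.

```python
from collections import Counter


def partition_with_skew_redirect(keys, num_instances, max_rows_per_instance):
    """Assign build-side join keys to hash-join-build (HJB) instances.

    Hypothetical policy: once an instance already holds
    max_rows_per_instance rows, it is flagged as skewed and any further
    row hashed to it is redirected to the currently least-loaded instance.
    """
    assignments = [[] for _ in range(num_instances)]
    histogram = Counter()  # hash value -> row count, to expose frequent values
    skewed = set()         # indices of instances where skew was detected

    for key in keys:
        h = hash(key)
        histogram[h] += 1
        target = h % num_instances  # default hash partitioning

        if len(assignments[target]) >= max_rows_per_instance:
            skewed.add(target)
            # Redirect: pick the least-loaded instance (ties go to the
            # lowest index). This may still exceed the cap if every
            # instance is already full.
            target = min(range(num_instances),
                         key=lambda i: len(assignments[i]))

        assignments[target].append(key)

    return assignments, skewed, histogram


if __name__ == "__main__":
    # Ten rows share key 0, so instance 0 is the skewed one.
    keys = [0] * 10 + [1, 2, 3]
    assignments, skewed, histogram = partition_with_skew_redirect(
        keys, num_instances=4, max_rows_per_instance=4)
    print([len(rows) for rows in assignments], skewed)
```

Once build rows for a hot key are spread over several instances, probe rows with that key can no longer be routed to a single instance. That is presumably why the abstract also mentions skew notifications and broadcasting probe-side row data, which this sketch does not model.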