BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION

    Publication No.: US20240419663A1

    Publication Date: 2024-12-19

    Application No.: US18819649

    Filing Date: 2024-08-29

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data, based on hash values, into multiple row sets, each corresponding to a hash-join-build (HJB) instance. The skew manager detects skew in the build-side row set associated with a first HJB instance by analyzing the number of rows in that set. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The techniques for identifying skew include configuring skew caches, generating histograms, and detecting frequent hash values. They also include communicating skew notifications, broadcasting probe-side row data, and adjusting the partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, with the aim of improving join efficiency in distributed computing environments.
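
The partition, detect, and redirect steps named in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. The function names (`stable_hash`, `partition_build_rows`), the fixed row-count threshold, and the round-robin redirect policy are all assumptions made for illustration, and the skew caches, histograms, and probe-side handling mentioned in the abstract are left out.

```python
import zlib


def stable_hash(key) -> int:
    """Deterministic hash of a join key (hypothetical helper, not from the source)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def partition_build_rows(rows, num_instances, skew_threshold):
    """Route build-side rows to HJB instances by hash of the join key (row[0]).

    Assumed policy: an instance is flagged as skewed once more than
    `skew_threshold` rows have hashed to it; later rows that hash to a
    flagged instance are redirected round-robin to the other instances.
    """
    if num_instances < 2:
        raise ValueError("redirecting rows needs at least two HJB instances")

    partitions = [[] for _ in range(num_instances)]  # one row set per instance
    home_counts = [0] * num_instances                # rows hashed to each instance
    skewed = set()                                   # instances flagged as skewed
    cursor = 0                                       # round-robin position

    for row in rows:
        home = stable_hash(row[0]) % num_instances
        home_counts[home] += 1
        if home_counts[home] > skew_threshold:
            skewed.add(home)  # skew detected from the row count alone

        if home in skewed:
            # Redirect to an instance other than the overloaded one.
            others = [i for i in range(num_instances) if i != home]
            target = others[cursor % len(others)]
            cursor += 1
        else:
            target = home
        partitions[target].append(row)

    return partitions, skewed
```

In this sketch, rows that arrive before skew is detected stay on their original instance, so a build-side key can end up spread across several instances. Probe-side rows for such a key would then have to reach every one of those instances, which is presumably why the abstract also mentions broadcasting probe-side row data.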
