Build-side skew handling for hash-partitioning hash joins

    Publication number: US12001428B2

    Publication date: 2024-06-04

    Application number: US18047872

    Application date: 2022-10-19

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24537 G06F16/2255

    Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
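
    The flow in this abstract (sample the hash values, detect a frequent one, then partition the build side across HJB instances) can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the 20% sample-share threshold, and the round-robin spreading of frequent-hash rows are all assumptions made for illustration.

```python
import random
from collections import Counter


def detect_frequent_hashes(hashes, sample_size=1000, threshold=0.2, seed=0):
    """Sample hash values and return those whose share of the sample
    is at least `threshold` (hypothetical cutoff, not from the patent)."""
    rng = random.Random(seed)
    if len(hashes) <= sample_size:
        sample = list(hashes)
    else:
        sample = rng.sample(hashes, sample_size)
    counts = Counter(sample)
    return {h for h, c in counts.items() if c / len(sample) >= threshold}


def partition_build_side(rows, key, num_instances, frequent):
    """Assign build-side rows to HJB instances.

    Rows with a frequent hash are spread round-robin over all instances
    (an assumed strategy); all other rows use plain hash partitioning.
    """
    partitions = [[] for _ in range(num_instances)]
    next_slot = 0
    for row in rows:
        h = hash(row[key])
        if h in frequent:
            partitions[next_slot % num_instances].append(row)
            next_slot += 1
        else:
            partitions[h % num_instances].append(row)
    return partitions


# Skewed build side: most rows share the join key 7.
rows = [{"k": 7}] * 800 + [{"k": i} for i in range(200)]
hashes = [hash(r["k"]) for r in rows]
frequent = detect_frequent_hashes(hashes)
partitions = partition_build_side(rows, "k", num_instances=4, frequent=frequent)
sizes = [len(p) for p in partitions]
```

    With plain hash partitioning, every row with key 7 would land on one instance. Spreading those rows keeps the per-instance sizes nearly equal, at the cost that probe-side rows with the same hash must then reach every instance holding part of that key.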

    BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS IN DISTRIBUTED DATABASE QUERY EXECUTION

    Publication number: US20240419663A1

    Publication date: 2024-12-19

    Application number: US18819649

    Application date: 2024-08-29

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems, methods, and computer-storage media for managing data skew in hash join operations. A skew manager partitions build-side row data into multiple sets corresponding to hash-join-build (HJB) instances based on hash values. The skew manager detects skew in a build-side row set associated with a first HJB instance by analyzing the number of rows. Upon detecting skew, the skew manager redirects data rows to at least a second HJB instance. The method involves configuring skew caches, generating histograms, and detecting frequent hash values to identify skew. It also includes communicating skew notifications, broadcasting probe-side row data, and adjusting partitioning of probe-side data. The disclosed techniques further include buffering build-side row sets in streams and performing join operations based on these streams, enhancing efficiency in distributed computing environments.
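
    The row-count-based detection and redirection described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the `SkewManager` class, its `row_limit` parameter, and the least-loaded redirection policy are hypothetical and are not taken from the patent text.

```python
class SkewManager:
    """Toy skew manager: hash-partitions build-side rows across HJB
    instances and redirects rows once an instance holds too many."""

    def __init__(self, num_instances, row_limit):
        self.partitions = [[] for _ in range(num_instances)]
        self.row_limit = row_limit  # assumed per-instance row threshold
        self.skewed = set()         # instances for which skew was detected

    def _least_loaded(self):
        # Index of the instance currently holding the fewest rows.
        return min(range(len(self.partitions)),
                   key=lambda i: len(self.partitions[i]))

    def route(self, row, key):
        """Return the index of the instance that receives `row`."""
        target = hash(row[key]) % len(self.partitions)
        if len(self.partitions[target]) >= self.row_limit:
            # Skew detected by row count; a real system would also send a
            # skew notification so the probe side can adjust (for example
            # by broadcasting rows that match the skewed hash values).
            self.skewed.add(target)
            target = self._least_loaded()
        self.partitions[target].append(row)
        return target


mgr = SkewManager(num_instances=3, row_limit=10)
for _ in range(30):
    mgr.route({"k": 1}, "k")
sizes = [len(p) for p in mgr.partitions]
```

    In this run all 30 rows hash to instance 1. The first 10 stay there, and the remaining 20 are redirected to the other two instances.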

    BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS

    Publication number: US20240232189A9

    Publication date: 2024-07-11

    Application number: US18047872

    Application date: 2022-10-19

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24537 G06F16/2255

    Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.

    UNIFIED STRUCTURED AND SEMI-STRUCTURED DATA TYPES IN DATABASE SYSTEMS

    Publication number: US20240427790A1

    Publication date: 2024-12-26

    Application number: US18497746

    Application date: 2023-10-30

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a query, the query referencing a unified representation for structured type data and semi-structured type data, the unified representation being provided in storage and in memory during query processing, the unified representation comprising a set of structured type fields that include a set of semi-structured typed fields that enables type safety and enforcement for the set of structured type fields, and flexibility for the set of semi-structured typed fields in a same column, the unified representation in storage including type information for the semi-structured type data as part of the semi-structured type data, the unified representation being utilized for structured type data and semi-structured type data. The subject technology processes the query using the unified representation stored in the memory, the unified representation providing performance parity between structured type data and semi-structured type data.

    BUILD-SIDE SKEW HANDLING FOR JOIN OPERATIONS

    Publication number: US20240273096A1

    Publication date: 2024-08-15

    Application number: US18644323

    Application date: 2024-04-24

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24537 G06F16/2255

    Abstract: A method includes generating, by at least one hardware processor of a first computing node, a plurality of hash values using build-side row data. A frequent hash value of the plurality of hash values is detected based on row size associated with a plurality of build-side row sets including the build-side row data. A plurality of hash partitions of the build-side row data is generated using a build-side row set of the plurality of build-side row sets that includes the frequent hash value. The plurality of hash partitions of the build-side row data is distributed to a corresponding plurality of hash-join-build (HJB) instances associated with a plurality of join operations.
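
    Detecting a frequent hash value "based on row size" could be read in more than one way. The sketch below assumes it means the accumulated byte size of the rows that share a hash value, summed across the build-side row sets. The function name, the `size_of` callback, and the byte threshold are illustrative assumptions rather than details from the patent.

```python
from collections import defaultdict


def frequent_by_row_size(row_sets, key, size_of, byte_threshold):
    """Return hash values whose rows total at least `byte_threshold`
    bytes across all build-side row sets (assumed interpretation)."""
    bytes_per_hash = defaultdict(int)
    for row_set in row_sets:
        for row in row_set:
            bytes_per_hash[hash(row[key])] += size_of(row)
    return {h for h, b in bytes_per_hash.items() if b >= byte_threshold}


row_sets = [
    [{"k": 1, "payload": "x" * 100}, {"k": 2, "payload": "y"}],
    [{"k": 1, "payload": "x" * 100}],
]
heavy = frequent_by_row_size(
    row_sets,
    key="k",
    size_of=lambda r: len(r["payload"]),  # stand-in for a real row size
    byte_threshold=150,
)
```

    Here key 1 accounts for 200 payload bytes across two row sets and is flagged, while key 2 (1 byte) is not.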

    BUILD-SIDE SKEW HANDLING FOR HASH-PARTITIONING HASH JOINS

    Publication number: US20240134851A1

    Publication date: 2024-04-25

    Application number: US18047872

    Application date: 2022-10-18

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24537 G06F16/2255

    Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
