IMPLEMENTING MULTIDIMENSIONAL TWO-SIDED INTERVAL JOINS ON DATA PLATFORMS

    Publication (Announcement) No.: US20230085410A1

    Publication (Announcement) Date: 2023-03-16

    Application No.: US18050130

    Application Date: 2022-10-27

    Applicant: Snowflake Inc.

    Abstract: In an embodiment, a data platform receives a query that includes a two-sided N dimensional interval join of first and second input relations, where N>1. The two-sided N dimensional interval join has an interval-join predicate that, in each of N dimensions, compares intervals determined from the first and second input relations. The data platform implements the interval join at least in part by identifying an intermediate relation that includes all combinations of a row from the first input relation and a row from the second input relation where, in each of the N dimensions, the intervals determined from the first and second input relations both overlap a common N dimensional domain region of an input domain of the first and second input relations. The data platform obtains and returns results of the query.
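
The following minimal Python sketch shows the join semantics these entries refer to. It assumes the interval-join predicate is "the intervals overlap in every one of the N dimensions" and that each row already carries one closed interval (lo, hi) per dimension. The names `interval_join_predicate` and `nested_loop_interval_join` are hypothetical. The quadratic nested loop only defines what any cell-based plan must reproduce; it is not Snowflake's implementation.

```python
from itertools import product
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
Box = Sequence[Interval]  # one closed interval (lo, hi) per dimension


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Closed intervals overlap iff neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def interval_join_predicate(left: Box, right: Box) -> bool:
    """Assumed two-sided predicate: the intervals overlap in every dimension."""
    return all(intervals_overlap(a, b) for a, b in zip(left, right))


def nested_loop_interval_join(left_rows: List[Box], right_rows: List[Box]) -> List[Tuple[int, int]]:
    """Reference (quadratic) semantics: test every (left, right) row pair."""
    return [
        (i, j)
        for (i, lb), (j, rb) in product(enumerate(left_rows), enumerate(right_rows))
        if interval_join_predicate(lb, rb)
    ]


left = [[(0, 2), (0, 2)], [(5, 6), (5, 6)]]
right = [[(1, 3), (1, 3)], [(2, 4), (3, 4)]]
print(nested_loop_interval_join(left, right))  # [(0, 0)]
```

The pair (0, 1) overlaps in the first dimension but not the second, so the N dimensional predicate rejects it.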

    Multidimensional two-sided interval joins on hash-equality-join infrastructure

    Publication (Announcement) No.: US11494385B2

    Publication (Announcement) Date: 2022-11-08

    Application No.: US17454894

    Application Date: 2021-11-15

    Applicant: Snowflake Inc.

    Abstract: In an embodiment, a data platform implements a two-sided N dimensional interval join using an N dimensional band join followed by a filter that applies a predicate of the interval join. The data platform generates first and second modified relations from first and second input relations. Each modified relation includes a copy of each row from the corresponding input relation for each input-domain cell that overlaps, in each of N dimensions, a bounding polygon of intervals determined from the row of the corresponding input relation. The data platform inserts, in each row in each modified relation, an input-domain-cell identifier of the corresponding overlapping input-domain cell and uses a hash-equality join that receives the first and second modified relations and that is keyed on the input-domain-cell identifiers. The data platform obtains results of a query by executing a query-execution plan that includes the query-plan section.
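
The band-join-plus-filter plan described in this abstract can be sketched as follows. The sketch assumes a uniform grid with a fixed cell width per dimension, axis-aligned boxes as the bounding polygons, and the overlap predicate as the interval-join predicate. The names `overlapping_cells`, `modify`, and `band_join_then_filter` are hypothetical. A Python dict stands in for the hash-equality join on cell identifiers, and the exact overlap test is the filter.

```python
import math
from collections import defaultdict
from itertools import product


def overlapping_cells(box, cell_width):
    """IDs (tuples of grid coordinates) of every uniform-grid cell the box touches in all N dimensions."""
    ranges = [
        range(math.floor(lo / w), math.floor(hi / w) + 1)
        for (lo, hi), w in zip(box, cell_width)
    ]
    return product(*ranges)


def modify(rows, cell_width):
    """Modified relation: one copy of each row per overlapping cell, tagged with the cell ID."""
    return [
        (cell, row_id, box)
        for row_id, box in enumerate(rows)
        for cell in overlapping_cells(box, cell_width)
    ]


def overlaps(lbox, rbox):
    """Exact interval-join predicate (assumed: overlap in every dimension)."""
    return all(a[0] <= b[1] and b[0] <= a[1] for a, b in zip(lbox, rbox))


def band_join_then_filter(left_rows, right_rows, cell_width):
    # Build phase of the hash-equality join: key the left copies by cell ID.
    table = defaultdict(list)
    for cell, i, lbox in modify(left_rows, cell_width):
        table[cell].append((i, lbox))
    # Probe phase: equal cell IDs give candidate pairs; the filter keeps true matches.
    # The set collapses pairs that meet in more than one shared cell.
    result = set()
    for cell, j, rbox in modify(right_rows, cell_width):
        for i, lbox in table.get(cell, ()):
            if overlaps(lbox, rbox):
                result.add((i, j))
    return sorted(result)


left = [[(0, 2), (0, 2)], [(5, 6), (5, 6)]]
right = [[(1, 3), (1, 3)], [(2, 4), (3, 4)]]
print(band_join_then_filter(left, right, cell_width=(2.0, 2.0)))  # [(0, 0)]
```

In the example, pairs (0, 1) and (1, 1) each share a cell but fail the exact predicate. This is why the filter must follow the equality join: sharing a cell does not imply overlapping intervals.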

    IMPLEMENTING MULTIDIMENSIONAL TWO-SIDED INTERVAL JOINS USING SAMPLING-BASED INPUT-DOMAIN DEMARCATION

    Publication (Announcement) No.: US20220300512A1

    Publication (Announcement) Date: 2022-09-22

    Application No.: US17454899

    Application Date: 2021-11-15

    Applicant: Snowflake Inc.

    Abstract: In an embodiment, a data platform receives a query that includes a two-sided N dimensional interval join of first and second input relations. The data platform samples, with respect to each of one or more of the N dimensions, one or both of the first input relation and the second input relation with respect to an interval size of an interval determined from the input relation. The data platform demarcates the N dimensional input domain into non-overlapping N dimensional input-domain cells based on the sampling. The data platform implements the interval join using a query-execution plan that includes an equality join that is keyed on input-domain-cell identifiers of input-domain cells that at least partially overlap bounding polygons of the intervals determined from the first and second input relations. The equality join is followed in the query-execution plan by a filter that applies the interval join predicate. The data platform obtains results of the query by executing the query-execution plan.
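
One way to demarcate the input domain from samples is sketched below. It rests on assumptions that are not in the abstract. Each dimension gets a uniform cell width equal to the median of a random sample of interval sizes. The cells are half-open slabs anchored at the domain's lower corner. The names `sample_cell_widths` and `demarcate` are hypothetical, and the abstract does not say how sampled sizes map to cell boundaries.

```python
import math
import random
import statistics


def sample_cell_widths(rows, n_dims, sample_size=100, seed=0):
    """Pick one cell width per dimension from a random sample of interval sizes.

    Hypothetical heuristic: width = median sampled interval size, so a
    median-sized interval touches one or two cells per dimension.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    widths = []
    for d in range(n_dims):
        sizes = [box[d][1] - box[d][0] for box in sample]
        widths.append(max(statistics.median(sizes), 1e-9))  # guard against zero-width cells
    return tuple(widths)


def demarcate(domain, widths):
    """Demarcate the N dimensional domain [(lo, hi), ...] into non-overlapping cells.

    Returns the number of cells per dimension and a function mapping a point to
    its cell ID. Cells are [lo + k*w, lo + (k+1)*w), with the last one closed at hi.
    """
    counts = tuple(max(1, math.ceil((hi - lo) / w)) for (lo, hi), w in zip(domain, widths))

    def cell_id(point):
        return tuple(
            min(int((x - lo) // w), n - 1)
            for x, (lo, _), w, n in zip(point, domain, widths, counts)
        )

    return counts, cell_id


rows = [[(0, 1), (0, 4)], [(0, 3), (0, 4)], [(0, 2), (0, 6)]]
widths = sample_cell_widths(rows, n_dims=2)
counts, cell_id = demarcate([(0, 10), (0, 10)], widths)
print(widths, counts, cell_id((3, 9)))  # (2, 4) (5, 3) (1, 2)
```

Sampling each dimension separately lets a dimension with long intervals get coarse cells while a dimension with short intervals gets fine ones, which limits how many copies each row produces.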

    INDEXED GEOSPATIAL PREDICATE SEARCH

    Publication (Announcement) No.: US20220284025A1

    Publication (Announcement) Date: 2022-09-08

    Application No.: US17804248

    Application Date: 2022-05-26

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems and methods for indexed geospatial predicate search. An example method performed by at least one hardware processor includes decoding a query with a geospatial predicate. The geospatial predicate is configured between a geography data column and a constant geography object. The method further includes computing a first covering for a data value of a plurality of data values in the geography data column. The first covering includes a first set of cells in a hierarchical grid representation of a geography. The first set of cells represents a surface of the geography associated with the data value. A second covering is computed for the constant geography object. A determination is made on whether to prune at least one partition of a database organized into a set of partitions and including the geography data column based on a comparison between the first covering and the second covering.
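
Below is a simplified sketch of covering-based partition pruning. It uses a quadtree over the unit square in place of a production hierarchical grid such as S2. The shapes being covered are axis-aligned bounding boxes, and the predicate is assumed to be intersects-style. Cell IDs are (level, x, y) tuples, and `covering`, `contains`, and `can_prune` are hypothetical names. Quadtree cells are either nested or disjoint, so two coverings share area only if some cell in one equals, or is an ancestor of, a cell in the other.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int, int]  # (level, x, y) in a quadtree over the unit square


def covering(box, level: int) -> Set[Cell]:
    """All cells at one fixed level that touch an axis-aligned box inside [0, 1]^2."""
    n = 1 << level
    (x0, x1), (y0, y1) = box
    xs = range(min(int(x0 * n), n - 1), min(int(x1 * n), n - 1) + 1)
    ys = range(min(int(y0 * n), n - 1), min(int(y1 * n), n - 1) + 1)
    return {(level, x, y) for x in xs for y in ys}


def contains(a: Cell, b: Cell) -> bool:
    """True if cell a equals cell b or is one of its ancestors."""
    shift = b[0] - a[0]
    return shift >= 0 and (b[1] >> shift, b[2] >> shift) == (a[1], a[2])


def coverings_intersect(c1: Set[Cell], c2: Set[Cell]) -> bool:
    """Coverings meet iff some pair of their cells is nested."""
    return any(contains(a, b) or contains(b, a) for a in c1 for b in c2)


def can_prune(partition_boxes: Iterable, constant_box, value_level=4, constant_level=2) -> bool:
    """Skip a partition when no value's covering meets the constant's covering."""
    const_cov = covering(constant_box, constant_level)
    return not any(coverings_intersect(covering(b, value_level), const_cov) for b in partition_boxes)


const = ((0.0, 0.2), (0.0, 0.2))
near = [((0.1, 0.15), (0.1, 0.15))]
far = [((0.6, 0.7), (0.6, 0.7))]
print(can_prune(near, const), can_prune(far, const))  # False True
```

Disjoint coverings prove that the covered shapes are disjoint, so the partition can be skipped. Overlapping coverings prove nothing, and that partition must still be scanned and evaluated with the exact predicate.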

    Performing geospatial-function join using implied interval join

    Publication (Announcement) No.: US11709837B2

    Publication (Announcement) Date: 2023-07-25

    Application No.: US17334339

    Application Date: 2021-05-28

    Applicant: Snowflake Inc.

    CPC classification number: G06F16/24544 G06F16/24537

    Abstract: Disclosed herein are systems and methods for performing a geospatial-function join using an implied interval join. In an embodiment, a database platform receives a query that includes a geospatial-function join, which applies a geospatial-function predicate to a first geography data object of a first relation and a second geography data object of a second relation. The database platform processes the first and second relations through an interval join that applies an interval-join predicate that is implied by the geospatial-function predicate. The database platform obtains query results at least in part by implementing a filter that applies the geospatial-function predicate to an output of the interval join, and outputs the query results.
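
As an illustration of an implied interval join, assume the geospatial predicate is "planar points p and q are within distance d". This is a planar stand-in for a function such as ST_DWITHIN, not its geodesic behavior. Because dist(p, q) <= d implies |px - qx| <= d and |py - qy| <= d, an interval join on those coordinate ranges returns a superset of the answer. A filter that applies the exact distance then removes the false positives. The name `dwithin_join` is hypothetical, and sort-and-bisect is one simple way to evaluate the interval join, not necessarily the platform's.

```python
import bisect
import math


def dwithin_join(left, right, d):
    """Pairs (i, j) of planar points with Euclidean distance <= d."""
    # Interval join, dimension x: sort right points by x so that a range lookup
    # finds every q with q.x in [p.x - d, p.x + d].
    order = sorted(range(len(right)), key=lambda j: right[j][0])
    xs = [right[j][0] for j in order]
    out = []
    for i, (x, y) in enumerate(left):
        lo = bisect.bisect_left(xs, x - d)
        hi = bisect.bisect_right(xs, x + d)
        for k in range(lo, hi):
            j = order[k]
            qx, qy = right[j]
            # Interval join, dimension y, then the filter with the exact predicate.
            if abs(qy - y) <= d and math.hypot(qx - x, qy - y) <= d:
                out.append((i, j))
    return sorted(out)


left = [(0, 0), (10, 10)]
right = [(0.6, 0.6), (0.9, 0.1), (10, 10.5), (0.8, 0.8)]
print(dwithin_join(left, right, 1.0))  # [(0, 0), (0, 1), (1, 2)]
```

The point (0.8, 0.8) passes both interval checks for (0, 0) but lies about 1.13 away, so only the final filter rejects it.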

    PRE-FILTER DEDUPLICATION FOR MULTIDIMENSIONAL TWO-SIDED INTERVAL JOINS

    Publication (Announcement) No.: US20220300511A1

    Publication (Announcement) Date: 2022-09-22

    Application No.: US17239529

    Application Date: 2021-04-23

    Applicant: Snowflake Inc.

    Abstract: Disclosed herein are systems and methods for pre-filter deduplication for multidimensional two-sided interval joins. In an embodiment, a data platform receives query instructions for a two-sided N dimensional interval join, where N is an integer greater than 1. The two-sided N dimensional interval join has an interval-join predicate that compares intervals determined from the input relations in each of N dimensions. The data platform implements the two-sided N dimensional interval join as a query-plan section that includes an N dimensional band join that is followed by a deduplication operator that is followed by a filter that applies the interval-join predicate. The N dimensional band join includes a hash join keyed to N dimensional domain cells overlapped at least in part by intervals determined from the input relations in each of the N dimensions. The deduplication operator removes duplicate rows from a potential-duplicates subset of the output of the N dimensional band join.
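
The following sketch shows the band join, then deduplication, then filter ordering, assuming uniform grid cells and the overlap predicate. Here the potential-duplicates subset is the set of band-join rows whose left and right input rows were both replicated into more than one cell. If either row occupies a single cell, the pair can meet at most once, so those rows bypass the seen-set entirely. This definition of the subset is an assumption, since the abstract does not spell it out. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def replicate(rows, cell_width):
    """One (cell_id, row_id, box) copy per uniform-grid cell that the row's box touches."""
    copies = []
    for row_id, box in enumerate(rows):
        ranges = [range(math.floor(lo / w), math.floor(hi / w) + 1) for (lo, hi), w in zip(box, cell_width)]
        copies.extend((cell, row_id, box) for cell in product(*ranges))
    return copies


def overlaps(lbox, rbox):
    """Assumed interval-join predicate: overlap in every dimension."""
    return all(a[0] <= b[1] and b[0] <= a[1] for a, b in zip(lbox, rbox))


def band_join_dedup_filter(left_rows, right_rows, cell_width):
    left_copies = replicate(left_rows, cell_width)
    right_copies = replicate(right_rows, cell_width)
    left_n = Counter(row_id for _, row_id, _ in left_copies)
    right_n = Counter(row_id for _, row_id, _ in right_copies)

    # N dimensional band join: hash join keyed on the cell ID.
    table = defaultdict(list)
    for cell, i, lbox in left_copies:
        table[cell].append((i, lbox))
    band_output = [(i, j, lbox, rbox) for cell, j, rbox in right_copies for i, lbox in table.get(cell, ())]

    # Deduplication: only pairs whose rows were BOTH replicated can repeat,
    # so only that potential-duplicates subset is checked against a seen-set.
    seen, deduped = set(), []
    for i, j, lbox, rbox in band_output:
        if left_n[i] > 1 and right_n[j] > 1:
            if (i, j) in seen:
                continue
            seen.add((i, j))
        deduped.append((i, j, lbox, rbox))

    # Filter: apply the interval-join predicate to the deduplicated rows.
    return len(band_output), sorted((i, j) for i, j, lbox, rbox in deduped if overlaps(lbox, rbox))


left = [[(0, 3), (0, 1)], [(4, 5), (0, 1)]]
right = [[(1, 3), (0, 1)], [(4.5, 5.5), (0, 1)]]
print(band_join_dedup_filter(left, right, (2, 2)))  # (3, [(0, 0), (1, 1)])
```

In the example, left row 0 and right row 0 both span cells (0, 0) and (1, 0), so the band join emits their pair twice. Deduplicating before the filter means the predicate runs on each pair only once.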

    Multidimensional and multi-relation sampling for implementing multidimensional two-sided interval joins

    Publication (Announcement) No.: US11194808B1

    Publication (Announcement) Date: 2021-12-07

    Application No.: US17239521

    Application Date: 2021-04-23

    Applicant: Snowflake Inc.

    Abstract: Disclosed herein are systems and methods for multidimensional and multi-relation sampling for implementing multidimensional two-sided interval joins. In an embodiment, a data platform receives query instructions for a two-sided N dimensional interval join, where N is an integer greater than 1. The two-sided N dimensional interval join has an interval-join predicate that compares intervals determined from the input relations in each of N dimensions. The data platform samples interval sizes in one or more input relations, and demarcates an N dimensional input domain based on the sampling. The data platform implements the two-sided N dimensional interval join using an N dimensional band join followed by a filter that applies the interval-join predicate. The N dimensional band join includes a hash join keyed to N dimensional domain cells overlapped at least in part by intervals in the input relations in each of the N dimensions.

    POINT-BASED RELATION SPLITTING IN GEOSPATIAL-FUNCTION-IMPLIED INTERVAL JOINS

    Publication (Announcement) No.: US20210374137A1

    Publication (Announcement) Date: 2021-12-02

    Application No.: US17244173

    Application Date: 2021-04-29

    Applicant: Snowflake Inc.

    Abstract: Disclosed herein are systems and methods for point-based relation splitting in geospatial-function-implied interval joins. In an embodiment, a data platform receives a query that applies a geospatial-function predicate to first and second geography data objects from first and second relations. The second relation is divided into point and non-point subsets based on the second geography data object. The data platform routes the point subset along a first path that includes a one-sided interval join that applies, to the first relation and the point subset, an interval-join predicate implied by the geospatial-function predicate. The data platform routes the non-point subset along a second path that does not include the one-sided interval join. The data platform obtains query results at least in part with a filter that applies the geospatial-function predicate to outputs of the one-sided interval join and the second path, and outputs the query results.
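
The sketch below illustrates point-based splitting. It assumes the first relation holds axis-aligned rectangles ((x0, x1), (y0, y1)), the second relation holds either points (x, y) or rectangles, and the predicate is "intersects". Points go through a one-sided interval join, with intervals on one side and point values on the other, evaluated here with sort and bisect. Non-points bypass that join through a plain nested loop. Both paths feed the same exact filter. All names are hypothetical.

```python
import bisect


def is_point(geo):
    """Points are (x, y) numbers; rectangles are ((x0, x1), (y0, y1))."""
    return isinstance(geo[0], (int, float))


def intersects(rect, geo):
    """Exact geospatial predicate for this toy model."""
    (x0, x1), (y0, y1) = rect
    if is_point(geo):
        x, y = geo
        return x0 <= x <= x1 and y0 <= y <= y1
    (a0, a1), (b0, b1) = geo
    return x0 <= a1 and a0 <= x1 and y0 <= b1 and b0 <= y1


def split_join(first, second):
    points = [(j, g) for j, g in enumerate(second) if is_point(g)]
    non_points = [(j, g) for j, g in enumerate(second) if not is_point(g)]

    # Path 1: one-sided interval join (rectangle intervals vs. point coordinates).
    points.sort(key=lambda p: p[1][0])
    xs = [g[0] for _, g in points]
    candidates = []
    for i, ((x0, x1), (y0, y1)) in enumerate(first):
        for k in range(bisect.bisect_left(xs, x0), bisect.bisect_right(xs, x1)):
            j, (_, y) = points[k]
            if y0 <= y <= y1:
                candidates.append((i, j))

    # Path 2: non-point rows skip the one-sided join (plain nested loop here).
    candidates += [(i, j) for i in range(len(first)) for j, _ in non_points]

    # Filter: the exact predicate on the outputs of both paths.
    return sorted((i, j) for i, j in candidates if intersects(first[i], second[j]))


first = [((0, 2), (0, 2)), ((5, 6), (5, 6))]
second = [(1, 1), ((1, 3), (1, 3)), (5.5, 7), ((10, 11), (10, 11)), (6, 6)]
print(split_join(first, second))  # [(0, 0), (0, 1), (1, 4)]
```

Splitting by geometry type lets the common point case use the cheaper one-sided join, while non-point shapes, whose extents a single point lookup cannot capture, take the general path.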
