1.
Publication No.: US11494379B2
Publication date: 2022-11-08
Application No.: US17239529
Filing date: 2021-04-23
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
IPC: G06F16/00, G06F16/2453, G06F16/2455, G06F16/2458
Abstract: Disclosed herein are systems and methods for pre-filter deduplication for multidimensional two-sided interval joins. In an embodiment, a data platform receives query instructions for a two-sided N dimensional interval join, where N is an integer greater than 1. The two-sided N dimensional interval join has an interval-join predicate that compares intervals determined from the input relations in each of N dimensions. The data platform implements the two-sided N dimensional interval join as a query-plan section that includes an N dimensional band join that is followed by a deduplication operator that is followed by a filter that applies the interval-join predicate. The N dimensional band join includes a hash join keyed to N dimensional domain cells overlapped at least in part by intervals determined from the input relations in each of the N dimensions. The deduplication operator removes duplicate rows from a potential-duplicates subset of the output of the N dimensional band join.
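The deduplication step can be sketched in Python. This is a minimal illustration, not the patented operator: it assumes each band-join output row carries counts of how many domain cells its left and right input rows were copied into (`left_cells`, `right_cells`, hypothetical columns), and it treats a pair as a potential duplicate only when both counts exceed one. Only that subset goes through a hash set before the filter runs.

```python
from typing import Iterable, List, Tuple

# Hypothetical band-join output row: (cell_id, left_id, right_id, left_cells, right_cells).
# left_cells / right_cells count how many domain cells each input row was copied into.
BandRow = Tuple[tuple, int, int, int, int]


def dedup_before_filter(rows: Iterable[BandRow]) -> List[Tuple[int, int]]:
    """Drop repeated (left, right) pairs, hashing only the potential-duplicates subset."""
    out: List[Tuple[int, int]] = []
    seen = set()
    for _cell, left, right, left_cells, right_cells in rows:
        if left_cells == 1 or right_cells == 1:
            # A row copied into a single cell can meet its partner in at most one cell,
            # so this pair cannot be a duplicate and bypasses the hash set.
            out.append((left, right))
        elif (left, right) not in seen:
            # Both rows were replicated, so the pair may recur once per shared cell.
            seen.add((left, right))
            out.append((left, right))
    return out
```

Because rows copied into a single cell skip the hash set entirely, the deduplication cost scales with the replicated subset rather than with the whole band-join output.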
2.
Publication No.: US11216464B1
Publication date: 2022-01-04
Application No.: US17239515
Filing date: 2021-04-23
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
IPC: G06F16/00, G06F16/2455, G06F16/28, G06F16/22, G06F16/248, G06F16/2453
Abstract: Disclosed herein are systems and methods for implementing multidimensional two-sided interval joins on a distributed hash-based-equality-join infrastructure. In an embodiment, a data platform receives, for a query on a database, query instructions that include a two-sided N-dimensional interval join of a first input relation and a second input relation, where N is an integer greater than 1. The two-sided N-dimensional interval join has an interval-join predicate that, in each of N dimensions, compares an interval determined from the first input relation with an interval determined from the second input relation. The data platform generates a query-execution plan that implements the two-sided N-dimensional interval join as a query-plan section that includes an N-dimensional band join followed by a filter that applies the interval-join predicate to a band-join output of the N-dimensional band join. The data platform obtains results of the query at least in part by executing the query-execution plan.
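A sketch of the band-join-then-filter plan, assuming a uniform grid of N-dimensional cells with a fixed width per dimension and closed intervals stored as `(lo, hi)` tuples. The names `cells_overlapped`, `overlaps`, and `interval_join` are hypothetical. A Python dict stands in for the distributed hash join, and a set collapses duplicate pairs for brevity.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]  # one closed interval (lo, hi) per dimension


def cells_overlapped(box: Box, cell_size: Sequence[float]) -> Iterator[Tuple[int, ...]]:
    """Yield the ids of all grid cells the box overlaps in every dimension."""
    ranges = [range(math.floor(lo / w), math.floor(hi / w) + 1)
              for (lo, hi), w in zip(box, cell_size)]
    return itertools.product(*ranges)


def overlaps(a: Box, b: Box) -> bool:
    """Interval-join predicate: the intervals intersect in each of the N dimensions."""
    return all(alo <= bhi and blo <= ahi for (alo, ahi), (blo, bhi) in zip(a, b))


def interval_join(left: Dict[int, Box], right: Dict[int, Box],
                  cell_size: Sequence[float]) -> List[Tuple[int, int]]:
    # Build side of the band join: hash table keyed by cell id.
    table = defaultdict(list)
    for rid, box in right.items():
        for cell in cells_overlapped(box, cell_size):
            table[cell].append(rid)
    # Probe side: the band-join output is every pair that shares at least one cell.
    band_out = set()
    for lid, box in left.items():
        for cell in cells_overlapped(box, cell_size):
            for rid in table.get(cell, ()):
                band_out.add((lid, rid))
    # Filter: apply the exact interval-join predicate to the band-join output.
    return sorted(p for p in band_out if overlaps(left[p[0]], right[p[1]]))
```

With a cell width of 2 in both dimensions, left box `((0, 1), (0, 1))` and right box `((1.5, 3), (1.5, 3))` share cell `(0, 0)` but do not intersect, so the band join emits the pair and the filter removes it.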
3.
Publication No.: US20210374140A1
Publication date: 2021-12-02
Application No.: US17389804
Filing date: 2021-07-30
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Simon Holm Jensen, Spyridon Triantafyllis
IPC: G06F16/2453
Abstract: In an embodiment, a database platform receives query instructions for a query on a database, where the query instructions include instructions for a geospatial-function join that includes a geospatial-function predicate. The database platform generates a query-execution plan based on the query instructions, including replacing the geospatial-function join with one or more interval joins that each include one or more predicates implied by the geospatial-function predicate. The database platform inserts, into the query-execution plan above the one or more interval joins, a filter operator that applies the geospatial-function predicate. The database platform obtains results of the query at least in part by executing the query-execution plan, and returns the query results in response to the query.
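As an illustration of the rewrite, consider a distance-within join on planar points (an assumed example; the abstract does not name a specific geospatial function). The predicate dist(p, q) <= d implies |p.x - q.x| <= d and |p.y - q.y| <= d, which are interval predicates. The hypothetical `dwithin_join` below evaluates the implied x-interval with a sorted binary search, checks the implied y-interval, and then applies the exact distance test as the filter.

```python
import bisect
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dwithin_join(left: Dict[int, Point], right: Dict[int, Point],
                 d: float) -> List[Tuple[int, int]]:
    """Distance join rewritten as interval predicates followed by an exact filter."""
    # Sort the right side by x so the x-interval [lx - d, lx + d] is a contiguous slice.
    by_x = sorted((p[0], rid) for rid, p in right.items())
    xs = [x for x, _ in by_x]
    result = []
    for lid, (lx, ly) in left.items():
        lo = bisect.bisect_left(xs, lx - d)
        hi = bisect.bisect_right(xs, lx + d)
        for _, rid in by_x[lo:hi]:             # implied predicate: |lx - rx| <= d
            rx, ry = right[rid]
            if abs(ly - ry) > d:                # implied predicate: |ly - ry| <= d
                continue
            if math.dist((lx, ly), (rx, ry)) <= d:  # filter: original distance predicate
                result.append((lid, rid))
    return sorted(result)
```

The interval checks never discard a true match because they are implied by the original predicate; they only narrow the candidates that reach the more expensive exact test.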
4.
Publication No.: US20230085410A1
Publication date: 2023-03-16
Application No.: US18050130
Filing date: 2022-10-27
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
IPC: G06F16/2453, G06F16/2458, G06F16/2455, G06F16/22, G06F16/28, G06F16/248
Abstract: In an embodiment, a data platform receives a query that includes a two-sided N dimensional interval join of first and second input relations, where N>1. The two-sided N dimensional interval join has an interval-join predicate that, in each of N dimensions, compares intervals determined from the first and second input relations. The data platform implements the interval join at least in part by identifying an intermediate relation that includes all combinations of a row from the first input relation and a row from the second input relation where, in each of the N dimensions, the intervals determined from the first and second input relations both overlap a common N dimensional domain region of an input domain of the first and second input relations. The data platform obtains and returns results of the query.
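Whether a pair belongs to that intermediate relation can be tested per dimension when the domain is a uniform grid (an assumption; the abstract only requires a common N-dimensional region): two boxes share a grid cell exactly when their cell-index ranges intersect in every dimension. The helpers `cell_range` and `share_cell` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]  # one closed interval (lo, hi) per dimension


def cell_range(lo: float, hi: float, width: float) -> Tuple[int, int]:
    """First and last cell index covered by [lo, hi] on a uniform grid."""
    return math.floor(lo / width), math.floor(hi / width)


def share_cell(a: Box, b: Box, widths: Sequence[float]) -> bool:
    """True if boxes a and b both overlap at least one common N-dimensional cell."""
    for (alo, ahi), (blo, bhi), w in zip(a, b, widths):
        a0, a1 = cell_range(alo, ahi, w)
        b0, b1 = cell_range(blo, bhi, w)
        if a1 < b0 or b1 < a0:  # cell-index ranges are disjoint in this dimension
            return False
    return True
```

`share_cell(((0, 1), (0, 1)), ((1.5, 3), (1.5, 3)), (2, 2))` returns `True` even though `[0, 1]` and `[1.5, 3]` are disjoint, which is why the interval-join predicate still has to be applied afterwards.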
5.
Publication No.: US11494385B2
Publication date: 2022-11-08
Application No.: US17454894
Filing date: 2021-11-15
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
IPC: G06F16/00, G06F16/2455, G06F16/22, G06F16/248, G06F16/28, G06F16/2453
Abstract: In an embodiment, a data platform implements a two-sided N dimensional interval join using an N dimensional band join followed by a filter that applies a predicate of the interval join. The data platform generates first and second modified relations from first and second input relations. Each modified relation includes a copy of each row from the corresponding input relation for each input-domain cell that overlaps, in each of N dimensions, a bounding polygon of intervals determined from the row of the corresponding input relation. The data platform inserts, in each row in each modified relation, an input-domain-cell identifier of the corresponding overlapping input-domain cell and uses a hash-equality join that receives the first and second modified relations and that is keyed on the input-domain-cell identifiers. The data platform obtains results of a query by executing a query-execution plan that includes the query-plan section.
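Building a modified relation might look like the following sketch. It assumes rows are dicts whose interval in each dimension is given by a pair of start and end columns, that the bounding polygon is the axis-aligned box of those intervals, and that cells come from a uniform grid; `modified_relation` and the `cell_id` column are hypothetical names. Joining two such relations on `cell_id` with any hash-equality join yields the band-join output.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def modified_relation(rows: List[Dict], dims: Sequence[Tuple[str, str]],
                      widths: Sequence[float]) -> List[Dict]:
    """Copy each row once per input-domain cell its bounding box overlaps.

    dims lists the (start_column, end_column) pair defining the row's interval in each
    dimension; every copy receives a hypothetical 'cell_id' column used as the join key.
    """
    out = []
    for row in rows:
        per_dim = [range(math.floor(row[s] / w), math.floor(row[e] / w) + 1)
                   for (s, e), w in zip(dims, widths)]
        for cell in itertools.product(*per_dim):
            out.append({**row, "cell_id": cell})
    return out
```

A row spanning `[0, 3]` in the first dimension with cell width 2 is copied twice, which is the replication cost that the choice of cell size trades against.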
6.
Publication No.: US20220300512A1
Publication date: 2022-09-22
Application No.: US17454899
Filing date: 2021-11-15
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
IPC: G06F16/2453, G06F16/2458, G06F16/2455
Abstract: In an embodiment, a data platform receives a query that includes a two-sided N dimensional interval join of first and second input relations. The data platform samples, with respect to each of one or more of the N dimensions, one or both of the first input relation and the second input relation with respect to an interval size of an interval determined from the input relation. The data platform demarcates the N dimensional input domain into non-overlapping N dimensional input-domain cells based on the sampling. The data platform implements the interval join using a query-execution plan that includes an equality join that is keyed on input-domain-cell identifiers of input-domain cells that at least partially overlap bounding polygons of the intervals determined from the first and second input relations. The equality join is followed in the query-execution plan by a filter that applies the interval join predicate. The data platform obtains results of the query by executing the query-execution plan.
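One way the sampling could feed cell demarcation is sketched below. The heuristic is an assumption, not the rule claimed in the patent: it sets each dimension's cell width to the median interval length in a random sample, so at least half of the sampled intervals touch no more than two cells in that dimension. Narrower cells increase row replication; wider cells let more non-matching pairs through to the filter. `choose_cell_widths` is a hypothetical name.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Box = Tuple[Tuple[float, float], ...]  # one closed interval (lo, hi) per dimension


def choose_cell_widths(boxes: Sequence[Box], sample_size: int = 1000,
                       seed: int = 0, min_width: float = 1e-9) -> List[float]:
    """Pick one grid-cell width per dimension from a sample of interval sizes."""
    if not boxes:
        raise ValueError("need at least one interval to sample")
    rng = random.Random(seed)
    sample = rng.sample(list(boxes), min(sample_size, len(boxes)))
    widths = []
    for d in range(len(sample[0])):
        sizes = [box[d][1] - box[d][0] for box in sample]
        # Median interval length; min_width guards against zero-length intervals.
        widths.append(max(statistics.median(sizes), min_width))
    return widths
```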
7.
Publication No.: US10909121B2
Publication date: 2021-02-02
Application No.: US16863831
Filing date: 2020-04-30
Applicant: Snowflake Inc.
Inventor: Benoit Dageville, Yi Fang, Martin Hentschel, Ashish Motivala, Spyridon Triantafyllis, Yizhi Zhu
IPC: G06F16/2455, G06F16/23, G06F16/2457, G06F16/22, G06F16/2458, G06F16/27
Abstract: The subject technology receives first metadata corresponding to a set of micro-partitions. The subject technology generates second metadata for a grouping of the first metadata. The subject technology generates a first data structure including the first metadata and a second data structure including the second metadata, the second data structure including information associating the second metadata to the first metadata. The subject technology stores the first data structure and the second data structure in persistent storage as a first file and a second file. The subject technology receives a query on a table. Further, the subject technology analyzes the query against cumulative table metadata to determine whether data stored in the table matches the query.
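The two-level metadata supports pruning. In the minimal sketch below, per-micro-partition min/max values play the role of the first metadata and per-group summaries the role of the second metadata; the class and function names, the single integer column, and the fixed group size are assumptions. A range predicate first consults the group summaries and reads partition-level entries only for groups that might match.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionMeta:            # first metadata: one entry per micro-partition
    name: str
    min_val: int
    max_val: int


@dataclass
class GroupMeta:                # second metadata: summary of a group of entries
    min_val: int
    max_val: int
    members: List[PartitionMeta]  # association back to the first metadata


def build_groups(parts: List[PartitionMeta], group_size: int) -> List[GroupMeta]:
    groups = []
    for i in range(0, len(parts), group_size):
        chunk = parts[i:i + group_size]
        groups.append(GroupMeta(min(p.min_val for p in chunk),
                                max(p.max_val for p in chunk), chunk))
    return groups


def prune(groups: List[GroupMeta], lo: int, hi: int) -> List[str]:
    """Names of partitions that may hold values in [lo, hi]."""
    hits = []
    for g in groups:
        if g.max_val < lo or g.min_val > hi:
            continue  # whole group excluded using second-level metadata alone
        hits.extend(p.name for p in g.members
                    if not (p.max_val < lo or p.min_val > hi))
    return hits
```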
8.
Publication No.: US11636114B2
Publication date: 2023-04-25
Application No.: US17123551
Filing date: 2020-12-16
Applicant: Snowflake Inc.
Inventor: Benoit Dageville, Yi Fang, Martin Hentschel, Ashish Motivala, Spyridon Triantafyllis, Yizhi Zhu
IPC: G06F16/2455, G06F16/23, G06F16/2457, G06F16/22, G06F16/2458, G06F16/27
Abstract: The subject technology receives first metadata corresponding to a set of micro-partitions. The subject technology stores a first data structure and a second data structure in storage as a first file and a second file, the first data structure including the first metadata and the second data structure including second metadata, the first metadata corresponding to a set of micro-partitions, the second metadata for a grouping of the first metadata, the second data structure including information associating the second metadata to the first metadata. The subject technology stores third metadata for a table, the third metadata comprising information about data stored in a micro-partition of the table.
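A hypothetical on-disk layout for the two structures is sketched below: the first file holds per-partition entries, and each entry in the second file records the position and count of the first-file entries it summarizes, which is one possible form of the association the abstract describes. JSON, the file names, and `write_metadata_files` are assumptions made for readability; the abstract does not specify a format.

```python
import json
import os
from typing import Dict, List, Tuple


def write_metadata_files(parts: List[Dict], group_size: int,
                         directory: str) -> Tuple[str, str]:
    """Persist per-partition metadata and group summaries as two separate files."""
    groups = []
    for start in range(0, len(parts), group_size):
        chunk = parts[start:start + group_size]
        groups.append({
            "min": min(p["min"] for p in chunk),
            "max": max(p["max"] for p in chunk),
            # Association back to the first file: where the summarized entries live.
            "first": start,
            "count": len(chunk),
        })
    first_path = os.path.join(directory, "partitions.json")
    second_path = os.path.join(directory, "groups.json")
    with open(first_path, "w") as f:
        json.dump(parts, f)
    with open(second_path, "w") as f:
        json.dump(groups, f)
    return first_path, second_path
```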
9.
Publication No.: US11537614B2
Publication date: 2022-12-27
Application No.: US17454899
Filing date: 2021-11-15
Applicant: Snowflake Inc.
Inventor: Matthias Carl Adams, Spyridon Triantafyllis, Lars Volker, Kevin Wang
IPC: G06F16/00, G06F16/2453, G06F16/2455, G06F16/2458
Abstract: In an embodiment, a data platform receives a query that includes a two-sided N dimensional interval join of first and second input relations. The data platform samples, with respect to each of one or more of the N dimensions, one or both of the first input relation and the second input relation with respect to an interval size of an interval determined from the input relation. The data platform demarcates the N dimensional input domain into non-overlapping N dimensional input-domain cells based on the sampling. The data platform implements the interval join using a query-execution plan that includes an equality join that is keyed on input-domain-cell identifiers of input-domain cells that at least partially overlap bounding polygons of the intervals determined from the first and second input relations. The equality join is followed in the query-execution plan by a filter that applies the interval-join predicate. The data platform obtains results of the query by executing the query-execution plan.
10.
Publication No.: US11106678B2
Publication date: 2021-08-31
Application No.: US17086279
Filing date: 2020-10-30
Applicant: Snowflake Inc.
Inventor: Benoit Dageville, Yi Fang, Martin Hentschel, Ashish Motivala, Spyridon Triantafyllis, Yizhi Zhu
IPC: G06F16/2455, G06F16/23, G06F16/2457, G06F16/22, G06F16/2458, G06F16/27
Abstract: The subject technology receives first metadata corresponding to a set of micro-partitions. The subject technology stores a first data structure and a second data structure in storage as a first file and a second file, the first data structure including the first metadata and the second data structure including second metadata, the first metadata corresponding to a set of micro-partitions, the second metadata for a grouping of the first metadata, the second data structure including information associating the second metadata to the first metadata. The subject technology stores third metadata for a table, the third metadata comprising: cumulative table metadata comprising global information about a plurality of micro-partitions of the table, the cumulative table metadata being stored in a metadata micro-partition associated with the table.
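Cumulative table metadata lets a query be rejected before any per-partition metadata is read. The sketch assumes the global information is a partition count, a row count, and a min/max for one integer column; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MicroPartitionMeta:
    rows: int
    min_val: int
    max_val: int


@dataclass(frozen=True)
class CumulativeTableMeta:
    partitions: int
    rows: int
    min_val: int
    max_val: int


def cumulate(parts: Sequence[MicroPartitionMeta]) -> CumulativeTableMeta:
    """Fold per-partition metadata into one table-level summary."""
    return CumulativeTableMeta(
        partitions=len(parts),
        rows=sum(p.rows for p in parts),
        min_val=min(p.min_val for p in parts),
        max_val=max(p.max_val for p in parts),
    )


def table_may_match(meta: CumulativeTableMeta, lo: int, hi: int) -> bool:
    """A range predicate lo <= col <= hi can only match if it intersects [min, max]."""
    return not (meta.max_val < lo or meta.min_val > hi)
```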