Processing queries on semi-structured data columns

    公开(公告)号:US11494384B2

    公开(公告)日:2022-11-08

    申请号:US17655124

    申请日:2022-03-16

    Applicant: Snowflake Inc.

    Abstract: A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations for an object in the column are generated. The generating of the one or more indexing transformation includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database with an association with the source table.

    INDEXED GEOSPATIAL PREDICATE SEARCH

    公开(公告)号:US20220284025A1

    公开(公告)日:2022-09-08

    申请号:US17804248

    申请日:2022-05-26

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems and methods for indexed geospatial predicate search. An example method performed by at least one hardware processor includes decoding a query with a geospatial predicate. The geospatial predicate is configured between a geography data column and a constant geography object. The method further includes computing a first covering for a data value of a plurality of data values in the geography data column. The first covering includes a first set of cells in a hierarchical grid representation of a geography. The first set of cells represents a surface of the geography associated with the data value. A second covering is computed for the constant geography object. A determination is made on whether to prune at least one partition of a database organized into a set of partitions and including the geography data column based on a comparison between the first covering and the second covering.

    Indexed geospatial predicate search

    公开(公告)号:US12050605B2

    公开(公告)日:2024-07-30

    申请号:US17804248

    申请日:2022-05-26

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems and methods for indexed geospatial predicate search. An example method performed by at least one hardware processor includes decoding a query with a geospatial predicate. The geospatial predicate is configured between a geography data column and a constant geography object. The method further includes computing a first covering for a data value of a plurality of data values in the geography data column. The first covering includes a first set of cells in a hierarchical grid representation of a geography. The first set of cells represents a surface of the geography associated with the data value. A second covering is computed for the constant geography object. A determination is made on whether to prune at least one partition of a database organized into a set of partitions and including the geography data column based on a comparison between the first covering and the second covering.

    INDEX GENERATION USING LAZY REASSEMBLING OF SEMI-STRUCTURED DATA

    公开(公告)号:US20230139194A1

    公开(公告)日:2023-05-04

    申请号:US18146912

    申请日:2022-12-27

    Applicant: Snowflake Inc.

    Abstract: A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.

    Lazy reassembling of semi-structured data

    公开(公告)号:US11567939B2

    公开(公告)日:2023-01-31

    申请号:US17814110

    申请日:2022-07-21

    Applicant: Snowflake Inc.

    Abstract: A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.

    Index generation using lazy reassembling of semi-structured data

    公开(公告)号:US11816107B2

    公开(公告)日:2023-11-14

    申请号:US18146912

    申请日:2022-12-27

    Applicant: Snowflake Inc.

    Abstract: A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.

    SCAN SET PRUNING FOR QUERIES WITH PREDICATES ON SEMI-STRUCTURED FIELDS

    公开(公告)号:US20230064151A1

    公开(公告)日:2023-03-02

    申请号:US18047595

    申请日:2022-10-18

    Applicant: Snowflake Inc.

    Abstract: A source table organized into a set of batch units is accessed. The source table comprises a column of data corresponding to a semi-structured data type. One or more indexing transformations for an object in the column are generated. The generating of the one or more indexing transformation includes converting the object to one or more stored data types. A pruning index is generated for the source table based in part on the one or more indexing transformations. The pruning index comprises a set of filters that index distinct values in each column of the source table, and each filter corresponds to a batch unit in the set of batch units. The pruning index is stored in a database with an association with the source table.

    LAZY REASSEMBLING OF SEMI-STRUCTURED DATA

    公开(公告)号:US20220358128A1

    公开(公告)日:2022-11-10

    申请号:US17814110

    申请日:2022-07-21

    Applicant: Snowflake Inc.

    Abstract: A pruning index is generated for a source table organized into a set of batch units. The source table comprises a column of semi-structured data. The pruning index comprises a set of filters that index distinct values in each column of the source table. Rather than reassembling an entire tree structure of the semi-structured data prior to indexing, the generating of the pruning index comprises traversing a reassembly hook object that represents a first portion of the semi-structured data that is subcolumnarized and traversing a residual object that represents a second portion of the semi-structured data that is not subcolumnarized. The reassembly hook object is traversed to identify values corresponding to the first portion of the semi-structured data and the residual object is traversed to identify values corresponding to the second portion. The pruning index is stored with an association with the source table.

Patent Agency Ranking