SCHEMA EVOLUTION SUPPORT IN HYBRID TRANSACTIONAL/ANALYTICAL PROCESSING (HTAP) WORKLOADS

    Publication No.: US20250068605A1

    Publication Date: 2025-02-27

    Application No.: US18499762

    Filing Date: 2023-11-01

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a request to perform a table scan operation of a table. The subject technology determines that the table is being accessed for an initial time. The subject technology populates a columnar cache with data of the table provided by the table scan operation. The subject technology determines a set of schema versions of a set of rows from the data of the table. The subject technology determines schema information of each schema from the set of schema versions. The subject technology generates a result rowset and a second rowset comprising a union of columns that have appeared at least once in each row. The subject technology performs deserialization of rows from the result rowset and the second rowset. The subject technology provides the rows from the result rowset and the second rowset to write to a file in a particular format.
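
A minimal sketch of the union-of-columns idea in the abstract above, assuming a hypothetical in-memory schema registry (`SCHEMAS`) and rows tagged with a schema version. None of these names come from the patent, and the sketch ignores the columnar cache and the file-writing step.

```python
from typing import Any

# Hypothetical schema registry: schema version -> ordered column names.
SCHEMAS: dict[int, list[str]] = {
    1: ["id", "name"],
    2: ["id", "name", "email"],
    3: ["id", "email", "country"],
}


def union_columns(versions: list[int]) -> list[str]:
    """Return every column that appears in at least one of the given versions."""
    seen: dict[str, None] = {}  # dict keeps first-seen order
    for version in sorted(versions):
        for column in SCHEMAS[version]:
            seen.setdefault(column, None)
    return list(seen)


def to_unified_rowset(rows: list[tuple[int, tuple[Any, ...]]]):
    """Deserialize (version, values) rows into one rowset with a shared column list."""
    columns = union_columns(list({version for version, _ in rows}))
    unified = []
    for version, values in rows:
        record = dict(zip(SCHEMAS[version], values))
        # Columns missing from this row's schema version become None.
        unified.append(tuple(record.get(column) for column in columns))
    return columns, unified
```

Rows written under different schema versions come out with the same column list, so a downstream writer can treat them uniformly.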

    Enhanced machine learning model accuracy through post-hoc confidence score calibration

    Publication No.: US12236201B1

    Publication Date: 2025-02-25

    Application No.: US18677561

    Filing Date: 2024-05-29

    Applicant: Snowflake Inc.

    Inventor: Andrzej Szwabe

    Abstract: Examples provide enhanced machine learning model accuracy through post-hoc confidence score calibration. A machine learning (ML) system receives results generated by an ML model, the results comprising at least one confidence score and electronic documents. The ML system processes the results generated by the ML model comprising performing document understanding by extracting data points from the electronic documents. The ML system associates the confidence score with the extracted data points and calibrates a confidence score associated with the extracted data points using a post-hoc calibration solution set. The ML system implements confidence scoring recalibration comprising aligning the confidence score with prediction accuracy and adjusting the generated confidence score by the recalibration. Based on adjusting the confidence score, the ML system extracts an individual element of information from the electronic documents comprising an extracted value. The ML system generates an output comprising the extracted values and an adjusted confidence score.
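
The abstract does not say which post-hoc calibration method is used. As an illustration only, the sketch below uses histogram binning, a common post-hoc technique that replaces a raw confidence score with the accuracy observed for scores in the same bin. The function names and the bin count are assumptions.

```python
def fit_histogram_binning(scores, correct, n_bins=5):
    """Learn per-bin accuracy from held-out (score, was_correct) pairs."""
    totals = [0] * n_bins
    hits = [0] * n_bins
    for score, is_correct in zip(scores, correct):
        b = min(int(score * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += int(is_correct)
    # Empty bins fall back to the bin midpoint (an assumption of this sketch).
    return [
        hits[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(score, bin_accuracy):
    """Map a raw confidence score to the accuracy observed in its bin."""
    n_bins = len(bin_accuracy)
    b = min(int(score * n_bins), n_bins - 1)
    return bin_accuracy[b]
```

If a model reports scores above 0.9 but is right only half the time in that range, `calibrate` returns 0.5 for those scores, which aligns the reported confidence with observed accuracy.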

    CONFIGURING CHECK CONSTRAINT AND ROW VIOLATION LOGGING USING ERROR TABLES

    Publication No.: US20250061099A1

    Publication Date: 2025-02-20

    Application No.: US18451522

    Filing Date: 2023-08-17

    Applicant: Snowflake Inc.

    Abstract: Provided herein are systems and methods for configuring integrity constraints (including a check constraint) and row violation logging using error tables. An example method includes decoding a query received at a network-based database system. The query includes a command to perform an operation on a base table. An integrity constraint associated with the base table is retrieved. The integrity constraint specifies a desired configuration for the base table. A verification of the integrity constraint is performed to detect erroneous data of the base table that violates the desired configuration. The erroneous data is input into an error table that is configured as a nested object of the base table. A notification that the erroneous data is available in the error table is generated and output.
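
A rough sketch of the flow described above, assuming a hypothetical `BaseTable` class whose error table is held as a nested attribute. Rows that fail the check predicate are logged instead of inserted, and a notification string is returned. This illustrates the idea and does not reproduce any actual database syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BaseTable:
    check: Callable[[dict], bool]                     # the check constraint
    rows: list = field(default_factory=list)
    error_table: list = field(default_factory=list)   # nested under the base table

    def insert(self, new_rows: list) -> Optional[str]:
        """Insert valid rows; log violating rows to the error table."""
        rejected = 0
        for row in new_rows:
            if self.check(row):
                self.rows.append(row)
            else:
                self.error_table.append(
                    {"row": row, "reason": "check constraint violated"}
                )
                rejected += 1
        # The returned string stands in for the notification in the abstract.
        return f"{rejected} row(s) logged to error table" if rejected else None
```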

    DIRECTING QUERIES TO DATABASE FILES
    Type: Invention application

    Publication No.: US20250053680A1

    Publication Date: 2025-02-13

    Application No.: US18928687

    Filing Date: 2024-10-28

    Applicant: Snowflake Inc.

    Abstract: A method of preventing queries from accessing database files based on metadata. The method includes determining a first metadata associated with a particular file and a second metadata associated with a changed version of the particular file. The method includes directing, based on the first metadata associated with the particular file, a first query for the particular file to a first file that is associated with the particular file. The method includes preventing a second query for the particular file from accessing the particular file by directing, based on the second metadata associated with the changed version of the particular file, the second query to the changed version of the particular file instead of the particular file.

    Database processing using hybrid key-value tables

    Publication No.: US12222964B2

    Publication Date: 2025-02-11

    Application No.: US17661162

    Filing Date: 2022-04-28

    Applicant: Snowflake Inc.

    Abstract: A distributed database system can include a transactional database and an object storage database. The data of the transactional database can be split into granules and replicated to the object storage database. The distributed database system can process transactional requests using the transactional database. The distributed database can receive a request that reads data more than a set size from the transactional database. The distributed database system can identify the granule data in the object storage database and transmit data to complete the read on one or more of a plurality of execution nodes.
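
The routing decision in this abstract can be sketched as follows, assuming a hypothetical byte threshold and sorted integer keys. The function names, the threshold parameter, and the fixed granule size are illustrative assumptions, not details from the patent.

```python
def split_into_granules(sorted_keys, granule_size):
    """Split a sorted key list into fixed-size granules for replication."""
    return [
        sorted_keys[i:i + granule_size]
        for i in range(0, len(sorted_keys), granule_size)
    ]


def route_read(estimated_bytes, threshold_bytes):
    """Small reads stay on the transactional store; large reads use replicated granules."""
    return "object_storage" if estimated_bytes > threshold_bytes else "transactional"


def granules_for_range(granules, lo, hi):
    """Identify the granules whose key range overlaps [lo, hi]."""
    return [g for g in granules if g[0] <= hi and g[-1] >= lo]
```

A large scan would call `route_read` first, then use `granules_for_range` to pick the granules to hand to execution nodes.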

    Materialized table refresh using multiple processing pipelines

    Publication No.: US12216654B2

    Publication Date: 2025-02-04

    Application No.: US18362898

    Filing Date: 2023-07-31

    Applicant: Snowflake Inc.

    Abstract: A system for a materialized table (MT) refresh using multiple processing pipelines includes at least one hardware processor coupled to memory storing instructions. The instructions cause the at least one hardware processor to perform operations including determining dependencies among a plurality of intermediate MTs generated from a source MT. The source MT uses a table definition with a query on one or more base tables and a lag duration value. A graph snapshot of dependencies among the plurality of intermediate MTs is generated. Processing pipelines are configured. Each of the processing pipelines corresponds to a subset of the plurality of intermediate MTs indicated by the graph snapshot. Responsive to detecting an instruction for a refresh operation on the source MT, refreshes on corresponding intermediate MTs of the plurality of intermediate MTs in each processing pipeline of the processing pipelines are performed to complete the refresh operation on the source MT.
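
One way to picture the dependency graph and pipeline configuration is sketched below. It assumes that each pipeline is an independent connected group of intermediate MTs, refreshed in topological order. The abstract does not say how pipelines are chosen, so the grouping rule and the name `build_pipelines` are assumptions.

```python
from graphlib import TopologicalSorter


def build_pipelines(deps: dict[str, set[str]]) -> list[list[str]]:
    """deps maps each MT to the set of MTs it reads from.

    Returns one refresh-ordered list of MTs per independent group.
    """
    nodes = set(deps) | {up for ups in deps.values() for up in ups}
    # Undirected adjacency, used only to find independent groups.
    adjacent = {n: set() for n in nodes}
    for mt, ups in deps.items():
        for up in ups:
            adjacent[mt].add(up)
            adjacent[up].add(mt)

    seen: set[str] = set()
    pipelines = []
    for start in sorted(nodes):
        if start in seen:
            continue
        group: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacent[node] - group)
        seen |= group
        # Upstream MTs must be refreshed before the MTs that read from them.
        subgraph = {n: deps.get(n, set()) & group for n in group}
        pipelines.append(list(TopologicalSorter(subgraph).static_order()))
    return pipelines
```

With `{"b": {"a"}, "c": {"b"}, "y": {"x"}}` as input, the sketch yields two pipelines, one for the chain a, b, c and one for x, y, and the two can be refreshed independently.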

    MACHINE LEARNING ENHANCEMENTS TO ROOT CAUSE ANALYSIS

    Publication No.: US20250029001A1

    Publication Date: 2025-01-23

    Application No.: US18357082

    Filing Date: 2023-07-21

    Applicant: Snowflake Inc.

    Abstract: Techniques described herein can monitor various data metrics. The techniques can select a subset of dimensions from a plurality of dimensions related to a data shift. The techniques include generating a plurality of decision tree graphs to classify a plurality of segments, each segment representing a combination of two or more dimensions of the subset of dimensions, and each decision tree graph including a different root node representing a respective dimension of the subset of dimensions.

    Database transactions across different domains

    Publication No.: US12204559B2

    Publication Date: 2025-01-21

    Application No.: US18051148

    Filing Date: 2022-10-31

    Applicant: Snowflake Inc.

    Abstract: The subject technology sends a first statement to an execution node for executing the first statement on first storage using micro-partitions. The subject technology sends a second statement to the execution node for executing the second statement on linearizable storage. The subject technology sends a request to prepare a commit of a cross domain transaction associated with the first statement and the second statement. The subject technology generates a new version of a set of tables that were modified by the cross domain transaction and updates first metadata in a metadata database to indicate the new version. The subject technology finalizes the commit of the cross domain transaction and updates second metadata to indicate that the cross domain transaction has been committed.
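
The prepare-then-finalize sequence in this abstract resembles a two-phase commit. The sketch below shows that general pattern with a hypothetical `Domain` class and a plain dict standing in for the metadata database. It is not a description of the patented implementation.

```python
class Domain:
    """Hypothetical storage domain (e.g. micro-partition or linearizable storage)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.staged = []
        self.committed = []

    def execute(self, statement):
        self.staged.append(statement)

    def prepare(self):
        return self.can_prepare

    def commit(self):
        self.committed.extend(self.staged)
        self.staged.clear()

    def rollback(self):
        self.staged.clear()


def run_cross_domain_txn(work, metadata):
    """work: list of (Domain, statement) pairs; metadata: dict updated in place."""
    domains = []
    for domain, statement in work:
        domain.execute(statement)
        if domain not in domains:
            domains.append(domain)
    # Phase 1: every domain must agree to commit.
    if not all(d.prepare() for d in domains):
        for d in domains:
            d.rollback()
        metadata["status"] = "aborted"
        return False
    # Record the new table version before finalizing (the "first metadata").
    metadata["table_version"] = metadata.get("table_version", 0) + 1
    # Phase 2: finalize, then mark the transaction committed (the "second metadata").
    for d in domains:
        d.commit()
    metadata["status"] = "committed"
    return True
```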
