Scheduling data processing tasks using a stream of tracking entries

    Publication No.: US11507570B2

    Publication Date: 2022-11-22

    Application No.: US17650890

    Application Date: 2022-02-14

    Applicant: Snowflake Inc.

    Abstract: Systems, methods, and devices for scheduling of data processing tasks are disclosed. A method includes performing a modification on a first set of immutable partitions storing database data to generate a second set of immutable partitions. The second set is associated with a modified version of the database data. A change tracking entry is entered in a stream of tracking entries based on committing the modification. The change tracking entry includes an indication of the modification on the first set of immutable partitions. A stream offset of the stream of tracking entries is advanced based on the entering of the change tracking entry in the stream of tracking entries. The stream offset indicates a timestamp associated with a latest committed modification to the database data. A data processing task is scheduled for execution on the modified version of the database data based on the advancing of the stream offset.
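The scheduling idea in this abstract can be sketched in Python, assuming an in-memory stream and integer commit timestamps. The names ChangeTrackingEntry, TrackingStream and Scheduler are hypothetical illustrations, not Snowflake's implementation. The offset moves only when a committed modification is recorded, and the scheduler runs the task once per observed advance.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ChangeTrackingEntry:
    commit_ts: int                 # timestamp at which the modification committed
    removed: Tuple[str, ...]       # first set of immutable partitions (replaced)
    added: Tuple[str, ...]         # second set of immutable partitions (new version)


@dataclass
class TrackingStream:
    entries: List[ChangeTrackingEntry] = field(default_factory=list)
    offset: int = 0                # timestamp of the latest committed modification

    def append(self, entry: ChangeTrackingEntry) -> None:
        """Record a committed modification and advance the stream offset."""
        self.entries.append(entry)
        self.offset = max(self.offset, entry.commit_ts)


class Scheduler:
    """Hypothetical scheduler: runs a task whenever the stream offset has advanced."""

    def __init__(self, stream: TrackingStream, task: Callable[[int], None]) -> None:
        self.stream = stream
        self.task = task
        self.last_run_offset = stream.offset

    def poll(self) -> bool:
        if self.stream.offset > self.last_run_offset:
            self.last_run_offset = self.stream.offset
            self.task(self.stream.offset)  # task processes data as of this timestamp
            return True
        return False


stream = TrackingStream()
runs: List[int] = []
scheduler = Scheduler(stream, runs.append)
stream.append(ChangeTrackingEntry(100, ("p1",), ("p2",)))
scheduler.poll()                   # offset advanced to 100: task runs
scheduler.poll()                   # no new commits: nothing to do
stream.append(ChangeTrackingEntry(120, ("p2",), ("p3",)))
stream.append(ChangeTrackingEntry(130, ("p3",), ("p4",)))
scheduler.poll()                   # two commits, one run at offset 130
print(runs)                        # [100, 130]
```

Because the scheduler compares offsets rather than counting entries, several commits that land between polls trigger a single run on the latest version of the data.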

    REAL-TIME STREAMING DATA INGESTION INTO DATABASE TABLES

    Publication No.: US20220327132A1

    Publication Date: 2022-10-13

    Application No.: US17647500

    Application Date: 2022-01-10

    Applicant: Snowflake Inc.

    Abstract: A streaming ingest platform can address latency and expense issues related to uploading data into a cloud data system. The streaming ingest platform can organize the data to be ingested into per-table chunks and per-account blobs. This data may be committed and may be made available for query processing before it is ingested into the target source tables. This significantly reduces latency. The streaming ingest platform can also accommodate uploading data from various sources with different processing and communication capabilities, such as Internet of Things (IoT) devices.
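The chunk/blob organization can be illustrated with a small Python sketch. It assumes incoming records arrive as (account, table, row) tuples and that "committing" a blob just registers it in a catalog. Blob, build_blobs and IngestCatalog are hypothetical names, not the platform's API. Queries see committed blob data before a later merge step folds it into the target tables.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class Blob:
    account: str
    chunks: Dict[str, List[Row]]   # one chunk of rows per target table


def build_blobs(records: List[Tuple[str, str, Row]]) -> List[Blob]:
    """Group (account, table, row) records into per-account blobs of per-table chunks."""
    by_account: Dict[str, Dict[str, List[Row]]] = defaultdict(lambda: defaultdict(list))
    for account, table, row in records:
        by_account[account][table].append(row)
    return [Blob(account, dict(tables)) for account, tables in by_account.items()]


class IngestCatalog:
    """Committed blobs are queryable before their rows are merged into the tables."""

    def __init__(self) -> None:
        self.committed: List[Blob] = []
        self.tables: Dict[Tuple[str, str], List[Row]] = {}

    def commit(self, blob: Blob) -> None:
        self.committed.append(blob)

    def scan(self, account: str, table: str) -> List[Row]:
        rows = list(self.tables.get((account, table), []))
        for blob in self.committed:             # include not-yet-merged data
            if blob.account == account:
                rows.extend(blob.chunks.get(table, []))
        return rows

    def merge(self) -> None:
        """Background step: fold committed chunks into the target tables."""
        for blob in self.committed:
            for table, rows in blob.chunks.items():
                self.tables.setdefault((blob.account, table), []).extend(rows)
        self.committed.clear()
```

Grouping by account first means one upload can carry rows for many tables, which is one plausible reason to batch per account rather than per table.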

    Processing streams on external data sources

    Publication No.: US11461274B2

    Publication Date: 2022-10-04

    Application No.: US17517398

    Application Date: 2021-11-02

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives an operation to perform on an external data source accessible via a network, the external data source being hosted by an external system separate from a network-based database system. The subject technology determines a set of shards corresponding to the external data source. The subject technology determines a set of offsets of each shard of the set of shards. The subject technology, based on the set of shards and the set of offsets, performs the operation on the external data source. The subject technology provides an indication that the operation is complete.
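A minimal Python sketch of the shard-and-offset bookkeeping, assuming the external source exposes each shard as an append-only list of records (similar in spirit to a partitioned log). ExternalSource, ExternalStream and consume are hypothetical names. Each call determines the shards, looks up the stored offset for each, applies the operation to unread records, and returns a count as the completion indication.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExternalSource:
    """Stand-in for an external system: each shard is an append-only list of records."""
    shards: Dict[str, List[str]]


@dataclass
class ExternalStream:
    source: ExternalSource
    offsets: Dict[str, int] = field(default_factory=dict)  # shard id -> next position to read

    def consume(self, operation: Callable[[str, str], None]) -> int:
        """Apply operation to every unread record in every shard, then advance offsets."""
        processed = 0
        for shard_id, records in self.source.shards.items():  # the set of shards
            start = self.offsets.get(shard_id, 0)              # the offset for this shard
            for record in records[start:]:
                operation(shard_id, record)
                processed += 1
            self.offsets[shard_id] = len(records)
        return processed                                       # completion indication
```

Keeping one offset per shard lets shards advance independently, so a busy shard does not force re-reading a quiet one.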

    PARTITION-BASED SCANNING OF EXTERNAL TABLES FOR QUERY PROCESSING

    Publication No.: US20220269674A1

    Publication Date: 2022-08-25

    Application No.: US17650462

    Application Date: 2022-02-09

    Applicant: Snowflake Inc.

    Abstract: Disclosed herein are embodiments of systems and methods for partition-based scanning of external tables for query processing. In an example embodiment, a database platform receives a query that includes one or more predicates, where the query is directed at least to data in an external table that is stored in an external storage platform that is external to the database platform. The database platform identifies, based on metadata that summarizes the data in the external table, one or more partitions of the external table that potentially include data that satisfies the one or more predicates. The database platform also identifies, from the one or more identified partitions, data that satisfies the one or more predicates. The database platform sends a response to the query to the client, the response comprising the data satisfying the one or more predicates.
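One common way to use "metadata that summarizes the data" is per-partition min/max statistics. The Python sketch below assumes that form of metadata and a single range predicate; Partition, summarize and scan_range are hypothetical names, and the abstract does not say which statistics are actually kept.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Partition:
    path: str                              # e.g. a file in external storage (hypothetical)
    rows: List[Dict[str, int]]
    stats: Dict[str, Tuple[int, int]]      # column -> (min, max) over this partition


def summarize(path: str, rows: List[Dict[str, int]]) -> Partition:
    cols = rows[0].keys() if rows else []
    stats = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in cols}
    return Partition(path, rows, stats)


def scan_range(partitions: List[Partition], col: str, lo: int, hi: int):
    """Evaluate lo <= col <= hi: prune partitions by metadata, then filter their rows."""
    candidates = [p for p in partitions
                  # keep a partition if its [min, max] overlaps [lo, hi] or stats are missing
                  if col not in p.stats or (p.stats[col][0] <= hi and p.stats[col][1] >= lo)]
    matches = [r for p in candidates for r in p.rows if lo <= r[col] <= hi]
    return [p.path for p in candidates], matches
```

Pruning is conservative: a surviving partition only *potentially* contains matches, which is why the second filtering pass over its rows is still required.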

    TASK SCHEDULING USING A STREAM OF COMMITTED TRANSACTIONS

    Publication No.: US20220188297A1

    Publication Date: 2022-06-16

    Application No.: US17653491

    Application Date: 2022-03-04

    Applicant: Snowflake Inc.

    Abstract: A method includes generating a task using a plurality of logical statements embedded in a database, the plurality of logical statements corresponding to a data modification. Database data is ingested into a staging table that is configured within the database. The task is executed based on applying the data modification to a first set of partitions storing the database data and generating a second set of partitions. The second set of partitions stores modified data corresponding to the database data. A stream of committed transactions is advanced at least in part by adding an entry into the stream. The entry corresponds to committed transactions performed on the first set of partitions during the data modification. A data processing task is scheduled for execution on the modified data based on the advancing of the stream offset.
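The copy-on-write step can be sketched in Python, assuming each "logical statement" is modeled as a function from a list of rows to a list of rows. Table, Task and the commit-log dictionary layout are hypothetical illustrations. The old partitions are never mutated, and an entry is appended only after every new partition has been written.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, int]
Statement = Callable[[List[Row]], List[Row]]  # hypothetical: one logical statement


@dataclass
class Table:
    partitions: Dict[str, List[Row]] = field(default_factory=dict)  # id -> rows
    next_id: int = 0

    def new_partition(self, rows: List[Row]) -> str:
        pid = f"p{self.next_id}"
        self.next_id += 1
        self.partitions[pid] = rows
        return pid


@dataclass
class Task:
    statements: List[Statement]

    def run(self, table: Table, source_ids: List[str], commit_log: List[dict], ts: int) -> List[str]:
        """Apply the statements to the first set of partitions, writing a second set."""
        new_ids = []
        for pid in source_ids:
            rows = [dict(r) for r in table.partitions[pid]]  # copy; old partition untouched
            for stmt in self.statements:
                rows = stmt(rows)
            new_ids.append(table.new_partition(rows))
        # Commit point: record the transaction in the stream only after all writes succeed.
        commit_log.append({"ts": ts, "removed": list(source_ids), "added": new_ids})
        return new_ids
```

A downstream scheduler could compare the latest entry's "ts" with the last timestamp it acted on to decide whether to run the next data processing task.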

    Notifying modifications to external tables in database systems

    Publication No.: US11347728B2

    Publication Date: 2022-05-31

    Application No.: US17462435

    Application Date: 2021-08-31

    Applicant: Snowflake Inc.

    Abstract: The subject technology receives a notification that a modification has been made to an external table, the modification comprising inserting at least one row of new data into the external table, the at least one row corresponding to a first micro-partition that includes a first portion of data from the external table prior to the inserting. The subject technology, in response to the notification indicating the modification to the external table, generates a new micro-partition different from the first micro-partition, the new micro-partition including the inserted at least one row of new data and the first portion of data from the external table. The subject technology generates a refreshed materialized view based at least in part on the generated new micro-partition such that the refreshed materialized view comprises a representation of the external table after the modification has been made.
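A small Python sketch of the notification flow, assuming rows are plain integers and the materialized view is a single SUM aggregate chosen only for illustration. ExternalTableMirror and its methods are hypothetical. The new micro-partition holds the old partition's rows plus the inserted ones, and the old partition is dropped from the current set rather than edited in place.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExternalTableMirror:
    """Hypothetical micro-partitions tracking an external table, plus a materialized view."""
    partitions: Dict[str, List[int]] = field(default_factory=dict)
    mv_total: int = 0  # materialized view: SUM over all values (illustrative aggregate)
    next_id: int = 0

    def add_partition(self, rows: List[int]) -> str:
        pid = f"mp{self.next_id}"
        self.next_id += 1
        self.partitions[pid] = list(rows)
        return pid

    def on_insert_notification(self, target: str, new_rows: List[int]) -> str:
        # New micro-partition = first partition's rows + inserted rows.
        merged = self.partitions[target] + list(new_rows)
        new_pid = self.add_partition(merged)
        del self.partitions[target]  # drop the superseded partition from the current set
        self.refresh_view()
        return new_pid

    def refresh_view(self) -> None:
        self.mv_total = sum(v for rows in self.partitions.values() for v in rows)
```

The refresh here recomputes the whole view for simplicity; an incremental refresh could instead add only the sum of `new_rows`.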

    Secure table-valued functions in a cloud database

    Publication No.: US11347527B1

    Publication Date: 2022-05-31

    Application No.: US17390344

    Application Date: 2021-07-30

    Applicant: Snowflake Inc.

    Abstract: A system comprises at least one hardware processor and a memory storing instructions. When executed, the instructions cause the at least one hardware processor to perform operations comprising receiving, in a computing process, a Java user-defined table function (Java UDTF), the Java UDTF including code related to a process method to be performed that includes receiving one or more input tables and transforming the one or more input tables to an output table; determining, using at least a security policy, whether performing one or more portions of the process method is permitted; and performing portions of the process method determined to be permitted.
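The policy check can be illustrated with a Python analog (the abstract concerns Java UDTFs). It assumes each portion of the process method declares the capabilities it needs and that the security policy is a set of allowed capabilities. Portion, run_udtf and capability names such as "network" are hypothetical. Denied portions are skipped, mirroring "performing portions ... determined to be permitted".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


@dataclass
class Portion:
    name: str
    requires: Set[str]                     # capabilities the step needs, e.g. {"network"}
    fn: Callable[[List[Row]], List[Row]]


def run_udtf(portions: Iterable[Portion], policy: Set[str],
             table: List[Row]) -> Tuple[List[Row], List[str]]:
    """Run each portion of a table function only if the policy grants what it needs."""
    out, skipped = table, []
    for p in portions:
        if p.requires <= policy:           # subset check: every requirement is allowed
            out = p.fn(out)
        else:
            skipped.append(p.name)
    return out, skipped
```

Checking each portion separately, rather than rejecting the whole function, lets the permitted transformation still produce an output table.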

    Tracking intermediate changes in database data

    Publication No.: US11321309B2

    Publication Date: 2022-05-03

    Application No.: US17384269

    Application Date: 2021-07-23

    Applicant: Snowflake Inc.

    Abstract: Systems, methods, and devices for tracking a series of changes to database data are disclosed. A method includes executing a transaction to modify data in a micro-partition of a table of a database by generating a new micro-partition that embodies the transaction. The method includes associating transaction data with the new micro-partition, wherein the transaction data comprises a timestamp when the transaction was fully executed, and further includes associating modification data with the new micro-partition that comprises an indication of one or more rows of the table that were modified by the transaction. The method includes joining the transaction data with the modification data to generate joined data and querying the joined data to determine a listing of intermediate modifications made to the table between a first timestamp and a second timestamp.
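The join described here can be sketched in Python, assuming transaction data maps each new micro-partition to its commit timestamp and modification data maps it to (row_id, action) pairs. The function name and the half-open window (t1, t2] are assumptions; the abstract does not specify boundary handling.

```python
from typing import Dict, List, Tuple

TxnData = Dict[str, int]                     # micro-partition id -> commit timestamp
ModData = Dict[str, List[Tuple[int, str]]]   # micro-partition id -> [(row_id, action)]


def intermediate_changes(txn: TxnData, mods: ModData,
                         t1: int, t2: int) -> List[Tuple[int, int, str]]:
    """Join on micro-partition id; return (ts, row_id, action) with t1 < ts <= t2."""
    joined = [(txn[pid], row_id, action)
              for pid, changes in mods.items() if pid in txn
              for row_id, action in changes]
    return sorted(c for c in joined if t1 < c[0] <= t2)


txn = {"mp1": 10, "mp2": 20, "mp3": 30}
mods = {"mp1": [(1, "INSERT")],
        "mp2": [(1, "UPDATE"), (2, "INSERT")],
        "mp3": [(1, "DELETE")]}
print(intermediate_changes(txn, mods, 10, 30))
# [(20, 1, 'UPDATE'), (20, 2, 'INSERT'), (30, 1, 'DELETE')]
```

Row 1 appears twice (updated, then deleted): the listing keeps every intermediate step instead of collapsing them into a single net change.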

    EXTERNAL FUNCTION INVOCATION BY A DATA SYSTEM

    Publication No.: US20220129335A1

    Publication Date: 2022-04-28

    Application No.: US17572205

    Application Date: 2022-01-10

    Applicant: Snowflake Inc.

    Abstract: A query referencing a function associated with a remote software component is received by a network-based data warehouse system. Temporary security credentials corresponding to a role at a cloud computing service platform are obtained. The role has permission to send calls to a web endpoint corresponding to the remote software component. A request comprising input data and electronically signed using the temporary security credentials is sent to a web Application Programming Interface (API) management system of the cloud computing service platform. The request, when received by the web API management system, causes the web API management system to invoke external functionality provided by the remote software component at the web endpoint with respect to the input data. A response comprising a result of invoking the external functionality is received from the web API management system, and the result data is processed according to the query.
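The signing step can be illustrated with a simplified Python sketch using HMAC-SHA256 over the request. This is NOT a real cloud provider's signing scheme (such as AWS Signature Version 4); TempCredentials, issue_credentials, sign_request, gateway_verify and the message layout are hypothetical. It shows why short-lived credentials help: the gateway rejects tampered bodies and expired credentials alike.

```python
import hashlib
import hmac
import json
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TempCredentials:
    access_key: str
    secret: bytes
    expires_at: float                      # Unix time after which the role is invalid


def issue_credentials(ttl_s: float = 900.0) -> TempCredentials:
    """Hypothetical stand-in for obtaining temporary role credentials."""
    return TempCredentials("AK" + secrets.token_hex(4), secrets.token_bytes(32),
                           time.time() + ttl_s)


def _message(endpoint: str, body: str) -> bytes:
    return f"POST\n{endpoint}\n{body}".encode()


def sign_request(creds: TempCredentials, endpoint: str, rows: list) -> dict:
    body = json.dumps({"data": rows}, sort_keys=True)
    sig = hmac.new(creds.secret, _message(endpoint, body), hashlib.sha256).hexdigest()
    return {"endpoint": endpoint, "body": body,
            "access_key": creds.access_key, "signature": sig}


def gateway_verify(creds: TempCredentials, request: dict) -> bool:
    """What an API gateway might check before invoking the remote function."""
    if time.time() >= creds.expires_at:
        return False
    msg = _message(request["endpoint"], request["body"])
    expected = hmac.new(creds.secret, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, request["signature"])
```

`hmac.compare_digest` is used for the comparison so that verification time does not leak how many signature characters matched.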

    Real-time streaming data ingestion into database tables

    Publication No.: US11250006B1

    Publication Date: 2022-02-15

    Application No.: US17386258

    Application Date: 2021-07-27

    Applicant: Snowflake Inc.

    Abstract: A streaming ingest platform can address latency and expense issues related to uploading data into a cloud data system. The streaming ingest platform can organize the data to be ingested into per-table chunks and per-account blobs. This data may be committed and may be made available for query processing before it is ingested into the target source tables. This significantly reduces latency. The streaming ingest platform can also accommodate uploading data from various sources with different processing and communication capabilities, such as Internet of Things (IoT) devices.
