-
Publication number: US20240134851A1
Publication date: 2024-04-25
Application number: US18047872
Application date: 2022-10-18
Applicant: Snowflake Inc.
Inventor: Xinzhu Cai, Florian Andreas Funke
IPC: G06F16/2453, G06F16/22
CPC classification number: G06F16/24537, G06F16/2255
Abstract: Provided herein are systems and methods for handling build-side skew. For example, a method includes computing a plurality of hash values for a join operation. The join operation uses a corresponding plurality of row sets. The plurality of hash values are sampled to detect a frequent hash value. A build-side row set is partitioned using the frequent hash value to generate a partitioned build-side row set. The build-side row set is selected from the plurality of row sets. The partitioned build-side row set is distributed to a plurality of hash-join-build (HJB) instances executing at a corresponding plurality of servers.
-
Publication number: US20240134844A1
Publication date: 2024-04-25
Application number: US18321994
Application date: 2023-05-22
Applicant: Snowflake Inc.
Inventor: Abdullah Al Mahmood, Ruta Dhaneshwar, Max Heimel, Xin Huang, Canzhou Qu, Purav B. Saraiya, Konstantinos Zoumpatianos
CPC classification number: G06F16/2365, G06F16/258
Abstract: A data platform including an error handling framework for loading of input data. The data platform generates input data columns based on an input file and generates result data columns based on the input data columns and evaluating expressions. The data platform detects projection errors during the generating of the result data columns and stores result error indicators in error indicator arrays of the result data columns based on the projection errors. The data platform generates filtered result data columns based on the result data columns and the result error indicator arrays of the result data columns and stores the filtered result data columns in a database of the data platform.
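As a rough illustration of the error indicator arrays described above, the following sketch evaluates an expression over each input column, records failures in a parallel boolean array, and then filters out rows flagged in any column. The helper names `project_with_errors` and `filter_rows` are hypothetical, and treating `ValueError` and `TypeError` as projection errors is an assumption.

```python
def project_with_errors(input_column, expression):
    """Evaluate `expression` on each value of an input column.

    Returns the result column and a parallel error indicator array in which
    True marks a row whose evaluation failed.
    """
    results, errors = [], []
    for value in input_column:
        try:
            results.append(expression(value))
            errors.append(False)
        except (ValueError, TypeError):
            results.append(None)
            errors.append(True)
    return results, errors


def filter_rows(result_columns, error_arrays):
    """Keep only the rows that no result column flagged as erroneous."""
    row_count = len(error_arrays[0])
    keep = [not any(errs[i] for errs in error_arrays) for i in range(row_count)]
    return [[v for v, k in zip(col, keep) if k] for col in result_columns]


ids, id_errors = project_with_errors(["1", "x", "3"], int)
amounts, amount_errors = project_with_errors(["2.5", "4.0", "bad"], float)
filtered = filter_rows([ids, amounts], [id_errors, amount_errors])
```

Keeping the indicators per column rather than raising on the first failure lets a load continue past bad rows and decide afterwards what to store.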
-
Publication number: US11968258B2
Publication date: 2024-04-23
Application number: US18384467
Application date: 2023-10-27
Applicant: Snowflake Inc.
Inventor: Edmond T. Chan, Pui Kei Johnston Chu, Chao Ren, Stephanie Stillman, Dangfu Wang
IPC: H04L67/1095, G06F16/21, H04L67/12
CPC classification number: H04L67/1095, G06F16/211, H04L67/12
Abstract: Provided herein are systems and methods to provide a way to share metrics regarding shared data access and accesses associated with data providers for different data listings of the data exchange. For example, the method may comprise detecting one or more client interactions with a set of data listings of a data exchange, the set of data listings associated with one or more data providers. The method may further comprise collecting metrics corresponding to the one or more client interactions. In addition, the method may share metrics relevant to the one or more data providers with the one or more data providers.
-
Publication number: US20240111885A1
Publication date: 2024-04-04
Application number: US18306704
Application date: 2023-04-25
Applicant: Snowflake Inc.
Inventor: Durga Mahesh Arikatla, Subramanian Muralidhar, Vishnu Dutt Paladugu, Shakhina Pulatova, Di Wu, Ziqi Xu
CPC classification number: G06F21/6218, G06F21/604, G06F2221/2141
Abstract: A data dictionary generation system utilizes a background service that is programmed to automatically populate and update a data dictionary for listings offering shared data. A data dictionary includes metadata describing the shared data overall as well as the individual objects included in the listing, such as the individual tables, schemas, views, and functions. To generate the data dictionary, the data dictionary generation system analyzes the shared data to identify objects, identifies a set of data fields associated with each identified object and populates the set of data fields associated with each identified object based on the shared data offered by the listing. To ensure that a data dictionary for each listing remains up to date, the data dictionary generation system periodically scans the listings to identify any changes to share access granted to the listings.
-
Publication number: US20240095393A1
Publication date: 2024-03-21
Application number: US18521589
Application date: 2023-11-28
Applicant: Snowflake Inc.
Inventor: Artin Avanes, Khalid Zaman Bijon, Zheng Mi, Subramanian Muralidhar, David Schultz, Jian Xu
CPC classification number: G06F21/6227, G06F16/2282, G06F21/604, G06F21/62, G06F21/6218, G06F2221/2141
Abstract: Row-level security (RLS) may provide fine-grained access control based on flexible, user-defined access policies to databases, tables, objects, and other data structures. An RLS policy may be an entity or object that defines rules for row access. An RLS policy may be decoupled or independent from any specific table. This allows more robust and flexible control. An RLS policy may then be attached to one or more tables. The RLS policy may include a Boolean-valued expression.
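A minimal sketch of the idea that a policy is a standalone object holding a Boolean-valued expression and can be attached to more than one table. The class names `RowAccessPolicy` and `Table`, and the choice of a role string as the evaluation context, are assumptions for illustration and do not reflect any product API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class RowAccessPolicy:
    """A policy defined independently of any table (hypothetical class)."""
    name: str
    predicate: Callable[[Row, str], bool]  # Boolean-valued expression


class Table:
    def __init__(self, rows: List[Row]):
        self.rows = list(rows)
        self.policy: Optional[RowAccessPolicy] = None

    def attach(self, policy: RowAccessPolicy) -> None:
        """Attach a policy; the same policy object may serve many tables."""
        self.policy = policy

    def select(self, role: str) -> List[Row]:
        """Return only the rows for which the attached policy evaluates to True."""
        if self.policy is None:
            return list(self.rows)
        return [r for r in self.rows if self.policy.predicate(r, role)]


by_region = RowAccessPolicy(
    "by_region", lambda row, role: role == "admin" or row["region"] == role
)
sales = Table([{"region": "emea", "amt": 1}, {"region": "apac", "amt": 2}])
leads = Table([{"region": "apac", "who": "x"}])
sales.attach(by_region)
leads.attach(by_region)
```

Because the policy lives outside the tables, changing its expression in one place changes row visibility for every table it is attached to.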
-
Publication number: US11934927B2
Publication date: 2024-03-19
Application number: US18087518
Application date: 2022-12-22
Applicant: SNOWFLAKE INC.
Inventor: Orestis Kostakis, Qiming Jiang, Boxin Jiang
Abstract: Systems and methods for managing input and output error of a machine learning (ML) model in a database system are presented herein. A set of test queries is executed on a first version of a database system to generate first test data, wherein the first version of the system comprises an ML model to generate an output corresponding to a function of the database system. An error model is trained based on the first test data and second test data generated based on a previous version of the system. The error model determines an error associated with the ML model between the first and previous versions of the system. The first version of the system is deployed with the error model, which corrects an output or an input of the ML model until sufficient data has been produced by the error model to retrain the ML model.
-
Publication number: US11934553B2
Publication date: 2024-03-19
Application number: US17390935
Application date: 2021-07-31
Applicant: Snowflake Inc.
Inventor: Justin Langseth, Michael Earle Rainey
IPC: G06F21/62, G06F16/245, G06F16/25, G06F16/27, G06F21/60
CPC classification number: G06F21/6227, G06F16/245, G06F16/258, G06F16/27, G06F21/602
Abstract: Embodiments of the present disclosure may provide a data clean room allowing encryption-based data analysis across multiple accounts of different database users. The data clean room may also restrict which data may be used in the analysis and may restrict the output. A requesting user's data can be encrypted using a key, and a provider user can generate a shareable database function that accepts the key to decrypt the data to generate the results data without exposing each other's data.
-
Publication number: US11934543B1
Publication date: 2024-03-19
Application number: US18056489
Application date: 2022-11-17
Applicant: Snowflake Inc.
Inventor: Jennifer Wenjun Bi, Khalid Zaman Bijon, Damien Carru, Thierry Cruanes, Simon Holm Jensen, Daniel N. Meredith, Subramanian Muralidhar, Eric Robinson, David Schultz, Zixi Zhang
CPC classification number: G06F21/604, G06F21/6227, G06F2221/2113, G06F2221/2141
Abstract: Systems and methods for generating transient object references are provided. The systems and methods perform operations including establishing a session between a first entity and a second entity. The operations include identifying an object that the first entity is authorized to access according to a first set of access privileges. The operations include generating a reference associated with the object. The operations include temporarily authorizing the second entity to access the object using the reference according to a second set of access privileges, the second set of access privileges being derived from the first set of access privileges.
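The operations listed above can be sketched as follows. The abstract does not specify how the second set of privileges is derived or when the reference expires; this sketch assumes the derived set is the intersection of the requested privileges with the first entity's privileges, and that references are dropped when the session closes. The `Session` class and its method names are hypothetical.

```python
import secrets


class Session:
    """A session between a first entity (owner) and a second entity."""

    def __init__(self, owner_privileges):
        # owner_privileges: {object name: set of privileges held by the first entity}
        self.owner_privileges = owner_privileges
        self._references = {}
        self.is_open = True

    def create_reference(self, obj, requested):
        """Generate a reference whose privileges are derived from the owner's."""
        granted = self.owner_privileges.get(obj, set()) & set(requested)
        if not granted:
            raise PermissionError(f"owner holds none of {sorted(requested)} on {obj}")
        token = secrets.token_hex(8)
        self._references[token] = (obj, frozenset(granted))
        return token

    def access(self, token, privilege):
        """Let the second entity use the reference while the session is open."""
        if not self.is_open or token not in self._references:
            raise PermissionError("reference is no longer valid")
        obj, granted = self._references[token]
        if privilege not in granted:
            raise PermissionError(f"{privilege} not granted on {obj}")
        return obj

    def close(self):
        """End the session; all transient references stop working."""
        self.is_open = False
        self._references.clear()


session = Session({"orders": {"select", "insert"}})
ref = session.create_reference("orders", {"select", "delete"})
```

Here the second entity can `select` through the reference but cannot `delete`, because the first entity never held that privilege.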
-
Publication number: US20240086381A1
Publication date: 2024-03-14
Application number: US18513163
Application date: 2023-11-17
Applicant: Snowflake Inc.
IPC: G06F16/215, G06F16/2455, G06F16/2457, G06F16/248
CPC classification number: G06F16/215, G06F16/24552, G06F16/24573, G06F16/248
Abstract: Disclosed are techniques for deduplicating files to be ingested by a database. A bloom filter may be built for each of a first set of files to be ingested into a data exchange to generate a set of bloom filters, wherein each of the set of bloom filters is built with a number of hash functions that is based on a desired false positive rate. The set of bloom filters may be stored in the metadata storage of the data exchange. In response to receiving a set of candidate files to be ingested, the set of bloom filters is used to identify candidate files that are duplicative of a file in the first set of files, and each candidate file so identified is pruned from the set of candidate files.
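A minimal sketch of per-file Bloom filters sized from a desired false positive rate. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. What gets inserted into each filter is not stated in the abstract; inserting the file's lines, and treating a candidate as duplicative when all of its lines test positive in one filter, are assumptions, as are the names `BloomFilter`, `build_filter`, and `prune_duplicates`.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, expected_items, false_positive_rate):
        ln2 = math.log(2)
        # Bit count and hash-function count derived from the target rate.
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2 ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_filter(lines, expected_items=1000, false_positive_rate=0.01):
    """Build one filter for one ingested file (lines as bytes; an assumption)."""
    bloom = BloomFilter(expected_items, false_positive_rate)
    for line in lines:
        bloom.add(line)
    return bloom


def prune_duplicates(candidates, filters):
    """Drop candidates whose every line tests positive in some file's filter."""
    kept = {}
    for name, lines in candidates.items():
        duplicate = bool(lines) and any(
            all(line in bloom for line in lines) for bloom in filters
        )
        if not duplicate:
            kept[name] = lines
    return kept


filters = [build_filter([b"r1", b"r2", b"r3"])]
candidates = {"copy": [b"r1", b"r2", b"r3"], "new": [b"x1", b"x2"]}
kept = prune_duplicates(candidates, filters)
```

A Bloom filter never reports a false negative, so a true copy is always pruned; a false positive could wrongly prune a new file, which is why the false positive rate drives the sizing.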
-
Publication number: US11921733B2
Publication date: 2024-03-05
Application number: US17813662
Application date: 2022-07-20
Applicant: Snowflake Inc.
Inventor: Harsha S. Kapre, Mark T. Keller, Srinath Shankar, Kushan A. Zaveri
IPC: G06F16/2458, G06F16/2453, G06F16/2455, G06F16/25
CPC classification number: G06F16/2471, G06F16/24532, G06F16/24561, G06F16/256, G06F16/258
Abstract: Techniques for fetching query result data using result batches include generating a plurality of result batches based on query result information. The query result information is associated with query result data generated from execution of a query. Each result batch of the plurality of result batches includes a result data retrieval function for a corresponding data portion of a plurality of data portions of the query result data. The plurality of result batches are encoded for distribution to a corresponding plurality of computing nodes. The techniques further include causing retrieving of the plurality of data portions of the query result data by the corresponding plurality of computing nodes based on the result data retrieval function for each of the plurality of data portions.
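The notion of a result batch that carries its own retrieval function can be sketched as below. This is an illustration under stated assumptions: an in-memory list stands in for stored query result data, worker threads stand in for computing nodes, the encoding step is omitted, and `ResultBatch`, `read_range`, `make_batches`, and `fetch_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import partial
from typing import Callable, List

QUERY_RESULT = list(range(10))  # stand-in for stored query result data


def read_range(start: int, stop: int) -> List[int]:
    """Retrieve one data portion of the query result."""
    return QUERY_RESULT[start:stop]


@dataclass(frozen=True)
class ResultBatch:
    start: int
    stop: int
    fetch: Callable[[], List[int]]  # retrieval function for this portion


def make_batches(total_rows: int, batch_size: int) -> List[ResultBatch]:
    """Split the result into batches, each bound to its own retrieval call."""
    batches = []
    for start in range(0, total_rows, batch_size):
        stop = min(start + batch_size, total_rows)
        batches.append(ResultBatch(start, stop, partial(read_range, start, stop)))
    return batches


def fetch_all(batches: List[ResultBatch], workers: int = 4) -> List[int]:
    """Have each worker call the retrieval function of the batch it receives."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda batch: batch.fetch(), batches))
    return [row for part in parts for row in part]


batches = make_batches(len(QUERY_RESULT), 4)
rows = fetch_all(batches)
```

Since each batch is self-describing, a worker needs only the batch it was handed to fetch its portion, and `pool.map` returns the portions in batch order.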