-
Publication No.: US20250068605A1
Publication date: 2025-02-27
Application No.: US18499762
Filing date: 2023-11-01
Applicant: Snowflake Inc.
Inventor: Cristian Diaconu , Chen Luo , Corbin McElhanney , Wumengjian Zhu
IPC: G06F16/21 , G06F16/2455 , G06F16/2457
Abstract: The subject technology receives a request to perform a table scan operation of a table. The subject technology determines that the table is being accessed for an initial time. The subject technology populates a columnar cache with data of the table provided by the table scan operation. The subject technology determines a set of schema versions of a set of rows from the data of the table. The subject technology determines schema information of each schema from the set of schema versions. The subject technology generates a result rowset and a second rowset comprising a union of columns that have appeared at least once in each row. The subject technology performs deserialization of rows from the result rowset and the second rowset. The subject technology provides the rows from the result rowset and the second rowset to write to a file in a particular format.
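As a rough illustration of the "union of columns" step in this abstract, the sketch below builds one rowset from rows written under different schema versions. The `SCHEMAS` registry, the `(version, values)` row layout, and the function names are assumptions made for the example, not details taken from the patent.

```python
from typing import Any

# Hypothetical schema registry: schema version -> ordered column names.
SCHEMAS: dict[int, list[str]] = {
    1: ["id", "name"],
    2: ["id", "name", "email"],
}


def union_columns(versions: list[int]) -> list[str]:
    """Return every column that appears in at least one version, in first-seen order."""
    seen: dict[str, None] = {}
    for version in sorted(set(versions)):
        for column in SCHEMAS[version]:
            seen.setdefault(column, None)
    return list(seen)


def build_rowset(rows: list[tuple[int, tuple]]) -> tuple[list[str], list[list[Any]]]:
    """Deserialize (version, values) rows into one rowset over the union of columns."""
    columns = union_columns([version for version, _ in rows])
    rowset = []
    for version, values in rows:
        by_name = dict(zip(SCHEMAS[version], values))
        # Columns absent from this row's schema version are filled with None.
        rowset.append([by_name.get(column) for column in columns])
    return columns, rowset


columns, rowset = build_rowset([(1, (1, "a")), (2, (2, "b", "b@x"))])
```

Rows from the older schema version come out padded with `None` in the `email` position, so all rows share one column layout before being written to a file.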
-
Publication No.: US12086154B1
Publication date: 2024-09-10
Application No.: US18455229
Filing date: 2023-08-24
Applicant: Snowflake Inc.
Inventor: Mihir Dharamshi , Cristian Diaconu , Chen Luo , Andrew McCormick , Corbin McElhanney , Joshua Slocum , Wumengjian Zhu
IPC: G06F16/25 , G06F16/11 , G06F16/172 , G06F16/23
CPC classification number: G06F16/254 , G06F16/116 , G06F16/172 , G06F16/2379
Abstract: The subject technology receives a query, the query including a query range for processing the query. The subject technology sends a request to a key-value store for blob metadata and a set of recent writes for the query range. The subject technology receives the blob metadata, the blob metadata including information related to a set of blob files. The subject technology determines whether the set of blob files is stored in a local cache. The subject technology, in response to at least one blob file being missing from the set of blob files, sends a request to a blob store to retrieve the at least one blob file of the set of blob files. The subject technology transforms the retrieved at least one blob file to a column file format. The subject technology stores the transformed at least one blob file in the local cache.
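The cache-fill flow in this abstract can be sketched as follows. This is a minimal in-memory model: plain dictionaries stand in for the blob store and the local cache, a blob is assumed to be a list of row dictionaries, and `to_columnar` and `ensure_cached` are hypothetical helper names.

```python
def to_columnar(rows: list[dict]) -> dict[str, list]:
    """Transform row-oriented records into a column-oriented layout."""
    names: list[str] = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    return {name: [row.get(name) for row in rows] for name in names}


def ensure_cached(blob_names: list[str], local_cache: dict, blob_store: dict) -> list[str]:
    """Fetch and transform only the blobs missing from the local cache.

    Returns the names of the blobs that had to be retrieved.
    """
    fetched = []
    for name in blob_names:
        if name in local_cache:
            continue  # already cached in columnar form
        raw_rows = blob_store[name]  # stand-in for a request to the blob store
        local_cache[name] = to_columnar(raw_rows)
        fetched.append(name)
    return fetched


blob_store = {
    "b1": [{"k": 1, "v": "a"}, {"k": 2, "v": "b"}],
    "b2": [{"k": 3, "v": "c"}],
}
cache = {"b2": {"k": [3], "v": ["c"]}}
fetched = ensure_cached(["b1", "b2"], cache, blob_store)
```

Only `b1` is retrieved here, because `b2` is already present in the cache; the blob metadata from the key-value store is reduced to the list of blob names for brevity.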
-
Publication No.: US12135697B2
Publication date: 2024-11-05
Application No.: US18326929
Filing date: 2023-05-31
Applicant: Snowflake Inc.
Inventor: Benoit Dageville , Adrian Hamza , Lishi Jiang , William Waddington , Khaled Yagoub , Wumengjian Zhu
Abstract: The subject technology generates, by a compute service manager, a schema hash value for a new schema version associated with a new schema version value, the schema hash value based on determining a sum of hash values of a set of attributes of value columns, the set of attributes comprises a column identifier, and a logical type of a column. The subject technology stores a mapping of the schema hash value to the new schema version value for a table in a metadata database. The subject technology stores a new schema entry based on the schema hash value, the new schema version value, and a new column for the table in the metadata database, the metadata database storing multiple entries for different schema versions, each entry including a particular schema hash value for mapping to a corresponding schema version from the different schema versions.
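One way to picture a schema hash built as a sum of per-column attribute hashes is the sketch below. The choice of SHA-256 truncated to 64 bits, the wrap-around modulus, the `SchemaRegistry` class, and the reuse of an existing version on a hash match are all assumptions for illustration; the abstract specifies only that the hash is based on a sum over attributes including a column identifier and a logical type.

```python
import hashlib

MASK_64 = (1 << 64) - 1


def attribute_hash(column_id: int, logical_type: str) -> int:
    """Hash one column's attributes (identifier and logical type) to 64 bits."""
    digest = hashlib.sha256(f"{column_id}:{logical_type}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def schema_hash(columns: list[tuple[int, str]]) -> int:
    """Sum the per-column hashes; addition makes the result order-independent."""
    return sum(attribute_hash(cid, ltype) for cid, ltype in columns) & MASK_64


class SchemaRegistry:
    """Hypothetical stand-in for the metadata database's hash -> version mapping."""

    def __init__(self) -> None:
        self.version_by_hash: dict[int, int] = {}
        self.next_version = 1

    def register(self, columns: list[tuple[int, str]]) -> int:
        h = schema_hash(columns)
        if h not in self.version_by_hash:
            self.version_by_hash[h] = self.next_version
            self.next_version += 1
        return self.version_by_hash[h]


registry = SchemaRegistry()
v1 = registry.register([(1, "NUMBER"), (2, "TEXT")])
v1_reordered = registry.register([(2, "TEXT"), (1, "NUMBER")])
v2 = registry.register([(1, "NUMBER"), (2, "TEXT"), (3, "DATE")])
```

Because the per-column hashes are summed, the same set of columns maps to the same schema version regardless of the order in which the columns are listed, while adding a column yields a new hash and a new version.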
-
Publication No.: US20240020298A1
Publication date: 2024-01-18
Application No.: US18477834
Filing date: 2023-09-29
Applicant: Snowflake Inc.
Inventor: Khaled Yagoub , Wumengjian Zhu , Benoit Dageville , William Waddington
CPC classification number: G06F16/2379 , G06F16/283 , G06F11/1458 , G06F16/221
Abstract: The subject technology serializes, by at least one hardware processor, non-primary key data of column-organized data into compressed serialized value data that is in a row-organized sequence, the compressed serialized value data compressed using at least one bitmap, the non-primary key data comprising a schema identifier, the column-organized data being stored in a columnar database system, the column-organized data comprising primary key data and the non-primary key data. The subject technology stores the compressed serialized value data in a key-value data store of a key-value database system, the key-value database system processing key-value data in a key-value format. The subject technology receives a query by the columnar database system. The subject technology deserializes a portion of the compressed serialized value data that corresponds to the query. The subject technology processes the query using the columnar database system.
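A simple way to read "compressed using at least one bitmap" is a presence bitmap that records which value columns are non-null, so that nulls take no space in the serialized value. The sketch below assumes this interpretation, along with a fixed header layout (32-bit schema identifier, 64-bit bitmap), 64-bit integer values, and at most 64 columns; none of these details come from the patent.

```python
import struct

HEADER = "<IQ"  # assumed layout: schema identifier (uint32), presence bitmap (uint64)
HEADER_SIZE = struct.calcsize(HEADER)


def serialize_row(schema_id: int, values: list) -> bytes:
    """Pack non-key values row-wise, storing only the non-null ones."""
    bitmap = 0
    payload = b""
    for index, value in enumerate(values):
        if value is not None:
            bitmap |= 1 << index  # mark column `index` as present
            payload += struct.pack("<q", value)
    return struct.pack(HEADER, schema_id, bitmap) + payload


def deserialize_row(data: bytes, num_columns: int) -> tuple[int, list]:
    """Rebuild the value list, restoring None wherever the bitmap bit is clear."""
    schema_id, bitmap = struct.unpack_from(HEADER, data, 0)
    offset = HEADER_SIZE
    values = []
    for index in range(num_columns):
        if bitmap & (1 << index):
            (value,) = struct.unpack_from("<q", data, offset)
            offset += 8
            values.append(value)
        else:
            values.append(None)
    return schema_id, values


blob = serialize_row(7, [10, None, None, 42])
restored = deserialize_row(blob, 4)
```

The serialized bytes would be stored as the value under the row's primary key in the key-value store; the schema identifier in the header tells the reader which column layout to apply when deserializing.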
-
Publication No.: US20250068640A1
Publication date: 2025-02-27
Application No.: US18787807
Filing date: 2024-07-29
Applicant: Snowflake Inc.
Inventor: Mihir Dharamshi , Cristian Diaconu , Chen Luo , Andrew McCormick , Corbin McElhanney , Joshua Slocum , Wumengjian Zhu
IPC: G06F16/25 , G06F16/11 , G06F16/172 , G06F16/23
Abstract: The subject technology receives, by an execution node, blob metadata from a key-value store, the blob metadata including information related to a set of blob files. The subject technology determines, by the execution node using the blob metadata, whether a copy of each of the set of blob files is stored in a local cache of the execution node. The subject technology transforms at least one blob file, retrieved from a blob store, to a second file in a column file format, the at least one blob file being in a first format that is different than the column file format, the transforming comprising at least converting a particular snapshot file from the at least one blob file to a particular set of rowsets and writing the set of rowsets into the second file in the column file format. The subject technology stores the second file in the local cache.
-
Publication No.: US12189614B2
Publication date: 2025-01-07
Application No.: US18477834
Filing date: 2023-09-29
Applicant: Snowflake Inc.
Inventor: Khaled Yagoub , Wumengjian Zhu , Benoit Dageville , William Waddington
Abstract: The subject technology serializes, by at least one hardware processor, non-primary key data of column-organized data into compressed serialized value data that is in a row-organized sequence, the compressed serialized value data compressed using at least one bitmap, the non-primary key data comprising a schema identifier, the column-organized data being stored in a columnar database system, the column-organized data comprising primary key data and the non-primary key data. The subject technology stores the compressed serialized value data in a key-value data store of a key-value database system, the key-value database system processing key-value data in a key-value format. The subject technology receives a query by the columnar database system. The subject technology deserializes a portion of the compressed serialized value data that corresponds to the query. The subject technology processes the query using the columnar database system.
-
Publication No.: US20240028567A1
Publication date: 2024-01-25
Application No.: US18326929
Filing date: 2023-05-31
Applicant: Snowflake Inc.
Inventor: Benoit Dageville , Adrian Hamza , Lishi Jiang , William Waddington , Khaled Yagoub , Wumengjian Zhu
CPC classification number: G06F16/213 , G06F16/221
Abstract: The subject technology generates, by a compute service manager, a schema hash value for a new schema version associated with a new schema version value, the schema hash value based on determining a sum of hash values of a set of attributes of value columns, the set of attributes comprises a column identifier, and a logical type of a column. The subject technology stores a mapping of the schema hash value to the new schema version value for a table in a metadata database. The subject technology stores a new schema entry based on the schema hash value, the new schema version value, and a new column for the table in the metadata database, the metadata database storing multiple entries for different schema versions, each entry including a particular schema hash value for mapping to a corresponding schema version from the different schema versions.
-
Publication No.: US11809414B2
Publication date: 2023-11-07
Application No.: US17538818
Filing date: 2021-11-30
Applicant: Snowflake Inc.
Inventor: Khaled Yagoub , Wumengjian Zhu , Benoit Dageville , William Waddington
CPC classification number: G06F16/2379 , G06F11/1458 , G06F16/221 , G06F16/283
Abstract: A distributed database system can implement a column-based database system and a row-based database system for processing data. The row-based database system can store data organized into key value pairs, and data to be processed by the row-based database system is converted to a key-value format compressing keys that correspond to values. The distributed database system can perform serialization and compression in converting the data to the key-value format for efficient data storage performance. The distributed database system can unpack portions of the converted serialized compressed data in response to queries that process a portion of serialized compressed data without unpacking the entire converted dataset.
-
Publication No.: US11709808B1
Publication date: 2023-07-25
Application No.: US17656558
Filing date: 2022-03-25
Applicant: Snowflake Inc.
Inventor: Benoit Dageville , Adrian Hamza , William Waddington , Khaled Yagoub , Wumengjian Zhu , Lishi Jiang
CPC classification number: G06F16/213 , G06F16/221
Abstract: The subject technology receives a statement to perform an operation to add a new column into a table. The subject technology generates a schema hash value for a new schema version associated with a new schema version value. The subject technology stores a mapping of the schema hash value to the new schema version value for the table in a metadata database. The subject technology stores a new schema entry based on the schema hash value, the new schema version value, and the new column for the table in the metadata database. The subject technology performs an operation to add the new column to the table.
-
Publication No.: US20230169068A1
Publication date: 2023-06-01
Application No.: US17538818
Filing date: 2021-11-30
Applicant: Snowflake Inc.
Inventor: Khaled Yagoub , Wumengjian Zhu , Benoit Dageville , William Waddington
CPC classification number: G06F16/2379 , G06F16/221 , G06F11/1458 , G06F16/283
Abstract: A distributed database system can implement a column-based database system and a row-based database system for processing data. The row-based database system can store data organized into key value pairs, and data to be processed by the row-based database system is converted to a key-value format compressing keys that correspond to values. The distributed database system can perform serialization and compression in converting the data to the key-value format for efficient data storage performance. The distributed database system can unpack portions of the converted serialized compressed data in response to queries that process a portion of serialized compressed data without unpacking the entire converted dataset.