-
Publication number: WO2015126957A1
Publication date: 2015-08-27
Application number: PCT/US2015/016403
Application date: 2015-02-18
Applicant: SNOWFLAKE COMPUTING INC.
Inventor: DAGEVILLE, Benoit; CRUANES, Thierry; ZUKOWSKI, Marcin
IPC: G06F15/173
CPC classification number: G06F17/30106 , G06F9/4881 , G06F9/5016 , G06F9/5044 , G06F9/5083 , G06F9/5088 , G06F17/302 , G06F17/30292 , G06F17/30315 , G06F17/30371 , G06F17/30445 , G06F17/30463 , G06F17/30466 , G06F17/30469 , G06F17/30477 , G06F17/3048 , G06F17/30498 , G06F17/30545 , G06F17/30575 , G06F17/30598 , G06F17/30864 , G06F17/30867 , G06F17/30914 , H04L67/1095 , H04L67/1097 , H04L67/2842
Abstract: Example resource management systems and methods are described. In one implementation, a resource manager is configured to manage data processing tasks associated with multiple data elements. An execution platform is coupled to the resource manager and includes multiple execution nodes configured to store data retrieved from multiple remote storage devices. Each execution node includes a cache and a processor, where the cache and processor are independent of the remote storage devices. A metadata manager is configured to access metadata associated with at least a portion of the multiple data elements.
-
Publication number: WO2018045372A1
Publication date: 2018-03-08
Application number: PCT/US2017/050075
Application date: 2017-09-05
Applicant: SNOWFLAKE COMPUTING INC.
Inventor: CRUANES, Thierry; ZUKOWSKI, Marcin; DAGEVILLE, Benoit; YAN, Jiaqi
Abstract: A method includes storing table data for a table in a plurality of partitions and for maintaining approximate or good enough clustering. The method includes creating one or more new partitions based on changes to the table, wherein at least one of the one or more new partitions overlap with each other or previous partitions resulting in a decrease in a degree of clustering of the table. The method includes determining that a degree of clustering of the table data is below a clustering threshold. The method further includes reclustering one or more partitions of the table to improve the degree of clustering of the table in response to one or more of: determining that the degree of clustering has fallen below the clustering threshold, an explicit user command from a user, and/or as part of a DML command. Reclustering may be performed in incremental steps to iteratively improve clustering.
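Illustrative sketch (not taken from the patent): the abstract above describes measuring a degree of clustering, comparing it with a threshold, and reclustering in incremental steps. The Python below is one minimal way to picture that loop. The metric (fraction of partition pairs whose key ranges do not overlap), the merge-sort-split step, and all names such as `clustering_degree`, `recluster_step`, and `maintain` are assumptions made for illustration only.

```python
from typing import List

Partition = List[int]  # clustering-key values held by one partition


def overlaps(a: Partition, b: Partition) -> bool:
    """True when the [min, max] key ranges of two partitions intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


def clustering_degree(parts: List[Partition]) -> float:
    """Hypothetical metric: share of partition pairs that do NOT overlap."""
    pairs = [(i, j) for i in range(len(parts)) for j in range(i + 1, len(parts))]
    if not pairs:
        return 1.0
    clean = sum(1 for i, j in pairs if not overlaps(parts[i], parts[j]))
    return clean / len(pairs)


def recluster_step(parts: List[Partition], size: int) -> List[Partition]:
    """One incremental step: merge the first overlapping pair, sort, re-split."""
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            if overlaps(parts[i], parts[j]):
                merged = sorted(parts[i] + parts[j])
                rest = [p for k, p in enumerate(parts) if k not in (i, j)]
                chunks = [merged[k:k + size] for k in range(0, len(merged), size)]
                return rest + chunks
    return parts


def maintain(parts: List[Partition], threshold: float, size: int,
             max_steps: int = 10) -> List[Partition]:
    """Recluster step by step until the degree reaches the threshold."""
    steps = 0
    while clustering_degree(parts) < threshold and steps < max_steps:
        parts = recluster_step(parts, size)
        steps += 1
    return parts


# A new partition [2, 11, 5] overlaps both existing ones and lowers the degree.
table = [[1, 2, 3], [10, 11, 12], [2, 11, 5]]
reclustered = maintain(table, threshold=0.9, size=3)
```

Each step touches only two partitions, which mirrors the "good enough", iterative improvement the abstract mentions rather than a full re-sort of the table.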
-
Publication number: WO2015127076A1
Publication date: 2015-08-27
Application number: PCT/US2015/016614
Application date: 2015-02-19
Applicant: SNOWFLAKE COMPUTING INC.
Inventor: DAGEVILLE, Benoit; CRUANES, Thierry; ZUKOWSKI, Marcin
IPC: G06F9/46
CPC classification number: G06F17/30106 , G06F9/4881 , G06F9/5016 , G06F9/5044 , G06F9/5083 , G06F9/5088 , G06F17/302 , G06F17/30292 , G06F17/30315 , G06F17/30371 , G06F17/30445 , G06F17/30463 , G06F17/30466 , G06F17/30469 , G06F17/30477 , G06F17/3048 , G06F17/30498 , G06F17/30545 , G06F17/30575 , G06F17/30598 , G06F17/30864 , G06F17/30867 , G06F17/30914 , H04L67/1095 , H04L67/1097 , H04L67/2842
Abstract: Example resource provisioning systems and methods are described. In one implementation, an execution platform accesses multiple remote storage devices. The execution platform includes multiple virtual warehouses, each of which includes a cache to store data retrieved from the remote storage devices and a processor that is independent of the remote storage devices. A resource manager is coupled to the execution platform and monitors received data processing requests and resource utilization. The resource manager also determines whether additional virtual warehouses are needed based on the data processing requests and the resource utilization. If additional virtual warehouses are needed, the resource manager provisions a new virtual warehouse.
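Illustrative sketch (not taken from the patent): the abstract above describes a resource manager that monitors requests and utilization and provisions a new virtual warehouse when more capacity is needed. The Python below pictures one possible decision rule. The thresholds, the choice to require both a backlog and high utilization, and the names `ResourceManager`, `needs_more`, and `observe` are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceManager:
    # Hypothetical thresholds; the abstract does not specify any values.
    max_queued_per_warehouse: int = 4
    max_utilization: float = 0.8
    warehouses: List[str] = field(default_factory=lambda: ["vw-1"])

    def needs_more(self, queued_requests: int, utilization: float) -> bool:
        """Assumed rule: provision only when both signals indicate overload."""
        backlog = queued_requests > self.max_queued_per_warehouse * len(self.warehouses)
        overloaded = utilization > self.max_utilization
        return backlog and overloaded

    def observe(self, queued_requests: int, utilization: float) -> bool:
        """Record one monitoring sample; provision a warehouse if needed."""
        if self.needs_more(queued_requests, utilization):
            self.warehouses.append(f"vw-{len(self.warehouses) + 1}")
            return True
        return False


manager = ResourceManager()
quiet = manager.observe(queued_requests=2, utilization=0.5)   # no action
busy = manager.observe(queued_requests=10, utilization=0.95)  # adds vw-2
```

Scaling the backlog limit with the number of warehouses keeps the sketch from provisioning repeatedly for a queue that the existing warehouses could already absorb.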
-
Publication number: WO2015126968A2
Publication date: 2015-08-27
Application number: PCT/US2015/016418
Application date: 2015-02-18
Applicant: SNOWFLAKE COMPUTING INC.
Inventor: DAGEVILLE, Benoit; CRUANES, Thierry; ZUKOWSKI, Marcin
CPC classification number: G06F17/30106 , G06F9/4881 , G06F9/5016 , G06F9/5044 , G06F9/5083 , G06F9/5088 , G06F17/302 , G06F17/30292 , G06F17/30315 , G06F17/30371 , G06F17/30445 , G06F17/30463 , G06F17/30466 , G06F17/30469 , G06F17/30477 , G06F17/3048 , G06F17/30498 , G06F17/30545 , G06F17/30575 , G06F17/30598 , G06F17/30864 , G06F17/30867 , G06F17/30914 , H04L67/1095 , H04L67/1097 , H04L67/2842
Abstract: Example data management systems and methods are described. In one implementation, a method identifies multiple files to process based on a received query and identifies multiple execution nodes available to process the multiple files. The method initially creates multiple scansets, each including a portion of the multiple files, and assigns each scanset to one of the execution nodes based on a file assignment model. The multiple scansets are processed by the multiple execution nodes. If the method determines that a particular execution node has finished processing all files in its assigned scanset, an unprocessed file is reassigned from another execution node to the particular execution node.
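Illustrative sketch (not taken from the patent): the abstract above describes creating scansets, assigning them to execution nodes, and reassigning an unprocessed file to a node that has finished its own scanset. The Python below pictures that flow. The round-robin assignment stands in for the unspecified "file assignment model", and the policy of taking a file from the node with the most remaining work, along with the names `create_scansets` and `next_file`, are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def create_scansets(files: List[str], nodes: List[str]) -> Dict[str, Deque[str]]:
    """Split the files into one scanset per node (round-robin stand-in)."""
    scansets: Dict[str, Deque[str]] = {node: deque() for node in nodes}
    for i, name in enumerate(files):
        scansets[nodes[i % len(nodes)]].append(name)
    return scansets


def next_file(node: str, scansets: Dict[str, Deque[str]]) -> Optional[str]:
    """Return the node's next file, or reassign one from a busier node."""
    own = scansets[node]
    if own:
        return own.popleft()
    # The node finished its scanset: take a file from the node with the
    # most unprocessed files (assumed policy), from the tail of its queue.
    donor = max(scansets, key=lambda n: len(scansets[n]))
    if scansets[donor]:
        return scansets[donor].pop()
    return None  # nothing left anywhere


scansets = create_scansets(["f0", "f1", "f2", "f3", "f4", "f5"], ["n1", "n2"])
own_work = [next_file("n1", scansets) for _ in range(3)]  # n1 drains its scanset
reassigned = next_file("n1", scansets)                    # then takes one from n2
```

Taking the reassigned file from the tail of the donor's queue leaves the files the donor would process next untouched.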
-
Publication number: WO2015126973A3
Publication date: 2015-08-27
Application number: PCT/US2015/016425
Application date: 2015-02-18
Applicant: SNOWFLAKE COMPUTING INC.
Inventor: DAGEVILLE, Benoit; CRUANES, Thierry; ZUKOWSKI, Marcin
IPC: G06F9/50
Abstract: Example resource provisioning systems and methods are described. In one implementation, an execution platform accesses multiple remote storage devices. The execution platform includes multiple virtual warehouses, each of which includes a cache to store data retrieved from the remote storage devices and a processor that is independent of the remote storage devices. A resource manager is coupled to the execution platform and monitors received data processing requests and resource utilization. The resource manager also determines whether additional virtual warehouses are needed based on the data processing requests and the resource utilization. If additional virtual warehouses are needed, the resource manager provisions a new virtual warehouse.
-
Publication number: WO2015126962A1
Publication date: 2015-08-27
Application number: PCT/US2015/016410
Application date: 2015-02-18
Applicant: SNOWFLAKE COMPUTING INC.
Inventor: DAGEVILLE, Benoit; CRUANES, Thierry; ZUKOWSKI, Marcin
IPC: G06F12/00
CPC classification number: G06F17/30106 , G06F9/4881 , G06F9/5016 , G06F9/5044 , G06F9/5083 , G06F9/5088 , G06F17/302 , G06F17/30292 , G06F17/30315 , G06F17/30371 , G06F17/30445 , G06F17/30463 , G06F17/30466 , G06F17/30469 , G06F17/30477 , G06F17/3048 , G06F17/30498 , G06F17/30545 , G06F17/30575 , G06F17/30598 , G06F17/30864 , G06F17/30867 , G06F17/30914 , H04L67/1095 , H04L67/1097 , H04L67/2842
Abstract: Example caching systems and methods are described. In one implementation, a method identifies multiple files used to process a query and distributes each of the multiple files to a particular execution node to execute the query. Each execution node determines whether the distributed file is stored in the execution node's cache. If the execution node determines that the file is stored in the cache, it processes the query using the cached file. If the file is not stored in the cache, the execution node retrieves the file from a remote storage device, stores the file in the execution node's cache, and processes the query using the file.
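Illustrative sketch (not taken from the patent): the abstract above describes an execution node that checks its cache for a file, and on a miss retrieves the file from remote storage, caches it, and then processes the query. The Python below pictures that read-through pattern. The `ExecutionNode` class, the dictionary standing in for remote storage, and the byte-count "query" are assumptions for illustration.

```python
from typing import Callable, Dict


class ExecutionNode:
    """Hypothetical node with a local cache in front of remote storage."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self.cache: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts cache misses, for demonstration

    def get_file(self, name: str) -> bytes:
        if name in self.cache:
            return self.cache[name]        # cache hit: use the cached file
        data = self._fetch_remote(name)    # cache miss: go to remote storage
        self.remote_reads += 1
        self.cache[name] = data            # store for later queries
        return data

    def process(self, name: str) -> int:
        """Stand-in for query processing: return the file size in bytes."""
        return len(self.get_file(name))


remote_storage = {"f1": b"abc", "f2": b"hello"}
node = ExecutionNode(remote_storage.__getitem__)
first = node.process("f1")   # fetched from remote storage, then cached
second = node.process("f1")  # served from the node's cache
```

The sketch omits eviction and cache invalidation, which the abstract does not describe.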