Abstract:
A method for a multi-cluster warehouse includes allocating a plurality of compute clusters as part of a virtual warehouse. The compute clusters are used to access and perform queries against one or more databases in one or more cloud storage resources. The method includes providing queries for the virtual warehouse to each of the plurality of compute clusters. Each of the plurality of compute clusters of the virtual warehouse receives a plurality of queries so that the computing load is spread across the different clusters. The method also includes dynamically adding compute clusters to and removing compute clusters from the virtual warehouse as needed based on a workload of the plurality of compute clusters.
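The routing and scaling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the claimed method: the class names, the least-loaded routing rule, and every threshold (`min_clusters`, `max_clusters`, `max_per_cluster`) are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One compute cluster; `running` counts queries it is executing."""
    name: str
    running: int = 0


@dataclass
class VirtualWarehouse:
    # All limits below are illustrative assumptions, not values from the source.
    min_clusters: int = 1
    max_clusters: int = 4
    max_per_cluster: int = 8  # queries a cluster runs before it counts as full
    clusters: list = field(default_factory=list)
    _next_id: int = 0

    def __post_init__(self):
        while len(self.clusters) < self.min_clusters:
            self._add_cluster()

    def _add_cluster(self):
        self._next_id += 1
        cluster = Cluster(f"cluster-{self._next_id}")
        self.clusters.append(cluster)
        return cluster

    def submit(self, query):
        """Route a query to the least-loaded cluster, adding one if all are full."""
        target = min(self.clusters, key=lambda c: c.running)
        if target.running >= self.max_per_cluster and len(self.clusters) < self.max_clusters:
            target = self._add_cluster()
        target.running += 1
        return target.name

    def finish(self, cluster_name):
        """Mark one query on the named cluster as completed."""
        for cluster in self.clusters:
            if cluster.name == cluster_name:
                cluster.running -= 1
                return

    def scale_in(self):
        """Remove idle clusters beyond the minimum; return how many were removed."""
        removed = 0
        for cluster in list(self.clusters):
            if len(self.clusters) <= self.min_clusters:
                break
            if cluster.running == 0:
                self.clusters.remove(cluster)
                removed += 1
        return removed
```

A real system would likely weigh queue time and cluster start-up cost before scaling; the sketch scales out only when every existing cluster is full.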
Abstract:
Example resource management systems and methods are described. In one implementation, a resource manager is configured to manage data processing tasks associated with multiple data elements. An execution platform is coupled to the resource manager and includes multiple execution nodes configured to store data retrieved from multiple remote storage devices. Each execution node includes a cache and a processor, where the cache and processor are independent of the remote storage devices. A metadata manager is configured to access metadata associated with at least a portion of the multiple data elements.
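One way to picture execution nodes whose caches are independent of remote storage is the sketch below. It is a simplified illustration under stated assumptions: a dictionary stands in for the remote storage devices, the hash-based routing rule in `ResourceManager` is hypothetical, and the metadata manager is left out.

```python
import zlib


class ExecutionNode:
    """Compute node with a local cache, separate from remote storage."""

    def __init__(self, name, remote_storage):
        self.name = name
        self.remote_storage = remote_storage  # dict standing in for remote devices
        self.cache = {}
        self.remote_reads = 0

    def read(self, key):
        # Fetch from remote storage only on a cache miss.
        if key not in self.cache:
            self.cache[key] = self.remote_storage[key]
            self.remote_reads += 1
        return self.cache[key]


class ResourceManager:
    """Assigns data processing tasks to execution nodes."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, key):
        # Assumed routing rule: a stable hash, so repeated keys reach the same cache.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def run_task(self, key):
        return self.node_for(key).read(key)
```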
Abstract:
Systems, methods, and devices for batch ingestion of data into a table of a database are disclosed. A method includes determining a notification indicating a presence of a user file received from a client account to be ingested into a database. The method includes identifying data in the user file and identifying a target table of the database to receive the data in the user file. The method includes generating an ingest task indicating the data and the target table. The method includes assigning the ingest task to an execution node of an execution platform, wherein the execution platform comprises a plurality of execution nodes operating independently of a plurality of shared storage devices that collectively store database data. The method includes registering metadata concerning the target table in a metadata store after the data has been fully committed to the target table by the execution node.
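The ingest flow above (notification, task generation, node assignment, commit, then metadata registration) might be sketched as follows. The names `IngestTask` and `IngestService`, the round-robin node assignment, and the in-memory tables are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class IngestTask:
    """Pairs the data found in a user file with its target table."""
    file_name: str
    target_table: str
    rows: tuple


class IngestService:
    def __init__(self, node_names):
        self.tables = {}          # table name -> list of committed rows
        self.metadata_store = []  # one entry per committed ingest task
        self._nodes = cycle(node_names)  # assumed policy: round-robin assignment

    def on_notification(self, file_name, target_table, rows):
        """Handle a notification that a user file is ready to ingest."""
        task = IngestTask(file_name, target_table, tuple(rows))
        node = next(self._nodes)
        self._execute(node, task)
        return node

    def _execute(self, node, task):
        # Commit the rows to the target table first ...
        self.tables.setdefault(task.target_table, []).extend(task.rows)
        # ... and register metadata only after the commit has completed.
        self.metadata_store.append({
            "table": task.target_table,
            "file": task.file_name,
            "row_count": len(task.rows),
            "node": node,
        })
```

Registering metadata last means a reader of the metadata store never sees a file whose rows are not yet in the table.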
Abstract:
A method for a database system includes storing table data for a database, the table data including information in rows and columns of one or more database tables. The method includes storing metadata on immutable storage, the metadata including information about the table data for the database. In one embodiment, mutable metadata may be periodically consolidated in the background to create new versions of metadata files, which allows old metadata files and old data files to be deleted.
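A rough sketch of write-once metadata files with background consolidation is shown below. It assumes a simple model in which each metadata file records data files added and removed; the `MetadataStore` class and the `meta-N` naming are hypothetical, and immutability is by convention only (files are written once and never edited).

```python
class MetadataStore:
    """Append-only metadata: each file is written once and never modified."""

    def __init__(self):
        self.files = {}    # metadata file name -> change record (insertion ordered)
        self._version = 0

    def _write(self, payload):
        self._version += 1
        name = f"meta-{self._version}"
        self.files[name] = payload
        return name

    def record_change(self, added=(), removed=()):
        """Write a new metadata file instead of editing an existing one."""
        return self._write({"added": frozenset(added), "removed": frozenset(removed)})

    def live_data_files(self):
        """Replay all metadata files in order to find the current data files."""
        live = set()
        for payload in self.files.values():
            live |= payload["added"]
            live -= payload["removed"]
        return live

    def consolidate(self):
        """Fold all metadata files into one new version and drop the old ones.

        Returns the new metadata file name and the set of data files that are
        no longer referenced and can therefore be deleted.
        """
        live = self.live_data_files()
        old_names = list(self.files)
        referenced = set().union(*(p["added"] for p in self.files.values()))
        new_name = self._write({"added": frozenset(live), "removed": frozenset()})
        for name in old_names:
            del self.files[name]
        return new_name, referenced - live
```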
Abstract:
A method for sharing data in a multi-tenant database includes generating a share object in a first account comprising a share role. The method includes associating one or more access rights with the share role, wherein the one or more access rights indicate which objects in the first account are accessible based on the share object. The method includes granting, to a second account, cross-account access rights to the share role or share object in the first account. The method includes receiving a request from the second account to access data or services of the first account. The method further includes providing a response to the second account based on the data or services of the first account.
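The share object and its cross-account grants can be illustrated with the sketch below. The `Account` and `Share` classes and the helper function names are assumptions for illustration only; the abstract does not specify an API.

```python
class Share:
    """Share object holding a share role and its cross-account grants."""

    def __init__(self):
        self.role_grants = set()        # object names the share role may read
        self.consumer_accounts = set()  # names of accounts granted access


class Account:
    def __init__(self, name):
        self.name = name
        self.objects = {}  # object name -> data
        self.shares = {}   # share name -> Share


def create_share(owner, share_name):
    """Generate a share object in the owning (first) account."""
    owner.shares[share_name] = Share()
    return owner.shares[share_name]


def grant_to_role(share, object_name):
    """Associate an access right on one object with the share role."""
    share.role_grants.add(object_name)


def grant_to_account(share, consumer):
    """Grant a second account cross-account access to the share."""
    share.consumer_accounts.add(consumer.name)


def access(consumer, owner, share_name, object_name):
    """Serve a request from the second account, checking both grants."""
    share = owner.shares[share_name]
    if consumer.name not in share.consumer_accounts:
        raise PermissionError(f"{consumer.name} has no access to {share_name}")
    if object_name not in share.role_grants:
        raise PermissionError(f"{object_name} is not shared via {share_name}")
    return owner.objects[object_name]
```

Note that the data stays in the first account; the second account receives a response computed from it rather than a copy of the underlying objects.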
Abstract:
A method for maintaining approximate, or good-enough, clustering includes storing table data for a table in a plurality of partitions. The method includes creating one or more new partitions based on changes to the table, wherein at least one of the one or more new partitions overlaps with another new partition or with a previous partition, resulting in a decrease in a degree of clustering of the table. The method includes determining that a degree of clustering of the table data is below a clustering threshold. The method further includes reclustering one or more partitions of the table to improve the degree of clustering of the table in response to one or more of: determining that the degree of clustering has fallen below the clustering threshold, receiving an explicit command from a user, or executing a DML command. Reclustering may be performed in incremental steps to iteratively improve clustering.
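Incremental reclustering against a threshold can be sketched as below. The abstract does not define how the degree of clustering is measured, so the metric here (the fraction of partition pairs whose value ranges do not overlap) is an assumption, as are the function names and the merge-sort-resplit step. Partitions are modeled as non-empty lists of values of a single clustering key.

```python
from itertools import combinations


def overlaps(a, b):
    """True if the value ranges of two non-empty partitions intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


def clustering_degree(partitions):
    """Assumed metric: share of partition pairs whose ranges do not overlap."""
    pairs = list(combinations(partitions, 2))
    if not pairs:
        return 1.0
    overlapping = sum(overlaps(a, b) for a, b in pairs)
    return 1.0 - overlapping / len(pairs)


def recluster_step(partitions, size):
    """One incremental step: merge one overlapping pair, sort, and re-split."""
    for i, j in combinations(range(len(partitions)), 2):
        if overlaps(partitions[i], partitions[j]):
            merged = sorted(partitions[i] + partitions[j])
            rest = [p for k, p in enumerate(partitions) if k not in (i, j)]
            return rest + [merged[k:k + size] for k in range(0, len(merged), size)]
    return partitions


def maintain(partitions, threshold, size, max_steps=10):
    """Recluster step by step until the threshold is met or steps run out."""
    steps = 0
    while clustering_degree(partitions) < threshold and steps < max_steps:
        partitions = recluster_step(partitions, size)
        steps += 1
    return partitions
```

The `max_steps` cap reflects the "good enough" goal: duplicate values at partition boundaries can keep ranges touching, so the loop stops after a bounded amount of work rather than insisting on perfect clustering.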
Abstract:
Example systems and methods for cloning catalog objects are described. In one implementation, a method identifies an original catalog object associated with a set of data and creates a duplicate copy of the original catalog object without copying the data itself. The method allows access to the set of data using the duplicate catalog object and supports modifying the data associated with the original catalog object independently of the duplicate catalog object. The duplicate catalog object can be deleted upon completion of modifying the data associated with the original catalog object.
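Cloning a catalog object without copying its data can be pictured as copying references to immutable data files. The sketch below assumes that model; `CatalogObject`, the `file-N` naming, and the dictionary used as storage are hypothetical.

```python
class CatalogObject:
    """Catalog entry that points at immutable data files by reference."""

    def __init__(self, name, file_refs):
        self.name = name
        self.file_refs = list(file_refs)

    def clone(self, new_name):
        # Only the list of references is copied; no data file is duplicated.
        return CatalogObject(new_name, self.file_refs)

    def add_rows(self, storage, rows):
        """Write new rows to a new file so other objects' references are unaffected."""
        ref = f"file-{len(storage) + 1}"
        storage[ref] = tuple(rows)
        self.file_refs.append(ref)
        return ref

    def read(self, storage):
        return [row for ref in self.file_refs for row in storage[ref]]
```

Because existing files are never rewritten, a change to the original is visible only through the original's reference list, and the duplicate keeps reading the data as it was at clone time.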
Abstract:
Example resource provisioning systems and methods are described. In one implementation, an execution platform accesses multiple remote storage devices. The execution platform includes multiple virtual warehouses, each of which includes a cache to store data retrieved from the remote storage devices and a processor that is independent of the remote storage devices. A resource manager is coupled to the execution platform and monitors received data processing requests and resource utilization. The resource manager also determines whether additional virtual warehouses are needed based on the data processing requests and the resource utilization. If additional virtual warehouses are needed, the resource manager provisions a new virtual warehouse.
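The provisioning decision might reduce to a small function like the one below. This is only a sketch: the abstract does not say how requests and utilization are combined, so the function name, the per-warehouse request capacity, and the utilization limit are all assumed values.

```python
import math


def additional_warehouses_needed(queued_requests, utilization, current,
                                 requests_per_warehouse=10, utilization_limit=0.8):
    """Return how many new virtual warehouses to provision (0 if none).

    `requests_per_warehouse` and `utilization_limit` are illustrative
    assumptions, not values taken from the source.
    """
    # Warehouses required to cover the pending data processing requests.
    by_demand = math.ceil(queued_requests / requests_per_warehouse)
    # One extra warehouse if the existing ones are running too hot.
    by_utilization = current + 1 if utilization > utilization_limit else current
    return max(current, by_demand, by_utilization) - current
```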