Abstract:
A shared-nothing distributed database system includes a master node and a set of shared-nothing nodes. Each shared-nothing node includes a node state table, stored in memory, that characterizes various statements and the multiple processes implementing each statement, a target resource consumption rate for each process, and process activity information. A monitor module executed by a central processing unit processes information in the node state table and adjusts values in the node state table in accordance with priority criteria specifying the individual user priority ascribed to a statement. A query associated with a statement is processed in the shared-nothing distributed database system in accordance with the priority criteria.
Abstract:
A method, article of manufacture, and apparatus for processing queries, comprising analyzing a query tree, determining at least one operator based on the query tree analysis, assigning a memory allocation for each of the at least one operator, and storing the assignment in a storage device. In some embodiments, a memory classification for each of the at least one operator is determined. In some embodiments, assigning a memory allocation for each of the at least one operator includes assigning a memory allocation based on the memory classification.
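The steps in this abstract (walk the query tree, identify operators, classify them, assign memory by class) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed method: the `PlanNode` type, the `MEMORY_INTENSIVE` set, the fixed `light_kb` amount, and the even split among memory-intensive operators are all hypothetical choices that the abstract does not specify.

```python
from dataclasses import dataclass, field

# Assumption: a two-way classification into memory-intensive and light operators.
MEMORY_INTENSIVE = {"Sort", "HashJoin", "HashAggregate"}


@dataclass
class PlanNode:
    """Hypothetical query tree node: an operator name plus child nodes."""
    op: str
    children: list = field(default_factory=list)


def collect_operators(node):
    """Walk the query tree depth-first (pre-order) and return every node."""
    ops = [node]
    for child in node.children:
        ops.extend(collect_operators(child))
    return ops


def assign_memory(root, budget_kb, light_kb=100):
    """Return (operator, kilobytes) pairs in pre-order.

    Light operators get a fixed amount; the remaining budget is split
    evenly among the memory-intensive operators (an assumed policy).
    """
    ops = collect_operators(root)
    heavy_count = sum(1 for o in ops if o.op in MEMORY_INTENSIVE)
    light_count = len(ops) - heavy_count
    remaining = budget_kb - light_kb * light_count
    if remaining < 0:
        raise ValueError("budget too small for the light operators")
    per_heavy = remaining // heavy_count if heavy_count else 0
    return [
        (o.op, per_heavy if o.op in MEMORY_INTENSIVE else light_kb)
        for o in ops
    ]


plan = PlanNode("HashJoin", [PlanNode("Scan"), PlanNode("Sort", [PlanNode("Scan")])])
assignment = assign_memory(plan, budget_kb=1000)
```

In a real system the resulting assignment would then be persisted alongside the plan; that storage step is omitted here.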
Abstract:
A method of analyzing the performance of a query optimizer includes identifying an event trigger. A reproduction object characterizing operational parameters of a customer computer at the time of the event trigger is populated. The reproduction object is transported from the customer computer to a test computer. The reproduction object is analyzed at the test computer to characterize the performance of the query optimizer.
Abstract:
The invention enables a correlated or multi-row subquery (CSQ) to be performed on distributed MPP and shared-nothing databases by broadcasting intermediate results, prior to a correlation operation, from subquery execution on one segment to all other segments in the distributed database so that the respective CSQs of each segment will have access to the necessary results to permit correct execution of the CSQ. Additionally, the intermediate results are saved to disk to avoid the necessity of replicating the same intermediate results multiple times during execution of a subquery plan.
Abstract:
Systems and methods for automatically scaling a big data system are disclosed. Methods include determining, at a first time, a first number of nodes for a cluster to process a request; assigning a number of nodes equal to the first number of nodes to the cluster; determining a rate of progress of the request; determining, at a second time and based on the rate of progress, a second number of nodes; and modifying the number of nodes to equal the second number of nodes. Systems include a cluster manager, to add and/or remove nodes; the big data system, to process requests that utilize the cluster and nodes; and an automatic scaling cluster manager including a big data interface for communicating with the big data system, a cluster manager interface for communicating with the cluster manager, and a cluster state machine.
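The two-step sizing described here (an initial node count, then a revised count derived from the observed rate of progress) might look like the sketch below. The abstract does not say how either number is computed; the deadline-based formula, the assumption that throughput scales linearly with node count, and all function and parameter names are hypothetical.

```python
import math


def nodes_needed(remaining_work, time_left_s, per_node_rate,
                 min_nodes=1, max_nodes=100):
    """Nodes required to finish remaining_work within time_left_s seconds.

    Assumes each node contributes per_node_rate work units per second.
    """
    if time_left_s <= 0 or per_node_rate <= 0:
        return max_nodes
    wanted = math.ceil(remaining_work / (per_node_rate * time_left_s))
    return max(min_nodes, min(max_nodes, wanted))


def rescale(total_work, done_work, elapsed_s, deadline_s, current_nodes):
    """Re-estimate the node count from the observed rate of progress."""
    observed_rate = done_work / elapsed_s          # whole-cluster rate
    per_node_rate = observed_rate / current_nodes  # assumed linear scaling
    return nodes_needed(total_work - done_work,
                        deadline_s - elapsed_s,
                        per_node_rate)


# First time: size the cluster from an estimated per-node rate.
first = nodes_needed(remaining_work=1000, time_left_s=100, per_node_rate=1.0)

# Second time: progress is slower than estimated, so the cluster grows.
second = rescale(total_work=1000, done_work=250, elapsed_s=50,
                 deadline_s=100, current_nodes=first)
```

A cluster manager would then add or remove nodes until the cluster size matches `second`.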
Abstract:
A method, article of manufacture, and apparatus for processing queries, comprising receiving a query, determining a query classification for the query, assigning the query to a resource queue based on the determined query classification, and placing the query in the assigned resource queue. In some embodiments, the resource queue is divided into a plurality of slots, and the query is placed in a slot. The resource queue may be associated with a resource queue memory allocation, and each of the plurality of slots is associated with a slot memory allocation.
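A minimal sketch of the flow in this abstract: classify a query, pick a resource queue from the classification, and place the query in a free slot. The `classify` rule, the queue names, and the equal division of queue memory among slots are assumptions made for illustration; the abstract only states that queues and slots each have an associated memory allocation.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceQueue:
    """Hypothetical resource queue with a fixed number of slots."""
    name: str
    memory_kb: int
    slot_count: int
    slots: list = field(default_factory=list)

    @property
    def slot_memory_kb(self):
        # Assumption: queue memory is divided equally among the slots.
        return self.memory_kb // self.slot_count

    def place(self, query):
        """Put the query in the next free slot; return its index or None if full."""
        if len(self.slots) >= self.slot_count:
            return None
        self.slots.append(query)
        return len(self.slots) - 1


def classify(query):
    """Hypothetical rule: aggregating queries count as reporting queries."""
    return "reporting" if "GROUP BY" in query.upper() else "adhoc"


def submit(query, queues):
    """Assign the query to a queue by classification and place it in a slot."""
    queue = queues[classify(query)]
    return queue.name, queue.place(query)


queues = {
    "reporting": ResourceQueue("reporting", memory_kb=4000, slot_count=4),
    "adhoc": ResourceQueue("adhoc", memory_kb=1000, slot_count=2),
}
placed = submit("select a, count(*) from t group by a", queues)
```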
Abstract:
Systems and methods for automatically scaling a big data system are disclosed. Methods may include: determining, at a first time, a first optimal number of nodes for a cluster to adequately process a request; assigning a number of nodes equal to the first optimal number; determining a rate of progress of the request; determining, at a second time and based on the rate of progress, a second optimal number of nodes; and modifying the number of nodes assigned to the cluster to equal the second optimal number. Systems may include: a cluster manager, to add and/or remove nodes; a big data system, to process requests that utilize the cluster and nodes; and an automatic scaling cluster manager, including: a big data interface, for communicating with the big data system; a cluster manager interface, for communicating to the cluster manager instructions for adding and/or removing nodes from a cluster used to process a request; and a cluster state machine.
Abstract:
SQL queries are optimized to operate directly on compressed data (and obtain the correct result) rather than requiring that the data be first decompressed prior to processing a query. Certain characteristic pattern trees are mapped against a logical input query plan that includes certain logical operators such as a DECOMPRESS that precedes a JOIN or a GROUPBY in association with a COUNT to identify instances in the plan that match a characteristic pattern. Upon locating a match, the input query plan is transformed into a logically equivalent plan that operates correctly on compressed data, by analyzing the interplay of the semantics of logical query operations with the compressed data and substituting less costly structures and operations. DECOMPRESS operations are moved to operate subsequent to a JOIN or eliminated altogether, and COUNT operations are replaced by a different operation, such as SUM, that is logically equivalent for compressed data.
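To illustrate why COUNT can be replaced by SUM on compressed data, the sketch below assumes run-length encoding, which the abstract does not name, as the compression scheme. Each run stores a value and its repeat count, so counting rows per value over the decompressed data gives the same result as summing run lengths per value, with no DECOMPRESS step. All function names are hypothetical.

```python
def rle_compress(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def count_decompressed(values):
    """GROUP BY value with COUNT(*) over the decompressed rows."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


def count_compressed(runs):
    """Equivalent plan on compressed data: GROUP BY value with SUM(run_length)."""
    counts = {}
    for v, n in runs:
        counts[v] = counts.get(v, 0) + n
    return counts


data = ["a", "a", "b", "a", "a", "a"]
runs = rle_compress(data)
```

The compressed plan touches one entry per run rather than one per row, which is where the cost saving would come from on data with long runs.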