Abstract:
A method and apparatus are provided for managing work granules being executed in parallel. A task is evenly divided among a number of work granules. The number of work granules falls between a threshold minimum and a threshold maximum. The threshold minimum and maximum may be configured to balance a variety of efficiency factors affected by the number of work granules, including workload skew and the overhead incurred in managing a larger number of work granules. Work granules are distributed to processes on nodes according to which of the nodes, if any, may execute the work granule efficiently. A variety of factors may be used to determine where a work granule may be performed efficiently, including whether data accessed during the execution of a work granule may be locally accessed by a node.
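The granule sizing and locality-aware distribution described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Granule`, `granule_count`, `split_evenly`, and `pick_granule`, the `target_size` parameter, and the three-step preference order (local data, then no affinity, then anything left) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Granule:
    start: int                            # first unit of work (inclusive)
    end: int                              # last unit of work (exclusive)
    preferred_node: Optional[str] = None  # hypothetical: node holding the data locally


def granule_count(task_size: int, target_size: int,
                  min_granules: int, max_granules: int) -> int:
    """Clamp the desired number of granules between the two thresholds."""
    wanted = -(-task_size // target_size)  # ceiling division
    return max(min_granules, min(max_granules, wanted))


def split_evenly(task_size: int, n: int) -> List[Granule]:
    """Divide task_size units into n granules whose sizes differ by at most one."""
    base, extra = divmod(task_size, n)
    granules, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        granules.append(Granule(start, start + size))
        start += size
    return granules


def pick_granule(pending: List[Granule], node: str) -> Optional[Granule]:
    """Remove and return a granule for `node`, preferring locally accessible data."""
    preferences = (
        lambda g: g.preferred_node == node,  # data is local to this node
        lambda g: g.preferred_node is None,  # no node is clearly better
        lambda g: True,                      # fall back to any remaining granule
    )
    for matches in preferences:
        for i, g in enumerate(pending):
            if matches(g):
                return pending.pop(i)
    return None
```

A low minimum threshold risks workload skew (a few large granules finish late), while a high maximum threshold adds management overhead; clamping the count is one simple way to trade these off.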
Abstract:
A query coordinator handles a multiple-server dynamic performance query by sending remote query slaves (1) first information for generating a complete plan for the query, and (2) second information for participating in the dynamic performance view portion of the query. If the slaves on the remote servers are unable to use the first information to generate an equivalent query (for example, if they reside in a database server that has closed the database), then the slaves on the remote servers use the second information to participate in the dynamic performance view portion of the query.
Abstract:
A task is divided into work granules that require access to data objects. The work granules are distributed to slave processes in a manner that causes the data objects to be accessed in a balanced way, such that the difference in the number of slave processes accessing any object is not greater than one. Distributing the work granules in this manner decreases the likelihood that the resources required to access any particular data object will become a bottleneck in performing the task. For each data object in the set of data objects, a work granule list is maintained. The list for each data object identifies the work granules requiring access to that data object. A slave process is assigned a work granule selected from a set of work granule lists. To select a work granule for a slave process, an initial list is picked at random. If the quantity of currently-assigned work granules from the selected work granule list is less than or equal to a “threshold minimum”, then a work granule from that list is assigned to the slave process. If the quantity of currently-assigned work granules is greater than the threshold minimum, then another work granule list is selected. The threshold minimum may be, for example, the minimum number of currently-assigned work granules from the work granule list.
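The list-selection procedure in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. The class name `GranuleLists`, its methods, the wrap-around scan after the random starting list, and the choice to compute the threshold minimum only over lists that still have pending granules are all assumptions.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple


class GranuleLists:
    """One pending-granule list per data object, plus a count of granules
    from each list that are currently assigned to slave processes."""

    def __init__(self, lists: Dict[Hashable, List[object]],
                 rng: Optional[random.Random] = None) -> None:
        self.pending = {obj: list(gs) for obj, gs in lists.items()}
        self.assigned = {obj: 0 for obj in lists}
        self.rng = rng or random.Random()

    def assign(self) -> Optional[Tuple[Hashable, object]]:
        """Pick a granule for a slave process, or None if no work remains."""
        candidates = [o for o, gs in self.pending.items() if gs]
        if not candidates:
            return None
        # Assumed threshold: fewest currently-assigned granules among
        # the lists that still have work.
        threshold = min(self.assigned[o] for o in candidates)
        start = self.rng.randrange(len(candidates))  # initial list picked at random
        for k in range(len(candidates)):
            obj = candidates[(start + k) % len(candidates)]
            if self.assigned[obj] <= threshold:
                self.assigned[obj] += 1
                return obj, self.pending[obj].pop(0)
        return None  # not reached: some candidate always meets the threshold

    def complete(self, obj: Hashable) -> None:
        """Record that a slave process finished a granule from obj's list."""
        self.assigned[obj] -= 1
```

Because a list is used only when its assigned count does not exceed the current minimum, the counts for lists that still have work never drift more than one apart in this sketch.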
Abstract:
Techniques are provided for increasing the degree of parallelism without incurring overhead costs associated with inter-nodal communication for performing parallel operations. One aspect of the invention is to distribute first-phase partition-pairs of a parallel partition-wise operation on a pair of objects among the nodes of a database system. The first-phase partition-pairs that are distributed to each node are further partitioned to form a new set of second-phase partition-pairs. One second-phase partition-pair from the set of new second-phase partition-pairs is assigned to each slave process that is on a given node. In addition, a target object may be partitioned by applying an appropriate hash function to the tuples of the target object. The parallel operation is performed by broadcasting each tuple from a source table only to the group of slave processes that is working on the static partition to which the tuple is mapped.
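The idea of further partitioning a node's partition-pair so that each slave process on that node gets its own smaller pair can be illustrated with a short sketch. It assumes an equi-join on the first column of each row and uses CRC32 as the hash function; the names `bucket`, `subpartition`, and `join_pair` are hypothetical and do not come from the source.

```python
import zlib
from collections import defaultdict
from typing import Callable, List, Tuple

Row = Tuple
Pair = Tuple[List[Row], List[Row]]


def bucket(key: object, n: int) -> int:
    """Deterministic hash bucket in range(n) (assumed hash: CRC32 of repr)."""
    return zlib.crc32(repr(key).encode()) % n


def subpartition(pair: Pair, n_slaves: int,
                 key: Callable[[Row], object] = lambda row: row[0]) -> List[Pair]:
    """Split one node-level partition-pair into n_slaves smaller pairs.

    Rows with equal join keys land in the same sub-pair on both sides,
    so each slave can join its sub-pair without talking to other slaves.
    """
    left, right = pair
    subs: List[Pair] = [([], []) for _ in range(n_slaves)]
    for row in left:
        subs[bucket(key(row), n_slaves)][0].append(row)
    for row in right:
        subs[bucket(key(row), n_slaves)][1].append(row)
    return subs


def join_pair(pair: Pair,
              key: Callable[[Row], object] = lambda row: row[0]) -> List[Tuple[Row, Row]]:
    """Hash join of the two sides of a single partition-pair."""
    left, right = pair
    index = defaultdict(list)
    for row in left:
        index[key(row)].append(row)
    return [(l, r) for r in right for l in index.get(key(r), [])]
```

Joining the sub-pairs separately and concatenating the results yields the same matches as joining the original pair, which is why the extra partitioning step can raise the degree of parallelism on a node without any cross-node traffic.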