Scheduled network communication for efficient re-partitioning of data
Abstract:
A method, apparatus, and system for efficiently re-partitioning data using scheduled network communication are provided. Given re-partitioning data that defines the data blocks to be sent among a plurality of server nodes, a corresponding network schedule is determined so that the data blocks are sent in a coordinated manner. The network schedule is divided into time slots, wherein each of the plurality of server nodes can send up to one data block and receive up to one data block in each time slot. By using a greedy selection algorithm that prioritizes the largest senders and the largest receivers, a near-optimal schedule can be determined even in the presence of heavy skew. The greedy selection algorithm can be implemented with O(T*N^2) time complexity, enabling scaling to large multi-node clusters with many server nodes. The network schedule is of particular interest for database execution plans that require re-partitioning between operators with different keys.
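The slot constraint and the largest-sender/largest-receiver priority described above can be illustrated with a small sketch. It is not the claimed method. It is a hypothetical illustration that assumes the re-partitioning data is an N x N matrix `blocks`, where `blocks[i][j]` counts the blocks node `i` must send to node `j`, and that blocks a node keeps locally need no network slot. In each slot, senders with the most remaining data choose first, and each sender picks the free receiver with the most remaining incoming data. Each slot costs O(N^2), so T slots cost O(T*N^2).

```python
from typing import List, Tuple


def greedy_schedule(blocks: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """Hypothetical greedy slot scheduler (illustrative sketch only).

    blocks[i][j]: number of blocks node i must send to node j (assumed input).
    Returns a list of slots; each slot is a list of (sender, receiver) pairs in
    which every node sends at most one block and receives at most one block.
    """
    n = len(blocks)
    remaining = [row[:] for row in blocks]
    for i in range(n):
        remaining[i][i] = 0  # assumption: local blocks need no network transfer

    schedule: List[List[Tuple[int, int]]] = []
    while True:
        # Remaining outgoing and incoming load per node, computed in O(N^2).
        send_load = [sum(remaining[i]) for i in range(n)]
        recv_load = [sum(remaining[i][j] for i in range(n)) for j in range(n)]
        if sum(send_load) == 0:
            break

        receiver_busy = [False] * n
        slot: List[Tuple[int, int]] = []
        # Largest remaining senders choose first (ties broken by node index).
        for s in sorted(range(n), key=lambda i: -send_load[i]):
            if send_load[s] == 0:
                break  # sorted descending, so every later sender is also idle
            best = -1
            for r in range(n):
                if remaining[s][r] > 0 and not receiver_busy[r]:
                    # Prefer the receiver with the most remaining incoming data.
                    if best == -1 or recv_load[r] > recv_load[best]:
                        best = r
            if best != -1:
                receiver_busy[best] = True
                remaining[s][best] -= 1
                slot.append((s, best))
        schedule.append(slot)
    return schedule


if __name__ == "__main__":
    demo = [[0, 2, 1], [1, 0, 0], [0, 3, 0]]
    plan = greedy_schedule(demo)
    print(len(plan))  # 5
```

In this example node 1 must receive 5 blocks and can accept only one per slot, so no schedule can use fewer than 5 slots. The greedy sketch meets that bound here. Prioritizing the most heavily loaded receiver is what keeps a skewed hot spot like node 1 busy in every slot.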