MULTIPLE QUERY OPTIMIZATION IN SQL-ON-HADOOP SYSTEMS

    Publication number: US20170316055A1

    Publication date: 2017-11-02

    Application number: US15523729

    Filing date: 2014-12-01

    Abstract: To reduce the overall computation time of a batch of queries, multiple query optimization in SQL-on-Hadoop systems groups the MapReduce jobs converted from several queries into a single job, avoiding redundant computation by exploiting opportunities to share data scans, map functions, and map output. SQL-on-Hadoop converts a query into a DAG of MapReduce jobs, and each map function is a part of the query plan composed of a sequence of relational operators. Because each such map function is usually complex and heavy, the disclosed method builds a cost model that estimates computation time from both the I/O cost of reading and writing input files and intermediate data and the CPU cost of computing the map function. A heuristic algorithm is disclosed that finds a near-optimal integrated query plan for each group, based on the observation that each individual query plan is locally optimal.
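
    The cost trade-off described in the abstract can be illustrated with a minimal sketch. It is not the patent's cost model: the `MapJob` fields, the throughput constants, and the rule "merge all jobs that read the same input file" are assumptions made for illustration, and only scan sharing is modelled (sharing of map functions and map output is left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapJob:
    """Hypothetical description of one map-side job derived from a query."""
    name: str
    input_file: str
    input_bytes: int     # bytes scanned from the input file
    output_bytes: int    # bytes of map output written as intermediate data
    cpu_seconds: float   # estimated CPU time of the map function


# Assumed throughput constants in bytes per second; not taken from the source.
READ_BPS = 100e6
WRITE_BPS = 50e6


def job_cost(job):
    """Estimated time of one job run alone: scan + intermediate write + CPU."""
    return (job.input_bytes / READ_BPS
            + job.output_bytes / WRITE_BPS
            + job.cpu_seconds)


def merged_cost(jobs):
    """Estimated time of jobs merged into one; the input is scanned only once."""
    scan = max(j.input_bytes for j in jobs) / READ_BPS
    write = sum(j.output_bytes for j in jobs) / WRITE_BPS
    cpu = sum(j.cpu_seconds for j in jobs)
    return scan + write + cpu


def group_by_input(jobs):
    """Group jobs that read the same input file (the only sharing modelled here)."""
    groups = {}
    for job in jobs:
        groups.setdefault(job.input_file, []).append(job)
    return groups


def batch_saving(jobs):
    """Estimated seconds saved by merging each group instead of running jobs separately."""
    separate = sum(job_cost(j) for j in jobs)
    merged = sum(merged_cost(g) for g in group_by_input(jobs).values())
    return separate - merged


jobs = [
    MapJob("q1", "orders", 1_000_000_000, 100_000_000, 3.0),
    MapJob("q2", "orders", 1_000_000_000, 50_000_000, 4.0),
    MapJob("q3", "items", 500_000_000, 0, 1.0),
]
saving = batch_saving(jobs)
```

    In this toy model, merging `q1` and `q2` removes one scan of `orders`, while the CPU and intermediate-write terms are unchanged. A fuller model would also have to account for cases where merging increases cost, which is what makes a heuristic search over groupings necessary.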

    A METHOD FOR EFFICIENT ONE-TO-ONE JOIN
    Invention patent application

    Publication number: US20170308578A1

    Publication date: 2017-10-26

    Application number: US15507317

    Filing date: 2014-09-09

    CPC classification number: G06F16/2456 G06F16/284

    Abstract: One-to-one join is widely used in machine learning and business intelligence applications. Disclosed herein is an efficient method for one-to-one join that reduces memory usage, and thus disk I/O, when memory is limited. The disclosed method outputs and removes a pair of tuples as soon as they are matched to each other, allowing join results to be generated without reading the entire tables. The disclosed method also increases the matching rate of in-memory blocks by predicting data distribution patterns from both statistics and historical block-matching information.
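
    The "output and remove on match" idea can be sketched as a symmetric hash join that evicts a tuple the moment its partner arrives. This is an illustrative stand-in and not the disclosed method: the function name, the tuple layout `(key, value)`, and the alternating read order are assumptions, and the prediction of data distribution patterns and block-level disk handling are not modelled.

```python
from itertools import zip_longest


def one_to_one_join(left, right):
    """Yield (key, left_value, right_value) as soon as both sides of a key are seen.

    Assumes each key occurs at most once per input, so a matched tuple can be
    dropped from memory immediately because it can never match again.
    """
    pending_left = {}    # unmatched tuples read from the left input
    pending_right = {}   # unmatched tuples read from the right input

    # Read the two inputs alternately, one tuple from each per step.
    for l_item, r_item in zip_longest(left, right):
        if l_item is not None:
            key, value = l_item
            if key in pending_right:
                # Partner already in memory: emit the pair and free its slot.
                yield key, value, pending_right.pop(key)
            else:
                pending_left[key] = value
        if r_item is not None:
            key, value = r_item
            if key in pending_left:
                yield key, pending_left.pop(key), value
            else:
                pending_right[key] = value


left_rows = [(1, "a"), (2, "b"), (3, "c")]
right_rows = [(2, "x"), (1, "y"), (4, "z")]
result = list(one_to_one_join(left_rows, right_rows))
```

    Because matched tuples leave the pending tables immediately, memory holds only tuples still waiting for a partner, and results are produced before either input has been read to the end.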
