COLUMNAR DATA ARRANGEMENT FOR SEMI-STRUCTURED DATA

    Publication number: US20170116273A1

    Publication date: 2017-04-27

    Application number: US15078713

    Filing date: 2016-03-23

    CPC classification number: G06F17/30466 G06F17/30911 G06F17/30917

    Abstract: Techniques are provided for denormalizing semi-structured hierarchical data into a virtual table. In an embodiment, at least a portion of a semi-structured data document collection is denormalized to improve the execution of queries that involve traversing the semi-structured data hierarchy of that collection. Based on the schema extracted from the semi-structured data, a denormalized arrangement is generated in which the hierarchical relationships of the semi-structured data are converted into a set of columns. The denormalized arrangement is materialized by applying it to the semi-structured data. The materialized arrangement, the virtual table, may be stored on persistent storage or kept in volatile memory, and it may be stored in one format on persistent storage and in another format in volatile memory. In an embodiment, a received query that involves a traversal of the semi-structured data hierarchy is converted into a relational query that can be executed on the virtual table. Executing the relational query on the virtual table improves performance when generating the result data set.
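The general idea of flattening a hierarchy into columns and answering a path query with a relational filter can be illustrated with a small sketch. This is not the patented method. It assumes JSON-like documents (Python dicts and lists), names each column by the dotted path of a leaf (for example "items.sku"), and expands ("unnests") every array into one row per element. It derives columns directly from the data rather than from a separately extracted schema. The functions denormalize, build_virtual_table and query_doc_ids, along with the sample orders, are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from itertools import product
from typing import Any


def denormalize(node: Any, prefix: str = "") -> list[dict[str, Any]]:
    """Flatten one semi-structured value into rows of path-named columns.

    Assumption: nested objects become dotted column names, and each array
    expands into one row per element, so sibling arrays give a cross product.
    """
    if isinstance(node, dict):
        # Denormalize each child independently, then combine their row sets.
        child_rows = [denormalize(v, f"{prefix}{k}.") for k, v in node.items()]
        rows = []
        for combo in product(*child_rows):
            merged: dict[str, Any] = {}
            for part in combo:
                merged.update(part)
            rows.append(merged)
        return rows
    if isinstance(node, list):
        # Unnest: every element contributes its own rows under the same path.
        rows = [row for item in node for row in denormalize(item, prefix)]
        return rows or [{}]  # an empty array still yields one (empty) row
    # Scalar leaf: one row with one column named by its full path.
    return [{prefix.rstrip("."): node}]


def build_virtual_table(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Materialize the denormalized rows of every document, tagged by doc_id."""
    table = []
    for doc_id, doc in enumerate(docs):
        for row in denormalize(doc):
            table.append({"doc_id": doc_id, **row})
    return table


def query_doc_ids(table: list[dict[str, Any]], column: str, value: Any) -> list[int]:
    """Relational form of a path query such as $.items[*].sku == value,
    i.e. SELECT DISTINCT doc_id FROM vt WHERE column = value."""
    return sorted({row["doc_id"] for row in table if row.get(column) == value})


# Hypothetical sample collection of order documents.
docs = [
    {"order": 1, "customer": {"name": "Ann"},
     "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"order": 2, "customer": {"name": "Bo"}, "items": [{"sku": "C3", "qty": 5}]},
]
vt = build_virtual_table(docs)
print(query_doc_ids(vt, "items.sku", "B2"))  # [0]
```

The first document produces two rows because its items array has two elements, so a hierarchy traversal becomes a scan over flat columns. In this sketch the table is kept in memory as a list of dicts; a real system could instead store it in a columnar format, as the abstract suggests.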
