COLUMNAR DATA ARRANGEMENT FOR SEMI-STRUCTURED DATA

    Publication No.: EP3365804A1

    Publication date: 2018-08-29

    Application No.: EP16787700.0

    Filing date: 2016-10-19

    IPC classification: G06F17/30

    Abstract: Techniques are provided for de-normalizing semi-structured hierarchical data into a virtual table. In an embodiment, at least a portion of a semi-structured data document collection is de-normalized to improve the execution of queries that involve a traversal of the collection's data hierarchy. Based on the extracted schema of the semi-structured data, a de-normalized arrangement is generated, in which the hierarchical relationship of the semi-structured data is converted into a set of columns. The de-normalized arrangement is materialized by applying it to the semi-structured data. The materialized arrangement, the virtual table, may be stored on persistent storage or kept in volatile memory, and it may be stored in one format on persistent storage and in another format in volatile memory. In an embodiment, a received query that involves a traversal of the semi-structured data hierarchy is converted to a relational query that can be executed on the virtual table. Executing the relational query on the virtual table improves performance in generating the result set.
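
The de-normalization described in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the function names `denormalize` and `select`, the dotted column names, and the sample documents are all assumptions made for illustration. Nested objects become columns, each array element becomes its own row with the parent values repeated, and a hierarchy-traversing query is then answered as a plain filter over the flat rows.

```python
from itertools import product


def denormalize(doc, prefix=""):
    """Flatten one hierarchical document into a list of flat rows.

    Nested objects turn into dotted column names. Each array element
    yields its own row, with the parent's values repeated.
    """
    rows = [{}]
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_rows = denormalize(value, path + ".")
        elif isinstance(value, list):
            sub_rows = []
            for item in value:
                if isinstance(item, dict):
                    sub_rows.extend(denormalize(item, path + "."))
                else:
                    sub_rows.append({path: item})
            if not sub_rows:  # keep the parent row for an empty array
                sub_rows = [{path: None}]
        else:
            sub_rows = [{path: value}]
        # Combine the columns gathered so far with the new ones.
        rows = [{**left, **right} for left, right in product(rows, sub_rows)]
    return rows


def select(table, column, where):
    """Relational-style query: project one column from the matching rows."""
    return [row.get(column) for row in table if where(row)]


# Hypothetical sample collection.
docs = [
    {"id": 1, "customer": {"name": "Ann"},
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"id": 2, "customer": {"name": "Bob"},
     "items": [{"sku": "A", "qty": 5}]},
]

# Materialize the "virtual table" in memory as a list of rows.
virtual_table = [row for doc in docs for row in denormalize(doc)]

# A path-style query such as "qty of every item whose sku is A"
# becomes a filter and projection over columns.
quantities = select(virtual_table, "items.qty",
                    lambda row: row.get("items.sku") == "A")
print(quantities)  # → [2, 5]
```

The sketch keeps the table as a list of dictionaries for readability. The abstract leaves the storage format open, so a columnar in-memory layout would be an equally valid choice.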

    GENERIC INDEXING FOR EFFICIENTLY SUPPORTING AD-HOC QUERY OVER HIERARCHICALLY MARKED-UP DATA
    Invention publication (under examination, published)

    Publication No.: EP3066585A1

    Publication date: 2016-09-14

    Application No.: EP14803297.2

    Filing date: 2014-11-06

    IPC classification: G06F17/30

    CPC classification: G06F17/30327 G06F17/30286

    Abstract: Hierarchical data objects are indexed using an index referred to herein as a hierarchy-value index. A hierarchy-value index has, as index keys, tokens (a tag name or a word in a node string value) that are extracted from hierarchical data objects. Each token is mapped to the locations that correspond to the data for the token in the hierarchical data objects. A token can represent a non-leaf node, such as an XML element or a JSON field. A location can be a region covering and subsuming child nodes. For a token that represents a non-leaf node, a location to which the token is mapped contains the location of any token corresponding to a descendant node of the non-leaf node. Thus, token containment based on the locations of tokens within a hierarchical data object may be used to determine containment relationships between nodes in a hierarchical data object.
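
The token-containment idea in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented index: the names `build_index`, `contains`, and `docs_with`, the `(doc_id, start, end)` location format, and the sample documents are hypothetical. Field names and words are numbered in traversal order, so a field's region spans the positions of everything beneath it and containment reduces to an interval check.

```python
from collections import defaultdict


def _index_node(node, doc_id, counter, index):
    """Walk one node, assigning positions in traversal order."""
    if isinstance(node, dict):
        for name, child in node.items():
            start = counter[0]
            counter[0] += 1
            _index_node(child, doc_id, counter, index)
            end = counter[0] - 1
            # The field's region covers every position used by its descendants.
            index[name].append((doc_id, start, end))
    elif isinstance(node, list):
        for item in node:
            _index_node(item, doc_id, counter, index)
    else:
        # Leaf value: each word is a token at a single position.
        for word in str(node).lower().split():
            pos = counter[0]
            counter[0] += 1
            index[word].append((doc_id, pos, pos))


def build_index(docs):
    """Map every token (field name or word) to its (doc_id, start, end) locations."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        _index_node(doc, doc_id, [0], index)
    return dict(index)


def contains(outer, inner):
    """True if `inner` lies within `outer` in the same document."""
    return (outer[0] == inner[0]
            and outer[1] <= inner[1]
            and inner[2] <= outer[2])


def docs_with(index, field, word):
    """Ids of documents in which `word` occurs somewhere under `field`."""
    return sorted({outer[0]
                   for outer in index.get(field, [])
                   for inner in index.get(word, [])
                   if contains(outer, inner)})


# Hypothetical sample documents.
docs = [
    {"book": {"title": "Data Systems", "author": {"name": "Jane Smith"}}},
    {"book": {"title": "Smith Charts", "author": {"name": "Raj Patel"}}},
]
index = build_index(docs)

print(docs_with(index, "author", "smith"))  # → [0]
print(docs_with(index, "title", "smith"))   # → [1]
print(docs_with(index, "book", "smith"))    # → [0, 1]
```

One simplification is worth noting: field names and words share a single token space here, so a word that happens to equal a field name lands under the same key. A fuller design would presumably distinguish the two kinds of token.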
