Columnar data arrangement for semi-structured data

    Publication No.: US10191944B2

    Publication date: 2019-01-29

    Application No.: US15078713

    Filing date: 2016-03-23

    Abstract: Techniques are provided for denormalizing semi-structured hierarchical data into a virtual table. In an embodiment, at least a portion of a semi-structured data document collection is denormalized to improve the execution of queries that involve a traversal of the collection's semi-structured data hierarchy. Based on the extracted schema of the semi-structured data, a denormalized arrangement is generated, in which the hierarchical relationship of the semi-structured data is converted into a set of columns. The denormalized arrangement is materialized by applying it to the semi-structured data. The materialized arrangement, the virtual table, may be stored on persistent storage or kept in volatile memory. The virtual table may be stored in one format on the persistent storage and in another format in the volatile memory. In an embodiment, a received query that involves a traversal of the semi-structured data hierarchy is converted to a relational query that can be executed on the virtual table. Executing the relational query on the virtual table improves performance in generating the result set.
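
    The denormalization described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the path notation (`$.items[*].sku`), the function names `flatten` and `denormalize`, and the sample documents are all assumptions made for illustration. The sketch turns each document into one or more flat rows, with one column per hierarchical path, so that a path traversal becomes a plain column filter.

```python
from itertools import product


def flatten(node, prefix="$"):
    """Return a list of flat rows (dicts mapping path -> scalar) for one node."""
    if isinstance(node, dict):
        # Each field yields its own partial rows; a cross product combines
        # them so every resulting row carries one value per path.
        partials = [flatten(value, f"{prefix}.{key}") for key, value in node.items()]
        rows = []
        for combo in product(*partials):
            row = {}
            for part in combo:
                row.update(part)
            rows.append(row)
        return rows
    if isinstance(node, list):
        # An array multiplies rows: one group of rows per element.
        rows = []
        for item in node:
            rows.extend(flatten(item, f"{prefix}[*]"))
        return rows or [{}]  # an empty array still keeps the parent row
    return [{prefix: node}]


def denormalize(documents):
    """Build (columns, table) where table is a list of tuples, None for gaps."""
    rows = [row for doc in documents for row in flatten(doc)]
    columns = sorted({path for row in rows for path in row})
    table = [tuple(row.get(col) for col in columns) for row in rows]
    return columns, table


# Hypothetical sample collection.
docs = [
    {"id": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"id": 2, "items": []},
]
columns, table = denormalize(docs)

# A hierarchical query ("sku of every item with qty > 1") becomes a
# relational-style filter over two columns of the flat table.
qty_col = columns.index("$.items[*].qty")
sku_col = columns.index("$.items[*].sku")
skus = [row[sku_col] for row in table if (row[qty_col] or 0) > 1]
```

    The cross product keeps the sketch short but repeats parent values once per array element, which is the usual cost of denormalization.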

    Generic Indexing for Efficiently Supporting Ad-Hoc Query Over Hierarchically Marked-Up Data
    Patent application (status: granted)

    Publication No.: US20150134670A1

    Publication date: 2015-05-14

    Application No.: US14498893

    Filing date: 2014-09-26

    CPC classification number: G06F17/30327 G06F17/30286

    Abstract: Hierarchical data objects are indexed using an index referred to herein as a hierarchy-value index. A hierarchy-value index has, as index keys, tokens (tag name, a word in node string value) that are extracted from hierarchical data objects. Each token is mapped to the locations that correspond to the data for the token in hierarchical data objects. A token can represent a non-leaf node, such as an XML element or a JSON field. A location can be a region covering and subsuming child nodes. For a token that represents a non-leaf node, a location to which the token is mapped contains the location of any token corresponding to a descendant node of the non-leaf node. Thus, token containment based on the locations of tokens within a hierarchical data object may be used to determine containment relationships between nodes in a hierarchical data object.
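
    The containment idea in this abstract can be sketched as follows. This is an illustrative assumption, not the patented index layout: locations are modeled as `(start, end)` intervals over token positions, and the names `build_index`, `contains`, and `is_under` are hypothetical. A field-name token is mapped to a region that spans all tokens of its descendants, so containment between nodes reduces to interval containment.

```python
from collections import defaultdict


def build_index(doc):
    """Map each token (field name or word) to a list of (start, end) locations."""
    index = defaultdict(list)
    position = 0

    def visit(node):
        nonlocal position
        if isinstance(node, dict):
            for name, child in node.items():
                start = position          # the field name takes one position
                position += 1
                visit(child)
                # The region ends at the last position used by the subtree.
                index[name].append((start, position - 1))
        elif isinstance(node, list):
            for item in node:
                visit(item)
        else:
            # Leaf value: every word becomes a single-position token.
            for word in str(node).lower().split():
                index[word].append((position, position))
                position += 1

    visit(doc)
    return dict(index)


def contains(outer, inner):
    """True if `inner` lies inside `outer` and starts after the outer token."""
    return outer[0] < inner[0] and inner[1] <= outer[1]


def is_under(index, ancestor, token):
    """True if some location of `token` falls inside some region of `ancestor`."""
    return any(
        contains(region, loc)
        for region in index.get(ancestor, [])
        for loc in index.get(token, [])
    )


# Hypothetical sample object.
doc = {
    "person": {"name": "Ada Lovelace", "address": {"city": "London"}},
    "note": "London calling",
}
index = build_index(doc)
print(is_under(index, "address", "london"))  # True
print(is_under(index, "name", "london"))     # False
```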


    Techniques related to binary encoding of hierarchical data objects to support efficient path navigation of the hierarchical data objects

    Publication No.: US10262012B2

    Publication date: 2019-04-16

    Application No.: US14836680

    Filing date: 2015-08-26

    Abstract: Techniques related to binary encoding of hierarchical data objects to support efficient path navigation of the hierarchical data objects are disclosed. A hierarchical data object may include field names that are associated with field values. A method may involve generating a plurality of hash codes, each hash code corresponding to a respective field name. The method may involve generating a hash-code mapping that maps each hash code to a respective field-name identifier. The method may involve generating a field-name mapping that maps each field name to a respective field-name identifier. The method may involve generating a hierarchical tree of nodes that includes non-leaf nodes and leaf nodes. A particular non-leaf node may include a child node mapping that maps the particular non-leaf node to one or more child nodes and may include a field-name-identifier-to-child mapping that maps a respective field-name identifier to each of the one or more child nodes.
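
    The mappings listed in this abstract can be sketched in memory as follows. This is a simplified illustration rather than the patented binary format: `zlib.crc32` stands in for the unspecified hash function, Python dicts stand in for the encoded structures, arrays are treated as leaf values, and the names `encode` and `navigate` are hypothetical. Path navigation hashes each field name, resolves it to a field-name identifier, and follows the identifier-to-child mapping of the current non-leaf node.

```python
import zlib


def field_hash(name):
    """Hash code for a field name (CRC-32 is an assumption of this sketch)."""
    return zlib.crc32(name.encode("utf-8"))


def encode(doc):
    names = []        # field-name identifier -> field name
    name_to_id = {}   # field-name mapping: field name -> identifier

    def intern(name):
        if name not in name_to_id:
            name_to_id[name] = len(names)
            names.append(name)
        return name_to_id[name]

    def build(node):
        if isinstance(node, dict):
            # Non-leaf node: children are keyed by field-name identifier.
            return {"children": {intern(k): build(v) for k, v in node.items()}}
        return {"value": node}  # leaf node

    tree = build(doc)
    # Hash-code mapping: hash code -> identifiers (a list, to allow collisions).
    hash_to_ids = {}
    for name, fid in name_to_id.items():
        hash_to_ids.setdefault(field_hash(name), []).append(fid)
    return {"names": names, "hash_to_ids": hash_to_ids, "tree": tree}


def navigate(encoded, path):
    """Follow a list of field names; return the leaf value or None."""
    node = encoded["tree"]
    for name in path:
        candidates = encoded["hash_to_ids"].get(field_hash(name), [])
        # Compare the stored name to rule out hash collisions.
        fid = next((c for c in candidates if encoded["names"][c] == name), None)
        if fid is None or fid not in node.get("children", {}):
            return None
        node = node["children"][fid]
    return node.get("value")


# Hypothetical sample object.
encoded = encode({"order": {"id": 7, "customer": {"name": "Ada"}}, "id": 1})
print(navigate(encoded, ["order", "customer", "name"]))  # Ada
```

    Because each distinct field name is interned once, a repeated name such as `id` costs one dictionary entry no matter how often it appears in the object.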

    Generic indexing for efficiently supporting ad-hoc query over hierarchically marked-up data

    Publication No.: US09659045B2

    Publication date: 2017-05-23

    Application No.: US14498893

    Filing date: 2014-09-26

    CPC classification number: G06F17/30327 G06F17/30286

    Abstract: Hierarchical data objects are indexed using an index referred to herein as a hierarchy-value index. A hierarchy-value index has, as index keys, tokens (tag name, a word in node string value) that are extracted from hierarchical data objects. Each token is mapped to the locations that correspond to the data for the token in hierarchical data objects. A token can represent a non-leaf node, such as an XML element or a JSON field. A location can be a region covering and subsuming child nodes. For a token that represents a non-leaf node, a location to which the token is mapped contains the location of any token corresponding to a descendant node of the non-leaf node. Thus, token containment based on the locations of tokens within a hierarchical data object may be used to determine containment relationships between nodes in a hierarchical data object.

    Efficiently registering a relational schema

    Publication No.: US09330124B2

    Publication date: 2016-05-03

    Application No.: US14044982

    Filing date: 2013-10-03

    CPC classification number: G06F17/30312 G06F17/30595

    Abstract: A method, device, and non-transitory computer-readable storage medium are provided for efficiently registering a relational schema. In co-compilation and data guide approaches, a subset of entities from schema descriptions are selected for physical registration, and other entities from the schema descriptions are not physically registered. In the co-compilation approach, a first schema description references a second schema description, and the subset includes a set of entities from the second schema description that are used by the first schema description. In the data guide approach, the subset includes entities that are used by a set of structured documents. In a pay-as-you-go approach, schema registration includes logically registering entities without creating relational database structures corresponding to the entities. A database server may execute database commands that reference the logically registered entities. A request to store data for the entities may be executed by creating relational database structures to store the data.
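
    The pay-as-you-go approach in this abstract can be sketched with Python's standard `sqlite3` module. The class `LazyRegistry`, its methods, and the sample entity are assumptions for illustration and do not reflect any actual database server's implementation. Registering an entity only records its description; the relational table is created the first time data is stored for that entity.

```python
import sqlite3


class LazyRegistry:
    """Logical schema registration with deferred physical table creation."""

    def __init__(self, conn):
        self.conn = conn
        self.logical = {}      # entity name -> list of column names
        self.physical = set()  # entities whose tables already exist

    def register(self, entity, columns):
        # Logical registration: no relational structure is created here.
        self.logical[entity] = list(columns)

    def store(self, entity, row):
        columns = self.logical[entity]
        if entity not in self.physical:
            # First store request: create the table now.
            # Identifiers are assumed trusted in this sketch.
            cols = ", ".join(f'"{c}"' for c in columns)
            self.conn.execute(f'CREATE TABLE "{entity}" ({cols})')
            self.physical.add(entity)
        placeholders = ", ".join("?" for _ in columns)
        self.conn.execute(
            f'INSERT INTO "{entity}" VALUES ({placeholders})',
            [row.get(c) for c in columns],
        )


def table_names(conn):
    """List the tables that physically exist in the database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table'"
    return [r[0] for r in conn.execute(query)]


conn = sqlite3.connect(":memory:")
registry = LazyRegistry(conn)
registry.register("purchase_order", ["id", "customer"])
tables_before = table_names(conn)   # nothing created yet
registry.store("purchase_order", {"id": 1, "customer": "Ada"})
tables_after = table_names(conn)    # table now exists
```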
