Abstract:
Storage of type annotation record information for an annotated automaton encoding, used in high-performance XML schema validation, is optimized for space efficiency. Once the type annotation record information is organized, the type annotation records are used for type annotation of validated XML documents, either by implementing only the annotation records and the type annotation part of the algorithm, or by skipping one or more validation steps in a full validation implementation. Given a schema context, type annotation may be performed for a validated XML fragment rather than an entire document. In addition, default features such as attribute and type are supported.
Abstract:
A method and system for Extensible Markup Language (XML) schema validation include: loading an XML document into a runtime validation engine, where the runtime validation engine includes an XML schema validation parser; loading an annotated automaton encoding (AAE) for an XML schema definition into the XML schema validation parser; and validating the XML document against the XML schema definition by the XML schema validation parser utilizing the annotated automaton encoding. Each XML schema definition is compiled once into the AAE format, rather than being compiled each time an XML document is validated, so significant time is saved. The code for the runtime validation engine is fixed and does not vary with the XML schema definition, so space overhead is minimized. Flexibility in the validation process is provided without compromising performance.
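The compile-once idea can be illustrated with a minimal sketch. It is not the AAE format itself: the transition table, the function names `compile_schema` and `validate`, and the restriction to a fixed sequence of child element names are all assumptions made for illustration.

```python
# Hypothetical, simplified stand-in for an annotated automaton encoding:
# a plain transition table plus a set of accepting states.
def compile_schema(sequence):
    """Compile a required sequence of child element names into a table, once."""
    transitions = {(i, name): i + 1 for i, name in enumerate(sequence)}
    return {"start": 0, "accept": {len(sequence)}, "transitions": transitions}


def validate(encoding, child_names):
    """Fixed engine: the same code runs unchanged for every compiled schema."""
    state = encoding["start"]
    for name in child_names:
        state = encoding["transitions"].get((state, name))
        if state is None:  # no transition defined for this element here
            return False
    return state in encoding["accept"]


# The schema is compiled once and reused for every document.
order_schema = compile_schema(["customer", "item", "total"])
print(validate(order_schema, ["customer", "item", "total"]))  # True
print(validate(order_schema, ["customer", "total"]))          # False
```

Only the table changes from schema to schema; the `validate` function stays the same, which is the property the abstract attributes to the runtime validation engine.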
Abstract:
A storage scheme for nodes of hierarchically structured data uses logical node identifiers to reference the nodes stored within and across record data structures. A node identifier index is used to map each logical node identifier to a record identifier for the record that contains the node. When a sub-tree is stored in a separate record, a proxy node is used to represent the sub-tree in the parent record. The mapping in the node identifier index reflects the storage of the sub-tree nodes in the separate record. Because the references between records are through logical node identifiers, records can be moved across pages without restriction, as long as the indices are updated or rebuilt to maintain synchronization with the resulting data pages. This approach is highly scalable and consumes much less storage than approaches that use explicit references between nodes.
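A minimal sketch of the lookup path described above follows. The record layout, the dotted node identifiers, the `PROXY` marker, and the helper names are hypothetical; the abstract does not specify any of them.

```python
PROXY = "<proxy>"  # hypothetical marker for a proxy node in a parent record

# Hypothetical records: record id -> {logical node id -> node content}.
# The sub-tree rooted at node "1.2" lives in its own record, R2.
records = {
    "R1": {"1": "book", "1.1": "title", "1.2": PROXY},
    "R2": {"1.2": "chapter", "1.2.1": "para"},
}


def build_index(records):
    """Map each logical node id to the record that really stores the node."""
    index = {}
    for record_id, nodes in records.items():
        for node_id, content in nodes.items():
            if content != PROXY:  # a proxy only stands in for the sub-tree
                index[node_id] = record_id
    return index


def fetch(records, index, node_id):
    """Resolve a logical node id through the index, never through a pointer."""
    return records[index[node_id]][node_id]


index = build_index(records)
print(fetch(records, index, "1.2.1"))  # para

# Simulate moving the sub-tree record: only the index has to be rebuilt.
records["R9"] = records.pop("R2")
index = build_index(records)
print(fetch(records, index, "1.2.1"))  # para
```

Because nodes refer to one another only by logical identifier, relocating a record changes the index but none of the stored nodes.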
Abstract:
A method generates hierarchical path index keys for single and multiple indexes with one scan of a document. Each data node of the document is scanned and matches to query nodes are identified. A data node matches a query node if three conditions hold: unless the current step is the root step, there is a match for the query node in the previous step of the query; the data node matches the query node of the current step; and the edges of the data and query nodes match. A sub-tree of a data node can be skipped if the data node is not matched and its level is less than the fixed levels of the query. The matched data node is then placed in the match stacks corresponding to the matched query nodes. The method uses transitivity properties among matching units to reduce the number of states that need to be tracked and to improve the evaluation of path expressions significantly.
Abstract:
A method and system for providing a scalable storage scheme for native hierarchically structured data of relational tables include a base table with indicator columns with information pertaining to hierarchically structured data of a document, data tables for storing the hierarchically structured data corresponding to the indicator columns, and node identifier indexes corresponding to the data tables for mapping between the indicator columns and the hierarchically structured data in the data tables. In an embodiment, actual data for each hierarchically structured data (such as XML) column is stored in a separate data table, and each data table has a separate node identifier index. The node identifier index is searched with a key containing the document identifier and a logical node identifier, and the record identifier of the record in the data table that contains the node assigned that logical node identifier is retrieved.
Abstract:
A method, apparatus, and article of manufacture for optimizing a query in a computer system, wherein the query is performed by the computer system to retrieve data from a database stored on the computer system. The optimization includes: (a) combining join predicates from a query with local predicates from each branch of one or more UNION ALL views referenced by the query; (b) analyzing the combined predicates; and (c) not generating the join when the analysis step indicates that the combined predicates lead to an empty result.
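Steps (a) through (c) can be sketched for one simple case, assuming an equality join on a single column and a closed range as each branch's local predicate. The table names, the range representation, and the function names are illustrative assumptions, not part of the disclosed method.

```python
from itertools import product


def ranges_overlap(r1, r2):
    """With an equality join on the column, two closed ranges (lo, hi)
    can both hold only if they intersect."""
    return max(r1[0], r2[0]) <= min(r1[1], r2[1])


# Hypothetical UNION ALL views: branch name -> local predicate on `year`,
# expressed as an inclusive (lo, hi) range.
sales = {"sales_2001": (2001, 2001), "sales_2002": (2002, 2002)}
returns = {"returns_2001": (2001, 2001), "returns_2002": (2002, 2002)}


def plan_joins(left, right):
    """Keep a branch-to-branch join only when the join predicate
    left.year = right.year is consistent with both local predicates."""
    return [
        (l_name, r_name)
        for (l_name, l_range), (r_name, r_range) in product(left.items(), right.items())
        if ranges_overlap(l_range, r_range)
    ]


print(plan_joins(sales, returns))
# [('sales_2001', 'returns_2001'), ('sales_2002', 'returns_2002')]
```

Two of the four possible branch pairs are never generated, because their combined predicates can only produce an empty result.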
Abstract:
Disclosed is a method, system, and program for performing a join operation on a multi-column table and at least two satellite tables having a join condition. Each satellite table is comprised of multiple rows and at least one join column. The multi-column table is comprised of multiple rows and at least one column corresponding to the join column in each satellite table. A join operation is performed on the rows of the satellite tables to generate concatenated rows of the satellite tables. One of the concatenated rows is joined to the multi-column table and a returned entry from the multi-column table is received. A determination is then made as to whether the returned entry matches the search criteria. If so, a determination is made as to whether one of the satellite tables has duplicates of values in the join column of the returned matching entry or the multi-column table has duplicate entries in the join columns. Returned matching entries are generated for each duplicate value in the satellite tables and duplicate entry in the multi-column table.
Abstract:
A method for conversion between a decimal floating-point number and an order-preserving format is disclosed. The method encodes numbers in the decimal floating-point format into a format which preserves value ordering. This encoding allows for fast and direct string comparison of two values. Such an encoding provides normalized representations for decimal floating-point numbers and supports type-insensitive comparisons. Type-insensitive comparisons are often used in database management systems, where the data type is not specified for the values being compared. In addition, the original decimal floating-point format can be recovered from the order-preserving format.
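The general idea of an order-preserving key can be shown with a simplified sketch built on Python's `decimal` module. The byte layout below (a sign byte, a biased exponent, shifted digits, a terminator) is an assumption chosen for illustration and is not the encoding disclosed here. It handles finite values only and does not retain the trailing zeros needed to recover the original representation.

```python
from decimal import Decimal

BIAS = 32768  # assumed bias; adjusted exponents must fit in 16 bits


def encode(d):
    """Return bytes whose byte-wise order matches the numeric order of d."""
    if not d.is_finite():
        raise ValueError("this sketch handles finite values only")
    if d.is_zero():
        return bytes([1])  # sorts between all negatives and all positives
    sign, digits, _ = d.as_tuple()
    digits = list(digits)
    while digits[-1] == 0:  # normalize: 2.50 and 2.5 get the same key
        digits.pop()
    # Biased exponent first, then digits shifted by one, then a 0 terminator
    # so that a shorter digit string sorts before any longer extension of it.
    body = (d.adjusted() + BIAS).to_bytes(2, "big")
    body += bytes(x + 1 for x in digits) + b"\x00"
    if sign:
        # Negative values: complement every byte to reverse the ordering.
        return bytes([0]) + bytes(255 - b for b in body)
    return bytes([2]) + body


values = [Decimal(s) for s in ["2.5", "-250", "0.0025", "0", "-0.025", "25E+1", "-2.5"]]
print(sorted(values, key=encode) == sorted(values))  # True
```

Keys produced this way can be compared with a plain byte comparison, with no need to parse the number or know its declared type at comparison time.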
Abstract:
A method, apparatus, and article of manufacture for optimizing a query in a computer system. During compilation of the query, a GROUP BY clause with one or more GROUPING SETS, ROLLUP or CUBE operations is maintained in its original form until after query rewrite. The GROUP BY clause with the GROUPING SETS, ROLLUP or CUBE operations is then translated into a plurality of levels having one or more grouping sets. After compilation of the query, a grouping sets sequence is dynamically determined for the GROUP BY clause with the GROUPING SETS, ROLLUP or CUBE operations based on intermediate grouping sets, in order to optimize the grouping sets sequence. The execution of the grouping sets sequence is optimized by selecting a smallest grouping set from a previous one of the levels as an input to a grouping set on a next one of the levels. Finally, a UNION ALL operation is performed on the grouping sets.
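The level-by-level evaluation can be sketched as follows, assuming a CUBE over a few columns and a SUM aggregate, which can safely be recomputed from a finer grouping. The row format, the `amount` column, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rollup(cols):
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube(cols):
    """CUBE(a, b) expands to every subset: (a, b), (a), (b), ()."""
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]


def aggregate(rows, cols):
    """SUM(amount) grouped by cols; rows are dicts."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[c] for c in cols)] += row["amount"]
    return [dict(zip(cols, key), amount=total) for key, total in totals.items()]


def evaluate_cube(rows, cols):
    """Compute each level from the smallest suitable result of the level above."""
    results = {tuple(cols): aggregate(rows, cols)}
    for r in range(len(cols) - 1, -1, -1):
        for gs in combinations(cols, r):
            # Candidate inputs: previous-level grouping sets that contain gs.
            parents = [p for p in results if len(p) == r + 1 and set(gs) <= set(p)]
            smallest = min(parents, key=lambda p: len(results[p]))
            results[gs] = aggregate(results[smallest], gs)
    return results


rows = [
    {"region": "east", "year": 2001, "amount": 10},
    {"region": "east", "year": 2002, "amount": 20},
    {"region": "west", "year": 2001, "amount": 5},
]
results = evaluate_cube(rows, ["region", "year"])
# The final answer is the UNION ALL of every grouping set's rows.
union_all = [row for gs_rows in results.values() for row in gs_rows]
print(results[()])  # [{'amount': 35}]
```

Choosing the input by actual intermediate row counts mirrors the dynamic, post-compilation decision the abstract describes; a compile-time plan would have to rely on estimates instead.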
Abstract:
Disclosed is a method for processing an aggregate function. Rows that contain a reference to intermediate result structures are grouped to form groups. For each group, aggregate element structures are formed from the intermediate result structures and, if the aggregate function specifies ordering, the aggregate element structures are sorted based on a sort key.
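The grouping and optional sorting steps can be sketched as below. The representation of a row as a (group key, reference) pair, of an intermediate result as a (sort key, value) pair, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def ordered_aggregate(rows, intermediates, ordered=True):
    """Group rows, resolve their references, and optionally sort each group.

    rows: iterable of (group_key, ref) pairs; ref points into `intermediates`.
    intermediates: ref -> (sort_key, value), a stand-in for the intermediate
    result structures.
    """
    groups = defaultdict(list)
    for group_key, ref in rows:
        groups[group_key].append(ref)

    result = {}
    for group_key, refs in groups.items():
        # Form the aggregate elements of this group from the referenced results.
        elements = [intermediates[ref] for ref in refs]
        if ordered:  # only when the aggregate function specifies ordering
            elements.sort(key=lambda element: element[0])
        result[group_key] = [value for _, value in elements]
    return result


intermediates = {
    "r1": (3, "<c/>"),
    "r2": (1, "<a/>"),
    "r3": (2, "<b/>"),
    "r4": (1, "<x/>"),
}
rows = [("doc1", "r1"), ("doc2", "r4"), ("doc1", "r2"), ("doc1", "r3")]
print(ordered_aggregate(rows, intermediates))
# {'doc1': ['<a/>', '<b/>', '<c/>'], 'doc2': ['<x/>']}
```

Because rows carry only references, the potentially large intermediate results are not copied during grouping; they are touched once, when each group's elements are formed.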