Abstract:
An extensible identification system for nodes in a hierarchy is described wherein each node is assigned a concatenation of decimal-based values. The values assigned uniquely identify the node, provide an order for the node, and identify its parent, child, and sibling relationships with other nodes. Furthermore, the IDs assigned can be encoded to be byte comparable, and they need not be modified when changes (adding or deleting a child node or a subtree of nodes) are made in the hierarchy. Additionally, in the event of such a change, the order and relationships between the parent, child, and sibling nodes are retained.
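The following is a minimal sketch of how such concatenated IDs might behave, not the scheme claimed above. It assumes IDs are written as dotted decimal strings such as "1.3.2" and that each component fits in two bytes; the names `parse`, `parent`, `is_ancestor`, `are_siblings`, and `encode` are hypothetical.

```python
import struct
from typing import Optional, Tuple

NodeId = Tuple[int, ...]  # (1, 3, 2) stands for the dotted ID "1.3.2"


def parse(text: str) -> NodeId:
    """Turn a dotted decimal string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def parent(node: NodeId) -> Optional[NodeId]:
    """The parent ID is the node ID without its last component."""
    return node[:-1] if len(node) > 1 else None


def is_ancestor(a: NodeId, b: NodeId) -> bool:
    """a is an ancestor of b when a is a proper prefix of b."""
    return len(a) < len(b) and b[:len(a)] == a


def are_siblings(a: NodeId, b: NodeId) -> bool:
    """Siblings are distinct nodes that share the same parent prefix."""
    return a != b and len(a) == len(b) and a[:-1] == b[:-1]


def encode(node: NodeId) -> bytes:
    # Assumption: fixed-width, big-endian, 2 bytes per component (values < 65536).
    # With this layout, plain byte comparison agrees with tuple comparison.
    return b"".join(struct.pack(">H", value) for value in node)


ids = [parse(s) for s in ["1.2", "1", "1.10", "1.2.1"]]
in_document_order = sorted(ids)
by_encoded_bytes = sorted(ids, key=encode)
```

Sorting by the tuples and sorting by the encoded bytes give the same sequence here, which is the property "byte comparable" refers to. Deleting a node in this sketch leaves every other ID unchanged.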
Abstract:
Provided are techniques for processing a query. The query is received, and the query is formed by one or more paths, where each path includes one or more steps. A hierarchical document is received that includes one or more document nodes. While processing the query and traversing the hierarchical document to find document nodes described by at least one of the one or more steps of the query, a match graph is constructed that includes one or more match nodes. Each of the match nodes identifies a step instance and is associated with step instances that are ancestors and descendants of the identified step instance. Also, each of the match nodes is associated with a level. In addition, the match graph includes zero or more edges between the match nodes indicating relationships between the match nodes. The match nodes in the match graph are traversed from lower levels to higher levels to construct results for the query.
Abstract:
A system and method for managing and storing logically grouped hierarchical data via physical block storage is provided. Logical groups of parsed XML node data forming node ID ranges are indexed by creating and inserting an index entry into a node ID range. Index entries indicate node ID range bounds for blocks in which nodes are stored. Consulting a node ID range index facilitates XML node traversal via logical links between nodes in different blocks. Additionally, physical links between nodes within a block allow for fast node traversal. Node updates, including insertion and deletion, as well as document-order-based prefetch and XML document reorganization, are also facilitated by this architecture.
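As an illustration of consulting a node ID range index, the sketch below assumes each index entry stores the highest node ID held in a block, with node IDs modeled as integer tuples. The class `RangeIndex` and its methods are hypothetical names, and the real index layout is not specified here.

```python
from bisect import bisect_left


class RangeIndex:
    """Maps the upper bound of each node ID range to the block that stores it."""

    def __init__(self):
        self._high_keys = []  # sorted upper bounds of the ranges
        self._blocks = []     # block identifier for each bound, same order

    def add_range(self, high_key, block_id):
        """Insert an index entry, keeping the bounds sorted."""
        pos = bisect_left(self._high_keys, high_key)
        self._high_keys.insert(pos, high_key)
        self._blocks.insert(pos, block_id)

    def block_for(self, node_id):
        """Return the block whose range covers node_id, or None if past the end."""
        pos = bisect_left(self._high_keys, node_id)
        if pos == len(self._high_keys):
            return None
        return self._blocks[pos]


index = RangeIndex()
index.add_range((1, 9), "block-B")
index.add_range((1, 4), "block-A")
```

A lookup such as `index.block_for((1, 2, 7))` selects the first range whose upper bound is not below the requested ID, which is how a traversal would follow a logical link into another block.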
Abstract:
Method for ordering nodes within hierarchical data. The concept of isolated ordered regions to maintain coordinates of nodes is used by associating each node with coordinates relative to a containing region. Modifications to nodes within a region only affect the nodes in that region, and not nodes in other regions. Traversals that retrieve information from the nodes can rebase the coordinates from their containing region and return with a total order.
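The idea of region-relative coordinates can be sketched as follows. This is an assumption-laden toy: each region is given an integer `base` that fixes its place among regions, a node's local coordinate is simply its index within the region, and the names `Region` and `total_order` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    base: int                                  # assumed position of the region among regions
    nodes: list = field(default_factory=list)  # node names in local order

    def insert(self, index, name):
        """Insert a node; only this region's local coordinates shift."""
        self.nodes.insert(index, name)

    def local_coordinate(self, name):
        """Coordinate of a node relative to its containing region."""
        return self.nodes.index(name)


def total_order(regions):
    """Rebase each local coordinate to (region base, local coordinate) and sort."""
    rebased = [
        ((region.base, local), name)
        for region in regions
        for local, name in enumerate(region.nodes)
    ]
    return [name for _, name in sorted(rebased)]


a = Region(base=0, nodes=["a1", "a2"])
b = Region(base=1, nodes=["b1", "b2"])
before = b.local_coordinate("b2")
a.insert(1, "a-new")                 # modification confined to region a
after = b.local_coordinate("b2")
ordered = total_order([b, a])
```

The coordinate of `b2` is the same before and after the insertion into region `a`, while the rebased traversal still yields a single total order across both regions.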
Abstract:
Provided are techniques for processing a query. A query is received, wherein the query is formed by one or more paths, and wherein each path includes one or more steps. A hierarchical document including one or more document nodes is received. While processing the query and traversing the hierarchical document, one or more extraction entries are constructed, wherein each extraction entry includes a step instance match candidate identifying a document node and a step instance ancestor path for the document node, and one or more tuples are constructed using the one or more extraction entries by associating the step instance match candidate from one of the one or more extraction entries with the step instance match candidate from at least one of the one or more other extraction entries.
Abstract:
A disclosed method identifies a subject matter expert in an online community for a business entity. The method is performed at a server having one or more processors and memory. The method identifies a plurality of subject matter areas for postings on a website for an online community, and each posting is assigned to one or more of the subject matter areas. For a first subject matter area a set of users is identified, each of whom has authored a plurality of postings assigned to the first subject matter area. A first user of the set of users is identified as a subject matter expert for the first subject matter area based on a determination that the first user's postings assigned to the first subject matter area have consistently been highly relevant to the first subject matter area over a predefined period of time.
Abstract:
The concept of isolated ordered regions to maintain coordinates of nodes is used by associating each node with coordinates relative to a containing region. Modifications to nodes within a region only affect the nodes in that region, and not nodes in other regions. Traversals that retrieve information from the nodes can rebase the coordinates from their containing region and return with a total order. Access patterns and usage are used to recognize and prefetch pages. The probabilities of revisiting traversed nodes are identified, and pages in a buffer pool are replaced based upon the identified probabilities (e.g., replacing the pages with the lowest probability of a revisit).
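The replacement policy mentioned above (evict the page least likely to be revisited) might look like the sketch below. How the probabilities are derived from access patterns is not covered; here the caller supplies them, and `BufferPool` and `load` are hypothetical names.

```python
class BufferPool:
    """Fixed-capacity pool that evicts the page with the lowest revisit probability."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}  # page_id -> estimated probability of a revisit

    def load(self, page_id, revisit_probability):
        """Bring a page in; return the evicted page ID, or None if nothing was evicted."""
        evicted = None
        if page_id not in self.pages and len(self.pages) >= self.capacity:
            # Assumption: the probability estimate is supplied by the caller.
            evicted = min(self.pages, key=self.pages.get)
            del self.pages[evicted]
        self.pages[page_id] = revisit_probability
        return evicted


pool = BufferPool(capacity=2)
first = pool.load("p1", 0.9)
second = pool.load("p2", 0.1)
third = pool.load("p3", 0.5)   # pool is full, so the least likely revisit is evicted
```

With these inputs the third load evicts `p2`, the page with the smallest probability, and keeps `p1` and `p3`.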
Abstract:
A mechanism is described for transient versioning in architectures that manage node ranges, wherein each node is assigned a node ID value and a set of nodes forms a range of node IDs called a node range. Each entry in a node ID range index describes one range and points to where the range is located; this location is called the range identifier, or RID. Individual nodes are located by finding the correct range in the index. When nodes are added to or deleted from a node range, the range is versioned by copying its nodes to transient storage before the change and then modifying the original nodes. Specifically, before changes are made in a range, the nodes in the range are copied to a Version Hash Table based on the RID. Different versions are tracked by assigning a timestamp, or LSN, to each copy of the node range, including the current one. After a change, new readers access the current nodes through the RID, while old readers use the same RID but hash it to find the shadowed copy in the Version Hash Table. If changes cause nodes in the range to be moved to a new RID, previous readers need to be redirected from the new RID to the old RID.
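A rough sketch of the copy-before-change idea follows. It assumes a simple in-memory store, an integer counter standing in for the LSN, and a dictionary standing in for the Version Hash Table. The class `VersionedRanges` is hypothetical, and the redirection needed when a range moves to a new RID is left out.

```python
class VersionedRanges:
    """Keeps the current nodes per RID plus shadow copies for older readers."""

    def __init__(self):
        self.current = {}   # rid -> (lsn, tuple of nodes)
        self.versions = {}  # rid -> list of (lsn, nodes) shadow copies, oldest first
        self.lsn = 0        # assumption: a simple counter stands in for the LSN

    def write(self, rid, nodes):
        """Shadow the existing range, then install the new nodes under the same RID."""
        self.lsn += 1
        if rid in self.current:
            self.versions.setdefault(rid, []).append(self.current[rid])
        self.current[rid] = (self.lsn, tuple(nodes))
        return self.lsn

    def read(self, rid, reader_lsn):
        """Return the nodes visible to a reader that started at reader_lsn."""
        lsn, nodes = self.current[rid]
        if lsn <= reader_lsn:
            return nodes
        # Older reader: look up the newest shadow copy it is allowed to see.
        for old_lsn, old_nodes in reversed(self.versions.get(rid, [])):
            if old_lsn <= reader_lsn:
                return old_nodes
        return None


store = VersionedRanges()
t1 = store.write("rid-7", ["n1", "n2"])
t2 = store.write("rid-7", ["n1", "n2", "n3"])
old_view = store.read("rid-7", t1)
new_view = store.read("rid-7", t2)
```

A reader that began at `t1` still sees the two-node range through the shadow copy, while a reader at `t2` sees the updated range through the same RID.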
Abstract:
A method of computing pseudo keys facilitates the bounding of node ID ranges. Pseudo keys are computed to facilitate node location in node ID ranges that have been split. A pseudo previous high key is computed by decrementing the last digit of the lowest node ID value in a newly formed node ID range by one and by appending ‘x’.‘x’. A computed pseudo key has no previous siblings or descendants of a previous sibling having a node ID higher in value than a computed pseudo previous high key. Pseudo keys are also computed to define boundaries of a sub-tree. The range determined by a pseudo previous high key for a highest valued root node and a pseudo sub-tree high key bounds a sub-tree. Sub-tree pseudo keys also comprise a pseudo sub-tree low key and a pseudo end of document key.