Abstract:
A system and method for querying a stream of XML data in a single pass using standard XQuery expressions. The system comprises: an expression parser that receives a query and generates a parse tree; a SAX events API that receives the stream of XML data and generates a stream of SAX events; an evaluator that receives the parse tree and the stream of SAX events and buffers fragments from the stream of SAX events that meet an evaluation criterion; and a tuple constructor that joins fragments to form a set of tuple results that satisfies the query for the stream of XML data.
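The buffer-then-join flow described above can be illustrated with a minimal sketch built on Python's standard `xml.sax` module. It is not the disclosed system: there is no XQuery parser, and the "evaluation criterion" is reduced to a hard-coded record tag plus a list of child field names. The names `FragmentBuffer`, `query_stream`, `record_tag`, and `fields` are hypothetical and exist only for this illustration.

```python
import xml.sax


class FragmentBuffer(xml.sax.ContentHandler):
    """Buffers the text of selected child elements and emits one tuple per record."""

    def __init__(self, record_tag, fields):
        super().__init__()
        self.record_tag = record_tag  # assumed stand-in for a parsed query
        self.fields = fields          # child elements whose text is buffered
        self.current = None           # fragments buffered for the open record
        self.active_field = None      # field currently receiving characters
        self.results = []

    def startElement(self, name, attrs):
        if name == self.record_tag:
            self.current = {}
        elif self.current is not None and name in self.fields:
            self.active_field = name
            self.current[name] = ""

    def characters(self, content):
        # Buffer only text that falls inside a selected field.
        if self.active_field is not None:
            self.current[self.active_field] += content

    def endElement(self, name):
        if name == self.active_field:
            self.active_field = None
        elif name == self.record_tag and self.current is not None:
            # Tuple construction: join the fragments buffered for this record.
            self.results.append(tuple(self.current.get(f) for f in self.fields))
            self.current = None


def query_stream(xml_bytes, record_tag, fields):
    """Runs one pass over the XML input and returns the joined tuples."""
    handler = FragmentBuffer(record_tag, fields)
    xml.sax.parseString(xml_bytes, handler)
    return handler.results
```

Each record's fragments are discarded as soon as its tuple is emitted, so memory use in this sketch is bounded by the size of one record rather than the whole stream.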
Abstract:
Provided are a method, system, and article of manufacture for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents including at least one value within the range of consecutive values associated with the posting list, and wherein each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored, wherein the posting lists are used to process a query on a range of values within the set of values.
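As a rough illustration of range-partitioned posting lists, the sketch below assumes non-negative integer values and equal-width ranges. Both are simplifying assumptions not stated in the abstract, and `build_posting_lists`, `range_query`, and `bucket_width` are hypothetical names. Each posting-list entry pairs a document identifier with the value found in that document, so a query can filter entries in the partially covered ranges at its boundaries.

```python
from collections import defaultdict


def build_posting_lists(doc_values, bucket_width):
    """Maps each range index to a list of (doc_id, value) entries.

    Range k covers the consecutive values k*bucket_width .. (k+1)*bucket_width - 1.
    """
    lists = defaultdict(list)
    for doc_id, values in doc_values.items():
        for value in values:
            lists[value // bucket_width].append((doc_id, value))
    return dict(lists)


def range_query(lists, bucket_width, low, high):
    """Returns the sorted doc ids that contain a value in [low, high]."""
    hits = set()
    # Only posting lists whose range overlaps the query range are read.
    for bucket in range(low // bucket_width, high // bucket_width + 1):
        for doc_id, value in lists.get(bucket, []):
            if low <= value <= high:
                hits.add(doc_id)
    return sorted(hits)
```

For example, with a width of 10, a query for 10 to 20 reads only the lists for ranges 1 and 2 and ignores every other list.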
Abstract:
A holistic twig join method with optimal cursor movement is disclosed. The method in one aspect minimizes the number of cursor moves by looking more globally at the query's state to determine which cursor to move next and making virtual moves where a physical move is not needed. The method in another aspect reduces the number of cursor moves by skipping over nodes that do not need to be output.
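The idea of skipping nodes rather than stepping a cursor one node at a time can be shown on a much simpler problem than a full twig join: a two-list ancestor/descendant join. The sketch below is not the disclosed method. It assumes the common region encoding in which each node is a `(start, end)` pair, both lists are sorted by `start`, and intervals are properly nested as in a tree. The function name `ancestor_descendant_pairs` is hypothetical.

```python
from bisect import bisect_right


def ancestor_descendant_pairs(ancestors, descendants):
    """Pairs each ancestor with the descendants nested inside its region.

    Both inputs are lists of (start, end) tuples sorted by start.
    """
    starts = [d[0] for d in descendants]
    pairs = []
    for a_start, a_end in ancestors:
        # Jump straight past descendants that begin before this ancestor,
        # instead of advancing the cursor one node at a time.
        i = bisect_right(starts, a_start)
        # With properly nested regions, starting inside the ancestor's
        # region implies being contained in it.
        while i < len(descendants) and descendants[i][0] < a_end:
            pairs.append(((a_start, a_end), descendants[i]))
            i += 1
    return pairs
```

The binary search stands in for a cursor "skip": nodes that cannot contribute to the output are never visited individually.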
Abstract:
Disclosed is a technique for building an index in which global analysis computations and index creation are pipelined, wherein the global analysis computations share intermediate results.
Abstract:
Disclosed is a method, system, and program for handling redirects in documents. At least one equivalence class that includes documents that are connected through a redirect is identified. Cycles for each equivalence class are detected, wherein documents in a cycle are marked so that they are not indexed. Incomplete chains for each equivalence class are detected, wherein documents in an incomplete chain are marked so that they are not indexed. A representative for each equivalence class is selected.
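A minimal sketch of these steps follows, under several assumptions the abstract does not state: each document has at most one outgoing redirect, a chain is "incomplete" when it ends at a document that was never fetched, and the representative is the final target of a complete chain. The names `resolve_redirects`, `redirects`, and `fetched` are hypothetical.

```python
def resolve_redirects(redirects, fetched):
    """Classifies every known document by following its redirect chain.

    redirects: dict mapping a source document to its redirect target.
    fetched:   set of documents whose content is actually available.
    Returns a dict: document -> (status, representative or None), where
    status is "ok", "cycle", or "incomplete".
    """
    result = {}
    for start in set(redirects) | set(fetched):
        path = [start]
        in_cycle = False
        while path[-1] in redirects:
            nxt = redirects[path[-1]]
            if nxt in path:
                in_cycle = True  # chain loops back on itself
                break
            path.append(nxt)
        if in_cycle:
            result[start] = ("cycle", None)        # marked: not indexed
        elif path[-1] not in fetched:
            result[start] = ("incomplete", None)   # marked: not indexed
        else:
            result[start] = ("ok", path[-1])       # assumed representative
    return result
```

Documents that share the same representative form one equivalence class, and only documents with status `"ok"` would go on to be indexed in this sketch.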
Abstract:
Disclosed is a technique for indexing data. A token is received. It is determined whether a data field associated with the token has a fixed width. When the data field has a fixed width, the token is designated as one for which a fixed width sort is to be performed. When the data field has a variable length, the token is designated as one for which a variable width sort is to be performed.
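The designation step can be sketched as a simple lookup, assuming a schema that records a byte width for fixed-width fields and `None` for variable-length ones. This schema, and the names `designate_sort`, `partition_tokens`, and `field_widths`, are hypothetical; the abstract does not say how field widths are represented or how each sort is carried out.

```python
def designate_sort(field, field_widths):
    """Returns "fixed" if the field has a known width, otherwise "variable"."""
    width = field_widths.get(field)
    return "fixed" if width is not None else "variable"


def partition_tokens(tokens, field_widths):
    """Groups (token, field) pairs by the kind of sort designated for them."""
    plan = {"fixed": [], "variable": []}
    for token, field in tokens:
        plan[designate_sort(field, field_widths)].append(token)
    return plan
```

Separating the two groups up front would let each be handed to a sort routine suited to its layout.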