摘要:
Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier that indicates whether a section of a document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context text section have a same document identifier; it is determined whether a data field associated with the token is a fixed width; when the data field is a fixed width, the token is designated as one for which fixed width sort is to be performed; and, when the data field is a variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
摘要:
Disclosed is a technique for building an index in which global analysis computations and index creation are pipelined, wherein the global analysis computations share intermediate results.
摘要:
Disclosed is a technique for building an index. A new indexi+1 is built and an anchor text tablei+1 and a duplicates tableti+1 are output using a storesi, a delta store, and previously generated global analysis computationsi, wherein the previously generated global analysis computationsi include an anchor text tablei, a rank tablei, and a duplicates tablei. New global analysis computationsi+1 are generated using the anchor text tablei+1, the duplicates tablei+1, and the previously generated global analysis computationsi.
摘要:
Disclosed is a technique for indexing data. A token is received. It is determined whether a data field associated with the token is a fixed width. When the data field is a fixed width, the token is designated as one for which fixed width sort is to be performed. When the data field is a variable length, the token is designated as one for which a variable width sort is to be performed.
摘要:
Provided are techniques for computer-based electronic Information Retrieval (IR). An extended inverted index structure by generating one or more lexical affinities (LA), wherein each of the one or more lexical affinities comprises two or more search items found in proximity in one or more documents in a pool of documents, and generating a posting list for each of the one or more lexical affinities, wherein each posting list is associated with a specific lexical affinity and contains document identifying information for each of the one or more documents in the pool that contains the specific lexical affinity and a location within the document where the specific lexical affinity occurs.
摘要:
Provided are a method, system, and program for searching documents for ranges of numeric values. Document identifiers for documents are accessed, wherein the documents include at least one value that is a member of a set of values. A number of posting lists are generated. Each posting list is associated with a range of consecutive values within the set of values and includes document identifiers for documents having values within the range of consecutive values associated with the posting list. Each document identifier is associated with one value in the set of values included in the document identified by the document identifier. The generated posting lists are stored.
摘要:
An improved system and method for efficiently evaluating complex Boolean expressions is provided. Leaf nodes of Boolean expression trees for objects represented by Boolean expressions of attribute-value pairs may be assigned a positional identifier that indicates the position of a node in the Boolean expression tree. The positional identifiers of each object may be indexed by attribute-value pairs of the leaf nodes of the Boolean expression trees in an inverted index. Given an input set of attribute-value pairs, a list of positional identifiers for leaf nodes of virtual Boolean expression trees may be found in the index matching the attribute-value pairs of the input set. The list of positional identifiers of leaf nodes may be sorted in order by positional identifier for each contract. An expression evaluator may then verify whether a virtual Boolean expression tree for each contract is satisfied by the list of positional identifiers.
摘要:
An item of inventory is described as a Boolean expression, which is converted into a multi-level, alternating AND/OR impression tree representation with leaf nodes representing conjuncts. Processing the conjuncts of the tree through a contract index results in retrieving a set of candidate contracts that match at least some but not necessarily all impression tree leaf node predicates. Next, an AND/OR contract tree representation is constructed with each contract tree leaf node having a label representing a projection onto a discrete set of ordered symbols. Contracts with projections that cover the entire range of discrete set of ordered symbols are deemed to satisfy the item of inventory. Implementation of the contract index includes retrieval techniques to support multi-valued predicates as well as confidence threshold functions using a multi-level tree representation of multi-valued predicates.
摘要:
A method for automatic matching of contracts to inventory using a fixed-length complex predicate representation. An item of inventory is described as a Boolean expression, which is converted into a multi-level, alternating AND/OR impression tree representation with leaf nodes representing conjuncts. Processing the conjuncts of the tree through a contract index results in retrieving a set of candidate contracts that match the at least some but not necessarily all impression tree leaf node predicates. Next, an AND/OR contract tree representation is constructed with each contract tree leaf node having a label representing a projection onto a discrete set of ordered symbols. Contracts with projections that cover the entire range of discrete set of ordered symbols are deemed to satisfy the item of inventory. Implementation of the contract index includes retrieval techniques to support multi-valued predicates as well as confidence threshold functions using a multi-level tree representation of multi-valued predicates.
摘要:
A method for automatic matching of contracts to inventory using a fixed-length complex predicate representation. An item of inventory is described as a Boolean expression, which is converted into a multi-level, alternating AND/OR impression tree representation with leaf nodes representing conjuncts. Processing the conjuncts of the tree through a contract index results in retrieving a set of candidate contracts that match the at least some but not necessarily all impression tree leaf node predicates. Next, an AND/OR contract tree representation is constructed with each contract tree leaf node having a label representing a projection onto a discrete set of ordered symbols. Contracts with projections that cover the entire range of discrete set of ordered symbols are deemed to satisfy the item of inventory. Implementation of the contract index includes retrieval techniques to support multi-valued predicates as well as confidence threshold functions using a multi-level tree representation of multi-valued predicates.