Abstract:
Systems and methods are provided for query processing and indexing of documents in connection with a content store in a computing system. In various embodiments, an indexing model is provided that is optimized for fast, efficient, and scalable retrieval of documents satisfying a query. The model mixes forward and inverted indexing representations and includes algorithms for achieving a balance between the two. When processing queries, reverse chronologically ordered posting lists are generated quickly and efficiently, so that logical operators can be executed efficiently on query result sets. A term expand index is also provided in which the overall set of terms is decomposed into a plurality of lexicon files, which are combined when convenient so that queries of the content in the content store remain fast and scalable.
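As an illustration only, the following minimal Python sketch shows why posting lists kept in reverse chronological order make logical operators cheap, and how a term expansion might consult several lexicon files. It assumes that larger document identifiers denote newer documents, that each lexicon is a sorted list of terms, and that expansion is by prefix; the function names `intersect_desc`, `union_desc`, and `expand_prefix` are hypothetical and do not come from the abstract.

```python
import bisect
import heapq


def intersect_desc(a, b):
    """AND of two posting lists sorted in descending doc-id order (newest first)."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] > b[j]:
            i += 1  # a[i] is newer than anything remaining in b
        else:
            j += 1  # b[j] is newer than anything remaining in a
    return out


def union_desc(a, b):
    """OR of two descending posting lists; the result stays descending."""
    i = j = 0
    out = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i] > b[j]):
            out.append(a[i])
            i += 1
        elif i == len(a) or b[j] > a[i]:
            out.append(b[j])
            j += 1
        else:  # same document in both lists: emit once
            out.append(a[i])
            i += 1
            j += 1
    return out


def expand_prefix(lexicons, prefix):
    """Expand a prefix against several sorted lexicons and merge the matches."""
    matches = []
    for lexicon in lexicons:
        lo = bisect.bisect_left(lexicon, prefix)
        hi = lo
        while hi < len(lexicon) and lexicon[hi].startswith(prefix):
            hi += 1
        matches.append(lexicon[lo:hi])
    merged = []
    for term in heapq.merge(*matches):  # lazy merge of the sorted slices
        if not merged or merged[-1] != term:
            merged.append(term)  # drop duplicates across lexicons
    return merged
```

Because both operators walk each list once and emit results newest first, a caller that only needs the most recent hits could stop early. The lexicons here are merged at query time; combining them ahead of time is the other option the abstract alludes to.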
Abstract:
Systems and methods are described for generating a document identifier for a document received at a content store, where the identifier is used to generate at least one posting list in connection with querying the content store.
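The abstract does not say how the identifier is formed. One plausible reading, sketched below under that stated assumption, is that identifiers are assigned in arrival order, so a posting list sorted by identifier is also sorted by time. The `ContentStore` class, its methods, and the whitespace tokenizer are hypothetical names chosen for this illustration.

```python
import itertools
from collections import defaultdict


class ContentStore:
    """Toy store: assigns increasing doc ids and builds per-term posting lists."""

    def __init__(self):
        # Assumption: ids are a simple monotonically increasing counter.
        self._next_id = itertools.count(1)
        self._postings = defaultdict(list)

    def add_document(self, text):
        """Assign the next id to the document and index its terms."""
        doc_id = next(self._next_id)
        for term in set(text.lower().split()):
            # Appending keeps each list in ascending (oldest-first) id order.
            self._postings[term].append(doc_id)
        return doc_id

    def posting_list(self, term):
        """Return the posting list for a term, newest document first."""
        return list(reversed(self._postings.get(term.lower(), [])))
```

With this scheme no sort is needed at query time: reversing the stored list is enough to obtain newest-first order.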
Abstract:
Systems and methods for automatically updating queries on a data store, such as a personal content database (PCDB), are provided. A query engine runs queries against two indexes: a first index that represents a previous state of documents and a second index that represents a current state of documents. The query is run twice and a delta analysis is performed, i.e., a determination is made as to which documents have changed in some respect from the previous state to the current state, and a view or a count associated with at least one query changes in accordance with the delta analysis. Transactions may be batched dynamically by a transaction manager until an optimal number of documents have changed or a certain amount of time has passed prior to re-running the query and performing the delta analysis.
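The delta analysis and batching described above can be sketched as follows. This is a simplified illustration, not the described system: each index is modeled as a dictionary from document id to document, a query is a predicate, and the "optimal number" of changed documents is replaced by a fixed threshold. The names `run_query`, `delta_analysis`, and `TransactionBatcher` are assumptions made for the example.

```python
import time


def run_query(index, predicate):
    """Return the ids of documents in the index that satisfy the query."""
    return {doc_id for doc_id, doc in index.items() if predicate(doc)}


def delta_analysis(previous_index, current_index, predicate):
    """Run the query against both states and report what changed."""
    before = run_query(previous_index, predicate)
    after = run_query(current_index, predicate)
    return {"added": after - before, "removed": before - after}


class TransactionBatcher:
    """Collects changed doc ids until a size or time threshold is reached."""

    def __init__(self, max_docs=100, max_seconds=5.0, clock=time.monotonic):
        self.max_docs = max_docs        # assumed fixed stand-in for "optimal number"
        self.max_seconds = max_seconds
        self.clock = clock
        self.changed = set()
        self.started = None

    def record(self, doc_id):
        """Note that a document changed; start the timer on the first change."""
        if self.started is None:
            self.started = self.clock()
        self.changed.add(doc_id)

    def ready(self):
        """True once enough documents changed or enough time has passed."""
        if not self.changed:
            return False
        return (len(self.changed) >= self.max_docs
                or self.clock() - self.started >= self.max_seconds)

    def drain(self):
        """Hand back the batch and reset for the next one."""
        changed, self.changed, self.started = self.changed, set(), None
        return changed
```

A view or count tied to the query can then be adjusted from the delta alone, for example `count += len(delta["added"]) - len(delta["removed"])`, without rebuilding the full result set.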