Abstract:
Methods and apparatus, including computer systems and program products, are provided for executing a query on a subset of data, for example to facilitate a fast search with a very large result set. In one general aspect, a method of executing a query includes receiving a query for execution on data in a data repository; generating an estimate of the number of results of the query; defining a subset of the data in the data repository; determining whether to execute the query on the subset of the data; executing the query on the subset to generate a partial set of results if the query is to be executed on the subset, and otherwise executing the query on the data repository to generate a complete set of results; and providing the query results.
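The decision flow in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the estimate is produced, how the subset is chosen, or what criterion triggers subset execution, so the sampling-based estimate, the prefix subset, and the names `execute_query`, `estimate_sample_size`, `result_threshold`, and `subset_size` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def execute_query(
    repository: Sequence[dict],
    predicate: Callable[[dict], bool],
    estimate_sample_size: int = 100,   # hypothetical tuning parameter
    result_threshold: int = 1000,      # hypothetical tuning parameter
    subset_size: int = 500,            # hypothetical tuning parameter
) -> Tuple[List[dict], bool]:
    """Return (results, is_partial) for a query given as a row predicate."""
    # Estimate the number of results by scanning a small prefix sample and
    # extrapolating to the whole repository (an assumed estimation method).
    sample = repository[:estimate_sample_size]
    if sample:
        hits = sum(1 for row in sample if predicate(row))
        estimate = hits * len(repository) / len(sample)
    else:
        estimate = 0.0

    # Define the subset; here simply the first `subset_size` rows (assumed).
    subset = repository[:subset_size]

    # Decide: a very large expected result set triggers subset execution.
    use_subset = estimate > result_threshold

    target = subset if use_subset else repository
    results = [row for row in target if predicate(row)]
    return results, use_subset


if __name__ == "__main__":
    rows = [{"id": i} for i in range(10_000)]
    partial, is_partial = execute_query(rows, lambda r: r["id"] % 2 == 0)
    print(len(partial), is_partial)
```

A prefix sample is only representative when rows are not ordered by the queried attribute; a real system would more plausibly rely on index statistics or random sampling.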
Abstract:
Systems and methods are provided for efficient calculation of sets of distinct results in an information retrieval service. A query is received having at least one requested attribute and one or more conditions. For each row identifier in a database table that matches the one or more conditions, a tuple of value identifiers having an entry for each requested attribute is calculated. A unique number is generated and assigned to the tuple for each distinct combination of the value identifiers. Duplicate entries in the tuple listing are identified and removed, so that a result set provides only distinct results.
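As a rough illustration of the distinct-result calculation, the sketch below assumes a dictionary-encoded column store in which each attribute column holds one value identifier per row. The storage layout and the names `distinct_results`, `columns`, and `dictionaries` are hypothetical; the abstract does not specify them.

```python
from typing import Dict, List, Sequence, Tuple


def distinct_results(
    columns: Dict[str, List[int]],        # attribute -> value id per row
    dictionaries: Dict[str, List[str]],   # attribute -> value id -> value
    requested: Sequence[str],             # requested attributes
    matching_rows: Sequence[int],         # row ids that satisfy the conditions
) -> List[Tuple[str, ...]]:
    """Return the distinct combinations of requested attribute values."""
    tuple_numbers: Dict[Tuple[int, ...], int] = {}
    numbered: List[Tuple[int, Tuple[int, ...]]] = []

    for row_id in matching_rows:
        # One value id per requested attribute for this row.
        value_ids = tuple(columns[attr][row_id] for attr in requested)
        # Each distinct combination gets the next unused number;
        # a repeated combination reuses the number it already has.
        number = tuple_numbers.setdefault(value_ids, len(tuple_numbers))
        numbered.append((number, value_ids))

    # Remove duplicates: keep the first tuple seen for each unique number.
    seen = set()
    distinct: List[Tuple[int, ...]] = []
    for number, value_ids in numbered:
        if number not in seen:
            seen.add(number)
            distinct.append(value_ids)

    # Translate value ids back to values only for the distinct tuples.
    return [
        tuple(dictionaries[attr][vid] for attr, vid in zip(requested, value_ids))
        for value_ids in distinct
    ]
```

Working on value identifiers rather than on the values themselves keeps the comparison cheap, since the dictionary lookup is deferred until the duplicates are gone.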
Abstract:
A method of executing a distributed join query for a set of documents includes communication between a first server and a second server. In the first server, a first tuple list is generated from a first list of documents matching a precondition part of the query. A first set of value identifiers of attributes associated with the first list of documents is extracted from the first tuple list. A first set of dictionary keys is generated from the set of value identifiers. Then, the first set of dictionary keys is sent with a join condition attribute to a second server. In the second server, the first set of value identifiers is converted to a second set of value identifiers of attributes associated with the second server based on the set of dictionary keys. Then, a lookup of documents is performed based on the second set of value identifiers.
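The exchange between the two servers can be sketched as follows. The sketch assumes that each server keeps its own dictionary per attribute, so the same value may have different value identifiers on each side, and that a dictionary key is the underlying value itself. The `Server` class and the function names are hypothetical, and the network transfer is reduced to a plain function call.

```python
from typing import Dict, List, Sequence, Set


class Server:
    """Toy server with dictionary-encoded attribute columns (assumed layout)."""

    def __init__(self, dictionaries: Dict[str, List[str]],
                 columns: Dict[str, List[int]]) -> None:
        self.dictionaries = dictionaries  # attribute -> value id -> key
        self.columns = columns            # attribute -> value id per document
        # Reverse mapping used to convert incoming keys to local value ids.
        self.key_to_value_id = {
            attr: {key: vid for vid, key in enumerate(keys)}
            for attr, keys in dictionaries.items()
        }


def first_server_keys(server: Server, attribute: str,
                      matching_docs: Sequence[int]) -> Set[str]:
    """Build the tuple list, extract value ids, and map them to keys."""
    tuple_list = [(doc, server.columns[attribute][doc]) for doc in matching_docs]
    value_ids = {vid for _, vid in tuple_list}
    return {server.dictionaries[attribute][vid] for vid in value_ids}


def second_server_lookup(server: Server, attribute: str,
                         keys: Set[str]) -> List[int]:
    """Convert received keys to local value ids, then look up documents."""
    mapping = server.key_to_value_id[attribute]
    # Keys unknown to this server cannot match any local document.
    local_value_ids = {mapping[key] for key in keys if key in mapping}
    return [
        doc for doc, vid in enumerate(server.columns[attribute])
        if vid in local_value_ids
    ]
```

Sending the deduplicated keys instead of per-document values keeps the transferred payload proportional to the number of distinct join values rather than to the number of matching documents.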