摘要:
The speed of dictionary based decompression is limited by the cost of accessing random values in the dictionary. If the size of the dictionary can be limited so it fits into cache, decompression is made to be CPU bound rather than memory bound. To achieve this, a value prefix coding scheme is presented, wherein value prefixes are stored in the dictionary to get good compression from small dictionaries. Also presented is an algorithm that determines the optimal entries for a value prefix dictionary. Once the dictionary fits in cache, decompression speed is often limited by the cost of mispredicted branches during Huffman code processing. A novel way is presented to quantize Huffman code lengths to allow code processing to be performed with few instructions, no branches, and very little extra memory. Also presented is an algorithm for code length quantization that produces the optimal assignment of Huffman codes and show that the adverse effect of quantization on the compression ratio is quite small.
摘要:
The speed of dictionary based decompression is limited by the cost of accessing random values in the dictionary. If the size of the dictionary can be limited so it fits into cache, decompression is made to be CPU bound rather than memory bound. To achieve this, a value prefix coding scheme is presented, wherein value prefixes are stored in the dictionary to get good compression from small dictionaries. Also presented is an algorithm that determines the optimal entries for a value prefix dictionary. Once the dictionary fits in cache, decompression speed is often limited by the cost of mispredicted branches during Huffman code processing. A novel way is presented to quantize Huffman code lengths to allow code processing to be performed with few instructions, no branches, and very little extra memory. Also presented is an algorithm for code length quantization that produces the optimal assignment of Huffman codes and show that the adverse effect of quantization on the compression ratio is quite small.
摘要:
The embodiments of the invention provide a method of intersecting a group of lists. The method begins by performing a first selecting process including selecting a top list from the group of lists to leave remaining lists. The top list can be the smallest list of the group of lists. The method can also select a pair of lists from the group of lists, such that the pair of lists has the smallest intersection size relative to other pairs of lists of the group of lists. Next, the method estimates intersections of the remaining lists with the top list by estimating an amount of intersection between the remaining lists and the top list. This involves sampling a portion of the remaining lists. The method also includes identifying larger list pairs having smaller intersections sizes when compared to smaller list pairs having larger intersections sizes.
摘要:
There is disclosed a data processing system implemented method, a data processing system, and an article of manufacture for directing a data processing system to maintain a database table associated with an initial maintenance scheduling interval. The data processing system implemented method includes selecting a randomizing factor, and selecting a new maintenance scheduling interval for the database table based on the initial maintenance scheduling interval and the selected randomizing factor.
摘要:
A method for approximating a validity range for a domain of cardinalities of input to an optimal query plan is provided. Such a validity range is iteratively approximated using a modified Newton-Raphson method to find roots of cost functions for optimal and alternative query plans, respectively. The Newton-Raphson method is combined with a method of incrementing roots of cost functions, known as input cardinalities, such that discontinuous and non-differentiable points in cost functions are avoided. In this manner, input cardinalities remain within a domain for which a valid range can be specified. Additionally, a robustness measure is determined by a sensitivity analysis performed on an approximated validity range. Using a robustness measure provided by a sensitivity analysis and resultant validity range and, query plan sub-optimality detection is simplified, re-optimization is selectively triggered, and robustness information is provided to a system or user performing corrective actions.
摘要:
A distributed index for discovering distributed data sources and computing resources based on predicates on attributes is provided. Proposed is a non-altruistic scheme for indexing distributed data, in which nodes are provided with incentives to cooperate in the referencing of data and the routing of search requests for indexed data. Indexed data is mapped to a dynamic routing graph, in which nodes earn credits each time they route a search request. Participatory nodes along a search request traversal continually modify local routing decisions in a manner necessary to maximize profit. Thus, routing paths as a whole are able to dynamically adapt to changing query workloads and access patterns. Dynamic adaptation also occurs by automatic load-balancing of recipients of frequently routed searches, known as “hot spots”, for frequently request data, “hot items”, as a result of an incentive to replicate the indexing strategy of a more profitable node.
摘要:
A system, method, and program storage device implementing the method, for integrating data in a database management system, wherein the method comprises grouping data sources and replicas of the data sources that provide analogous data into a common logical domain; writing application queries against the common logical domain; selecting a correct set of replicas of the data sources and a query-execution strategy for combining a content of the correct set of replicas of the data sources in order to answer the application queries according to query-cost-based optimization; selecting a correct set of data sources according to run-time constraints; shielding the application queries from changes to the data sources by dynamically binding the application queries against the correct sets of data sources and replicas of the data sources; and processing the application queries by generating an optimum query result based on the steps of grouping and shielding.
摘要:
A distributed index for discovering distributed data sources and computing resources based on predicates on attributes is provided. Proposed is a non-altruistic scheme for indexing distributed data, in which nodes are provided with incentives to cooperate in the referencing of data and the routing of search requests for indexed data. Indexed data is mapped to a dynamic routing graph, in which nodes earn credits each time they route a search request. Participatory nodes along a search request traversal continually modify local routing decisions in a manner necessary to maximize profit. Thus, routing paths as a whole are able to dynamically adapt to changing query workloads and access patterns. Dynamic adaptation also occurs by automatic load-balancing of recipients of frequently routed searches, known as “hot spots”, for frequently request data, “hot items”, as a result of an incentive to replicate the indexing strategy of a more profitable node.
摘要:
According to one embodiment of the present invention, a method for dictionary encoding data without using three-valued logic is provided. According to one embodiment of the invention, a method includes encoding data in a database table using a dictionary, wherein the data includes values representing NULLs. A query having a predicate is received and the predicate is evaluated on the encoded data, whereby the predicate is evaluated on both the encoded data and on the encoded NULLs.
摘要:
The embodiments of the invention provide a method of ordering an intersecting of a group of lists into a left-deep AND-tree. The method begins by performing a first selecting process including selecting a top list, corresponding to a top leaf of the left-deep AND-tree, from the group of lists to leave remaining lists of the group of lists. The top list can be the smallest list of the group of lists. The method can also select a pair of lists from the group of lists, such that the pair of lists has the smallest intersection size relative to other pairs of lists of the group of lists. Next, the method estimates intersections of the remaining lists with the top list by estimating an amount of intersection between the remaining lists and the top list. This involves sampling a portion of the remaining lists. The method also includes identifying larger list pairs having smaller intersections sizes when compared to smaller list pairs having larger intersections sizes.