Method for compressed data with reduced dictionary sizes by coding value prefixes
    61.
    Granted patent
    Method for compressed data with reduced dictionary sizes by coding value prefixes (status: in force)

    Publication No.: US07609179B2

    Publication date: 2009-10-27

    Application No.: US11970844

    Filing date: 2008-01-08

    IPC classification: H03M7/38

    CPC classification: H03M7/40

    Abstract: The speed of dictionary-based decompression is limited by the cost of accessing random values in the dictionary. If the dictionary can be kept small enough to fit into cache, decompression becomes CPU bound rather than memory bound. To achieve this, a value prefix coding scheme is presented, wherein value prefixes are stored in the dictionary to get good compression from small dictionaries. Also presented is an algorithm that determines the optimal entries for a value prefix dictionary. Once the dictionary fits in cache, decompression speed is often limited by the cost of mispredicted branches during Huffman code processing. A novel way is presented to quantize Huffman code lengths to allow code processing to be performed with few instructions, no branches, and very little extra memory. Also presented is an algorithm for code length quantization that produces the optimal assignment of Huffman codes, and the adverse effect of quantization on the compression ratio is shown to be quite small.
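
    The value prefix idea in this abstract can be illustrated with a minimal Python sketch. It assumes a small hand-picked prefix dictionary; the algorithm for choosing optimal dictionary entries is not reproduced here, and the names `PREFIXES`, `encode_value`, and `decode_value` are hypothetical.

```python
from typing import Tuple

# Hypothetical prefix dictionary; a real one would be chosen by an optimizer.
# The empty prefix guarantees that every value has at least one match.
PREFIXES = ["", "http://", "http://www.", "https://"]
CODE_OF = {p: i for i, p in enumerate(PREFIXES)}


def encode_value(value: str) -> Tuple[int, str]:
    """Return (code of the longest matching dictionary prefix, literal suffix)."""
    best = max((p for p in PREFIXES if value.startswith(p)), key=len)
    return CODE_OF[best], value[len(best):]


def decode_value(code: int, suffix: str) -> str:
    """Rebuild the value with one dictionary lookup and a concatenation."""
    return PREFIXES[code] + suffix
```

    Because only prefixes are stored, the dictionary stays small even when the column holds many distinct values; the suffix is carried alongside the code.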

    Method for Compressed Data with Reduced Dictionary Sizes by Coding Value Prefixes
    62.
    Patent application
    Method for Compressed Data with Reduced Dictionary Sizes by Coding Value Prefixes (status: in force)

    Publication No.: US20090174583A1

    Publication date: 2009-07-09

    Application No.: US11970844

    Filing date: 2008-01-08

    IPC classification: H03M7/40

    CPC classification: H03M7/40

    Abstract: The speed of dictionary-based decompression is limited by the cost of accessing random values in the dictionary. If the dictionary can be kept small enough to fit into cache, decompression becomes CPU bound rather than memory bound. To achieve this, a value prefix coding scheme is presented, wherein value prefixes are stored in the dictionary to get good compression from small dictionaries. Also presented is an algorithm that determines the optimal entries for a value prefix dictionary. Once the dictionary fits in cache, decompression speed is often limited by the cost of mispredicted branches during Huffman code processing. A novel way is presented to quantize Huffman code lengths to allow code processing to be performed with few instructions, no branches, and very little extra memory. Also presented is an algorithm for code length quantization that produces the optimal assignment of Huffman codes, and the adverse effect of quantization on the compression ratio is shown to be quite small.
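
    The code length quantization mentioned in this abstract can be sketched as follows. This is a naive illustration, not the optimal quantization algorithm the abstract refers to: it builds ordinary Huffman code lengths and rounds each one up to the nearest length in a small allowed set. Rounding up can only lower the Kraft sum, so a prefix code with the quantized lengths still exists. The function names and the allowed-length set are assumptions.

```python
import heapq
from typing import Dict, Iterable


def huffman_lengths(freqs: Dict[str, int]) -> Dict[str, int]:
    """Standard Huffman construction; returns the code length of each symbol."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    # Heap items: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (w1 + w2, counter, syms1 + syms2))
        counter += 1
    return lengths


def quantize_lengths(lengths: Dict[str, int], allowed: Iterable[int]) -> Dict[str, int]:
    """Round each length up to the nearest allowed length (naive, not optimal).

    Assumes max(allowed) is at least the longest Huffman code length.
    """
    allowed = sorted(allowed)
    return {s: next(a for a in allowed if a >= n) for s, n in lengths.items()}


def kraft_sum(lengths: Dict[str, int]) -> float:
    """A prefix code with these lengths exists if and only if this sum is <= 1."""
    return sum(2.0 ** -n for n in lengths.values())
```

    With only a few distinct code lengths, a decoder can in principle find the length of the next code from a tiny lookup table instead of a chain of data-dependent branches, which is the motivation the abstract gives.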

    ADAPTIVE GREEDY METHOD FOR FAST LIST INTERSECTION VIA SAMPLING
    63.
    Patent application
    ADAPTIVE GREEDY METHOD FOR FAST LIST INTERSECTION VIA SAMPLING (status: in force)

    Publication No.: US20090113309A1

    Publication date: 2009-04-30

    Application No.: US11923684

    Filing date: 2007-10-25

    IPC classification: G06F3/00

    CPC classification: G06F17/30498 G06F17/30592

    Abstract: The embodiments of the invention provide a method of intersecting a group of lists. The method begins by performing a first selecting process including selecting a top list from the group of lists to leave remaining lists. The top list can be the smallest list of the group of lists. The method can also select a pair of lists from the group of lists, such that the pair of lists has the smallest intersection size relative to other pairs of lists of the group of lists. Next, the method estimates intersections of the remaining lists with the top list by estimating an amount of intersection between the remaining lists and the top list. This involves sampling a portion of the remaining lists. The method also includes identifying larger list pairs having smaller intersection sizes when compared to smaller list pairs having larger intersection sizes.
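
    The sampling step described in this abstract can be sketched in Python as below. This is a minimal illustration under assumptions: lists hold integer IDs, the top list is held as a set for constant-time membership probes, and the function name `estimate_intersection_size` and its parameters are hypothetical.

```python
import random
from typing import Sequence, Set


def estimate_intersection_size(top: Set[int], other: Sequence[int],
                               sample_size: int, rng: random.Random) -> float:
    """Estimate |top ∩ other| by probing a random sample of `other` against `top`."""
    if not other:
        return 0.0
    k = min(sample_size, len(other))
    sample = rng.sample(other, k)          # sample without replacement
    hits = sum(1 for x in sample if x in top)
    return hits / k * len(other)           # scale the hit rate to the full list
```

    When the sample covers the whole list the estimate is exact; smaller samples trade accuracy for fewer probes.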

    DETERMINING VALIDITY RANGES OF QUERY PLANS BASED ON SUBOPTIMALITY
    65.
    Patent application
    DETERMINING VALIDITY RANGES OF QUERY PLANS BASED ON SUBOPTIMALITY (status: in force)

    Publication No.: US20080215531A1

    Publication date: 2008-09-04

    Application No.: US12028120

    Filing date: 2008-02-08

    IPC classification: G06F17/30

    Abstract: A method for approximating a validity range for a domain of cardinalities of input to an optimal query plan is provided. Such a validity range is iteratively approximated using a modified Newton-Raphson method to find roots of cost functions for optimal and alternative query plans, respectively. The Newton-Raphson method is combined with a method of incrementing roots of cost functions, known as input cardinalities, such that discontinuous and non-differentiable points in cost functions are avoided. In this manner, input cardinalities remain within a domain for which a valid range can be specified. Additionally, a robustness measure is determined by a sensitivity analysis performed on an approximated validity range. Using the robustness measure provided by the sensitivity analysis and the resultant validity range, query plan sub-optimality detection is simplified, re-optimization is selectively triggered, and robustness information is provided to a system or user performing corrective actions.
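
    The root-finding step in this abstract can be illustrated with a plain Newton-Raphson sketch in Python. It looks for the input cardinality at which an alternative plan becomes as cheap as the currently optimal plan, which bounds the validity range on one side. The modification described in the abstract for stepping past discontinuous or non-differentiable points is not reproduced; the function name, the numeric derivative, and the cost functions in the usage note are all assumptions.

```python
from typing import Callable


def crossover_cardinality(cost_opt: Callable[[float], float],
                          cost_alt: Callable[[float], float],
                          start: float, tol: float = 1e-9,
                          max_iter: int = 50) -> float:
    """Newton-Raphson on f(x) = cost_opt(x) - cost_alt(x), using a numeric derivative."""
    def f(x: float) -> float:
        return cost_opt(x) - cost_alt(x)

    x = start
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        h = 1e-6 * max(1.0, abs(x))
        slope = (f(x + h) - f(x - h)) / (2 * h)   # central difference
        if slope == 0:
            raise ValueError("flat cost difference; Newton step undefined")
        x -= fx / slope
    raise ValueError("did not converge")
```

    For example, with the made-up costs `lambda x: 0.001 * x * x + 10` for the chosen plan and `lambda x: 0.5 * x + 100` for an alternative, starting from a cardinality of 1000 the iteration settles near 640.5, so under these assumed costs the chosen plan would stay optimal below roughly that cardinality.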

    Microeconomic mechanism for distributed indexing
    66.
    Granted patent
    Microeconomic mechanism for distributed indexing (status: in force)

    Publication No.: US07340453B2

    Publication date: 2008-03-04

    Application No.: US10902570

    Filing date: 2004-07-30

    IPC classification: G06F17/30

    Abstract: A distributed index for discovering distributed data sources and computing resources based on predicates on attributes is provided. Proposed is a non-altruistic scheme for indexing distributed data, in which nodes are provided with incentives to cooperate in the referencing of data and the routing of search requests for indexed data. Indexed data is mapped to a dynamic routing graph, in which nodes earn credits each time they route a search request. Participatory nodes along a search request traversal continually modify local routing decisions in a manner necessary to maximize profit. Thus, routing paths as a whole are able to dynamically adapt to changing query workloads and access patterns. Dynamic adaptation also occurs by automatic load-balancing of recipients of frequently routed searches, known as “hot spots”, for frequently requested data, “hot items”, as a result of an incentive to replicate the indexing strategy of a more profitable node.

    Dynamic and selective data source binding through a metawrapper
    67.
    Granted patent
    Dynamic and selective data source binding through a metawrapper (status: lapsed)

    Publication No.: US07315872B2

    Publication date: 2008-01-01

    Application No.: US10931002

    Filing date: 2004-08-31

    IPC classification: G06F17/00

    Abstract: A system, method, and program storage device implementing the method, for integrating data in a database management system, wherein the method comprises grouping data sources and replicas of the data sources that provide analogous data into a common logical domain; writing application queries against the common logical domain; selecting a correct set of replicas of the data sources and a query-execution strategy for combining a content of the correct set of replicas of the data sources in order to answer the application queries according to query-cost-based optimization; selecting a correct set of data sources according to run-time constraints; shielding the application queries from changes to the data sources by dynamically binding the application queries against the correct sets of data sources and replicas of the data sources; and processing the application queries by generating an optimum query result based on the steps of grouping and shielding.

    Microeconomic mechanism for distributed indexing
    68.
    Patent application
    Microeconomic mechanism for distributed indexing (status: in force)

    Publication No.: US20060026117A1

    Publication date: 2006-02-02

    Application No.: US10902570

    Filing date: 2004-07-30

    IPC classification: G06F17/30

    Abstract: A distributed index for discovering distributed data sources and computing resources based on predicates on attributes is provided. Proposed is a non-altruistic scheme for indexing distributed data, in which nodes are provided with incentives to cooperate in the referencing of data and the routing of search requests for indexed data. Indexed data is mapped to a dynamic routing graph, in which nodes earn credits each time they route a search request. Participatory nodes along a search request traversal continually modify local routing decisions in a manner necessary to maximize profit. Thus, routing paths as a whole are able to dynamically adapt to changing query workloads and access patterns. Dynamic adaptation also occurs by automatic load-balancing of recipients of frequently routed searches, known as “hot spots”, for frequently requested data, “hot items”, as a result of an incentive to replicate the indexing strategy of a more profitable node.

    Avoiding three-valued logic in predicates on dictionary-encoded data
    69.
    Granted patent
    Avoiding three-valued logic in predicates on dictionary-encoded data (status: lapsed)

    Publication No.: US08244765B2

    Publication date: 2012-08-14

    Application No.: US12570420

    Filing date: 2009-09-30

    IPC classification: G06F17/00

    CPC classification: G06F17/30312 H03M7/3088

    Abstract: According to one embodiment of the present invention, a method for dictionary encoding data without using three-valued logic is provided. According to one embodiment of the invention, a method includes encoding data in a database table using a dictionary, wherein the data includes values representing NULLs. A query having a predicate is received and the predicate is evaluated on the encoded data, whereby the predicate is evaluated on both the encoded data and on the encoded NULLs.
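
    One way to picture evaluating a predicate on encoded NULLs is the Python sketch below. It is an assumption-laden illustration, not the claimed method: NULL gets a reserved code of 0, real values get order-preserving codes starting at 1, and a `col <= bound` predicate becomes a plain range test on codes that yields only True or False. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional

NULL_CODE = 0  # reserved code for NULL; real values get codes 1..n in sorted order


def build_dictionary(column: List[Optional[int]]) -> List[int]:
    """Sorted list of the distinct non-NULL values."""
    return sorted({v for v in column if v is not None})


def encode(column: List[Optional[int]], dictionary: List[int]) -> List[int]:
    """Map each value to its code; NULL maps to the reserved code."""
    return [NULL_CODE if v is None else bisect_left(dictionary, v) + 1
            for v in column]


def less_equal(codes: List[int], dictionary: List[int], bound: int) -> List[bool]:
    """Evaluate `col <= bound` on codes as a two-valued range test."""
    hi = bisect_right(dictionary, bound)  # largest code whose value is <= bound
    return [1 <= c <= hi for c in codes]
```

    In this sketch a NULL row simply fails the range test, which matches how a SQL WHERE clause drops rows whose predicate is UNKNOWN. A negated predicate would need its own range (here `c > hi`) rather than a boolean NOT of the result, since NOT would wrongly admit the NULL code.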

    Adaptive greedy method for ordering intersecting of a group of lists into a left-deep AND-tree
    70.
    Granted patent
    Adaptive greedy method for ordering intersecting of a group of lists into a left-deep AND-tree (status: in force)

    Publication No.: US07925604B2

    Publication date: 2011-04-12

    Application No.: US11923684

    Filing date: 2007-10-25

    IPC classification: G06F17/10 G06F17/30

    CPC classification: G06F17/30498 G06F17/30592

    Abstract: The embodiments of the invention provide a method of ordering an intersecting of a group of lists into a left-deep AND-tree. The method begins by performing a first selecting process including selecting a top list, corresponding to a top leaf of the left-deep AND-tree, from the group of lists to leave remaining lists of the group of lists. The top list can be the smallest list of the group of lists. The method can also select a pair of lists from the group of lists, such that the pair of lists has the smallest intersection size relative to other pairs of lists of the group of lists. Next, the method estimates intersections of the remaining lists with the top list by estimating an amount of intersection between the remaining lists and the top list. This involves sampling a portion of the remaining lists. The method also includes identifying larger list pairs having smaller intersection sizes when compared to smaller list pairs having larger intersection sizes.
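
    The greedy ordering into a left-deep AND-tree can be sketched in Python as follows. This is a simplified illustration under assumptions: lists are sets of integer IDs, and exact intersection sizes stand in for the sampled estimates the abstract describes. The function name `greedy_left_deep_order` is hypothetical.

```python
from typing import List, Set


def greedy_left_deep_order(lists: List[Set[int]]) -> List[int]:
    """Return list indices in the order a left-deep AND-tree would intersect them."""
    remaining = set(range(len(lists)))
    # Top leaf: the smallest list (ties broken by index).
    top = min(remaining, key=lambda i: (len(lists[i]), i))
    order, current = [top], set(lists[top])
    remaining.remove(top)
    while remaining:
        # Exact sizes are used for clarity; sampling would estimate them instead.
        nxt = min(remaining, key=lambda i: (len(current & lists[i]), i))
        order.append(nxt)
        current &= lists[nxt]      # running result of the left-deep chain
        remaining.remove(nxt)
    return order
```

    Choosing by intersection size rather than list size means a large list can be picked before a smaller one when it prunes the running result more sharply, which is the situation the last sentence of the abstract points at.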
