Query Execution Plans by Compilation-Time Execution
    2.
    Patent application
    Query Execution Plans by Compilation-Time Execution (under examination, published)

    Publication number: US20090327214A1

    Publication date: 2009-12-31

    Application number: US12146423

    Filing date: 2008-06-25

    IPC classification: G06F17/30

    CPC classification: G06F16/24545

    Abstract: Described is a query optimizer comprising a query tuner that performs actual execution of query fragments to obtain actual results during compilation time, and uses those actual results to select a query plan. The actual results may be combined with estimates for fragments that were not executed. The tree may be traversed in a top-down traversal, processing every node. Alternatively, the tree may be traversed in a bottom-up traversal, re-deriving data for higher nodes as each lower level is completed. A limit, such as a time limit or level limit, may be used to control how much time is taken to determine the execution plan.
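
The bottom-up variant described in this abstract can be illustrated with a small Python sketch. It is not the patented implementation: the `Node` class, the `run` and `derive` callables, the toy cost function, and the default limits are all assumptions made for illustration. Fragments that carry a `run` callable are executed level by level from the leaves up until a time or level limit is hit; nodes that were not executed fall back to a value re-derived from their children, or to the original estimate.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    estimate: float                                   # optimizer's guessed row count
    run: Optional[Callable[[], int]] = None           # hypothetical: executes the fragment, returns real row count
    derive: Optional[Callable[[List[float]], float]] = None  # hypothetical: recomputes rows from child rows
    children: List["Node"] = field(default_factory=list)
    actual: Optional[int] = None                      # filled in by compile-time execution

    def rows(self) -> float:
        # Prefer an actual result, then a value re-derived from the children,
        # and only then the original estimate.
        if self.actual is not None:
            return float(self.actual)
        if self.derive is not None and self.children:
            return self.derive([c.rows() for c in self.children])
        return self.estimate


def levels_bottom_up(root):
    # Group nodes by depth and return the deepest level first.
    levels, frontier = [], [root]
    while frontier:
        levels.append(frontier)
        frontier = [c for n in frontier for c in n.children]
    return list(reversed(levels))


def tune(root, deadline, level_limit):
    # Execute fragments level by level until the level or time limit is reached.
    for depth, level in enumerate(levels_bottom_up(root)):
        if depth >= level_limit:
            return
        for node in level:
            if time.monotonic() >= deadline:
                return
            if node.run is not None:
                node.actual = node.run()


def plan_cost(node):
    # Toy cost model (an assumption): total rows flowing through the plan.
    return node.rows() + sum(plan_cost(c) for c in node.children)


def choose_plan(candidates, time_limit_s=0.05, level_limit=1):
    deadline = time.monotonic() + time_limit_s
    for plan in candidates:
        tune(plan, deadline, level_limit)
    return min(candidates, key=plan_cost)
```

Because `rows()` re-derives parent values lazily, any actual result obtained at a lower level propagates upward the next time a cost is computed, which mirrors the mix of actual results and estimates mentioned in the abstract.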

    Generating histograms of population data by scaling from sample data
    3.
    Granted patent
    Generating histograms of population data by scaling from sample data (in force)

    Publication number: US08316009B2

    Publication date: 2012-11-20

    Application number: US12700274

    Filing date: 2010-02-04

    IPC classification: G06F17/30

    CPC classification: G06F17/18 G06F17/30469

    Abstract: Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a “Chao” estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a “sum of the parts” mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.
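
A rough Python sketch of the distinct-value steps in this abstract follows. Several pieces are assumptions rather than details taken from the patent: the "Chao" estimator is taken to be the standard Chao1 formula, `raw_estimate` stands in for whatever primary estimator is used, the duplicate threshold is interpreted as a count of repeated sample rows, and the domain limit is simplified to "no more distinct values than rows".

```python
from collections import Counter


def chao_lower_bound(sample):
    # Chao1: d + f1^2 / (2 * f2), where d is the number of distinct values in
    # the sample, f1 the values seen exactly once, f2 the values seen exactly twice.
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    if f2 == 0:
        return d + f1 * (f1 - 1) / 2      # bias-corrected form avoids division by zero
    return d + f1 * f1 / (2 * f2)


def estimate_total_distinct(sample, population_size, raw_estimate, dup_threshold):
    d = len(set(sample))
    duplicates = len(sample) - d          # assumed meaning of "duplicate samples"
    if duplicates >= dup_threshold:
        return float(d)                   # presume the sample already saw every value
    estimate = max(raw_estimate, chao_lower_bound(sample))
    return float(min(estimate, population_size))   # simplified domain limit


def scale_bin_distincts(sample_bin_distincts, total_distinct_estimate):
    # Proportional scaling keeps the "sum of the parts" relationship:
    # the scaled per-bin distinct counts add up to the population-wide estimate.
    total = sum(sample_bin_distincts)
    return [d * total_distinct_estimate / total for d in sample_bin_distincts]
```

A fuller treatment would also clamp each bin to the number of values its range can hold and redistribute any excess to the other bins; that step is omitted here.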

    GENERATING HISTOGRAMS OF POPULATION DATA BY SCALING FROM SAMPLE DATA
    5.
    Patent application
    GENERATING HISTOGRAMS OF POPULATION DATA BY SCALING FROM SAMPLE DATA (in force)

    Publication number: US20100138407A1

    Publication date: 2010-06-03

    Application number: US12700274

    Filing date: 2010-02-04

    IPC classification: G06F15/18 G06F17/30

    CPC classification: G06F17/18 G06F17/30469

    Abstract: Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a “Chao” estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a “sum of the parts” mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.

    Generating histograms of population data by scaling from sample data
    6.
    Granted patent
    Generating histograms of population data by scaling from sample data (in force)

    Publication number: US07707005B2

    Publication date: 2010-04-27

    Application number: US11469855

    Filing date: 2006-09-02

    IPC classification: G06F19/00 G06F17/40 G06F17/18

    CPC classification: G06F17/18 G06F17/30469

    Abstract: Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a “Chao” estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a “sum of the parts” mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.

    System and method for using a compressed trie to estimate like predicates
    7.
    Granted patent
    System and method for using a compressed trie to estimate like predicates (in force)

    Publication number: US07308459B2

    Publication date: 2007-12-11

    Application number: US10978901

    Filing date: 2004-11-01

    IPC classification: G06F17/30

    Abstract: A compressed trie has nodes including multiple character sub-strings. Such multiple character storage reduces the number of nodes in the trie, thereby reducing the amount of memory required for storing the trie and reducing the amount of time required to perform matching. Furthermore, in such a compressed trie, sub-strings are stored in a single character string. Each node references its corresponding sub-string by the sub-string's starting position and length in the character string. Multiple nodes may reference a single sub-string. Thus, referencing rather than storing sub-strings in corresponding nodes eliminates repetitive sub-string storage, thereby reducing the amount of memory required for storing the trie.
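
The storage scheme in this abstract can be sketched in Python as follows. The class and method names are hypothetical. Each node holds only a `(start, length)` reference into one shared string, and a sub-string that already occurs in that string is referenced again rather than stored twice. The per-node `count` and the `prefix_count` method are additions of this sketch, included only to suggest how such a trie could support estimating a predicate like `LIKE 'ap%'`; the abstract itself does not describe them.

```python
class TrieNode:
    def __init__(self, start, length):
        self.start = start        # offset of this node's sub-string in the shared string
        self.length = length      # length of this node's sub-string
        self.children = {}        # first character of child's sub-string -> child node
        self.count = 0            # assumption: number of inserted strings passing through


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class CompressedTrie:
    def __init__(self):
        self.pool = ""            # the single character string holding all sub-strings
        self.root = TrieNode(0, 0)

    def _ref(self, s):
        # Reuse an existing occurrence of s in the pool when there is one.
        pos = self.pool.find(s)
        if pos < 0:
            pos = len(self.pool)
            self.pool += s
        return pos, len(s)

    def _label(self, node):
        return self.pool[node.start:node.start + node.length]

    def insert(self, word):
        node, rest = self.root, word
        node.count += 1
        while rest:
            child = node.children.get(rest[0])
            if child is None:
                leaf = TrieNode(*self._ref(rest))
                leaf.count = 1
                node.children[rest[0]] = leaf
                return
            label = self._label(child)
            k = common_prefix_len(label, rest)
            if k < len(label):
                # Split: a new middle node references the first k characters of
                # the same pool region; the old child keeps the remainder.
                mid = TrieNode(child.start, k)
                mid.count = child.count
                child.start += k
                child.length -= k
                mid.children[label[k]] = child
                node.children[rest[0]] = mid
                child = mid
            child.count += 1
            node, rest = child, rest[k:]

    def prefix_count(self, prefix):
        node, rest = self.root, prefix
        while rest:
            child = node.children.get(rest[0])
            if child is None:
                return 0
            label = self._label(child)
            k = common_prefix_len(label, rest)
            if k < len(rest) and k < len(label):
                return 0          # mismatch in the middle of a sub-string
            node, rest = child, rest[k:]
        return node.count
```

Splitting a node never copies characters: the new middle node and the shortened child both point into the region of the pool that the original node already used.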

    System and method for using a compressed trie to estimate like predicates
    8.
    Granted patent
    System and method for using a compressed trie to estimate like predicates (lapsed)

    Publication number: US07519611B2

    Publication date: 2009-04-14

    Application number: US10926624

    Filing date: 2004-08-26

    IPC classification: G06F17/30

    Abstract: A compressed trie has nodes including multiple character sub-strings. Such multiple character storage reduces the number of nodes in the trie, thereby reducing the amount of memory required for storing the trie and reducing the amount of time required to perform matching. Furthermore, in such a compressed trie, sub-strings are stored in a single character string. Each node references its corresponding sub-string by the sub-string's starting position and length in the character string. Multiple nodes may reference a single sub-string. Thus, referencing rather than storing sub-strings in corresponding nodes eliminates repetitive sub-string storage, thereby reducing the amount of memory required for storing the trie.

    GENERATING HISTOGRAMS OF POPULATION DATA BY SCALING FROM SAMPLE DATA
    9.
    Patent application
    GENERATING HISTOGRAMS OF POPULATION DATA BY SCALING FROM SAMPLE DATA (in force)

    Publication number: US20080059125A1

    Publication date: 2008-03-06

    Application number: US11469855

    Filing date: 2006-09-02

    IPC classification: G06F19/00 G06F17/40 G06F17/18

    CPC classification: G06F17/18 G06F17/30469

    Abstract: Histograms formed based on samples of a population, such as histograms created from random page-level samples of a data store, are intelligently scaled to histograms estimating distribution of the entire population of the data store. As an optional optimization, where a threshold number of duplicate samples are observed during page-level sampling, the number of distinct values in the overall population data is presumed to be the number of distinct values in the sample data. Also, during estimation of distinct values of an overall population, a “Chao” estimator can optionally be utilized as a lower bound of the estimate. The resulting estimate is then used when scaling, which can take domain knowledge of the data being scaled into account in order to prevent scaled estimates from exceeding the limits of the domain. Also, a “sum of the parts” mathematical relationship can be taken into account during scaling that the sum of the scaled distinct values for each bin of an estimate histogram should total an estimate for the total distinct values of the entire population.