Database aggregation query result estimator
    11.
    Granted invention patent
    Status: in force

    Publication No.: US07293037B2

    Publication date: 2007-11-06

    Application No.: US11246354

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
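    The outlier-index approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the aggregate is SUM, the outlier budget `k` is fixed in advance, the "sliding window" slides over the sorted values, the remaining values are sampled uniformly without replacement, and the helper names `find_outliers` and `estimate_sum` are hypothetical.

```python
import random
from statistics import pvariance


def find_outliers(values, k):
    """Split values into (outliers, rest).

    Assumption: the window has length len(values) - k and slides over the
    sorted values; the window with the lowest variance is kept, and the k
    values outside it are treated as outliers.
    """
    s = sorted(values)
    w = len(s) - k
    if k < 0 or w < 1:
        raise ValueError("need 0 <= k < len(values)")
    best = min(range(k + 1), key=lambda i: pvariance(s[i:i + w]))
    return s[:best] + s[best + w:], s[best:best + w]


def estimate_sum(values, k, sample_size, seed=None):
    """Exact SUM over the outliers plus an extrapolated SUM over a sample of the rest."""
    outliers, rest = find_outliers(values, k)
    outlier_sum = sum(outliers)              # outliers are aggregated exactly
    n = max(1, min(sample_size, len(rest)))  # never sample more than exists
    sample = random.Random(seed).sample(rest, n)
    # Scale the sample sum up to the size of the pruned (non-outlier) data.
    return outlier_sum + sum(sample) * len(rest) / n


values = [12, 15, 11, 14, 13, 950, 16, 10]
print(find_outliers(values, 1))  # ([950], [10, 11, 12, 13, 14, 15, 16])
print(estimate_sum(values, k=1, sample_size=4, seed=0))
```

    The window scan recomputes each variance from scratch, which keeps the sketch short but is quadratic; running sums of x and x² over the sorted values would make it linear after sorting.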

    Sampling for queries
    12.
    Granted invention patent
    Status: in force

    Publication No.: US07287020B2

    Publication date: 2007-10-23

    Application No.: US09759804

    Filing date: 2001-01-12

    Abstract: This disclosure describes leveraging workload information associated with executed database queries to estimate the result of a current database query. The workload information is analyzed to determine how tuples in a database are used during query execution, such as how often a tuple is accessed and how many different queries accessed it. Each tuple is assigned a weight value based on the analyzed workload information, and the particular tuples sampled to estimate a result for the current query are selected according to each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries, or outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values together with the outlier tuples from the outlier index.
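    A minimal Python sketch of workload-weighted sampling follows. The weighting rule (1 plus the number of logged queries that touched a tuple), the with-replacement draw, and the Hansen-Hurwitz scaling are assumptions chosen for illustration; the abstract does not specify them. Tuples listed in `outlier_ids` stand in for the outlier index and are aggregated exactly. The names `workload_weights` and `estimate_sum_weighted` are hypothetical.

```python
import random
from collections import Counter


def workload_weights(tuple_ids, workload):
    """Weight each tuple by how many logged queries accessed it.

    `workload` is a list of sets of tuple ids, one set per executed query.
    The +1 (an assumption) keeps never-accessed tuples sampleable.
    """
    hits = Counter(t for accessed in workload for t in accessed)
    return {t: 1 + hits[t] for t in tuple_ids}


def estimate_sum_weighted(table, workload, sample_size,
                          outlier_ids=frozenset(), seed=None):
    """Estimate SUM(value) over `table` (dict: tuple id -> value)."""
    exact = sum(table[t] for t in outlier_ids)  # outlier-index tuples, exact
    rest = [t for t in table if t not in outlier_ids]
    if not rest:
        return exact
    w = workload_weights(rest, workload)
    total_w = sum(w.values())
    rng = random.Random(seed)
    picks = rng.choices(rest, weights=[w[t] for t in rest], k=sample_size)
    # Hansen-Hurwitz: each draw contributes value / selection probability,
    # where the selection probability of tuple t is w[t] / total_w.
    return exact + sum(table[t] * total_w / w[t] for t in picks) / sample_size


table = {1: 10, 2: 12, 3: 9, 4: 11, 5: 4000}
workload = [{1, 2}, {1, 5}, {2}]
print(workload_weights(table, workload))  # {1: 3, 2: 3, 3: 1, 4: 1, 5: 2}
print(estimate_sum_weighted(table, workload, sample_size=50,
                            outlier_ids={5}, seed=1))
```

    Dividing by the selection probability keeps the SUM estimate unbiased even though frequently queried tuples are drawn more often.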

    Sampling for aggregation queries
    14.
    Granted invention patent
    Status: in force

    Publication No.: US06842753B2

    Publication date: 2005-01-11

    Application No.: US09759799

    Filing date: 2001-01-12

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve weighted sampling and weighted selection of outlier values for low-selectivity queries or queries with GROUP BY.
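    For GROUP BY queries, a uniform sample can miss small groups entirely. The sketch below illustrates one common remedy, stratified sampling with an equal per-group share of the sample budget, so that each sampled row carries a weight equal to its group's inverse sampling rate. This is a swapped-in textbook technique chosen for illustration; the abstract does not spell out the patent's own weighting scheme, and `groupby_sum_estimate` is a hypothetical name.

```python
import random
from collections import defaultdict


def groupby_sum_estimate(rows, budget, seed=None):
    """Estimate SELECT key, SUM(value) ... GROUP BY key from a stratified sample.

    rows: iterable of (group_key, value) pairs; budget: total sample size.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    if not groups:
        return {}
    rng = random.Random(seed)
    per_group = max(1, budget // len(groups))  # equal share for every group
    estimates = {}
    for key, vals in groups.items():
        n = min(per_group, len(vals))
        sample = rng.sample(vals, n)
        # Each sampled row stands for len(vals) / n rows of its group.
        estimates[key] = sum(sample) * len(vals) / n
    return estimates


rows = [("big", v) for v in range(1, 1001)] + [("tiny", 42), ("tiny", 58)]
print(groupby_sum_estimate(rows, budget=20, seed=3))
```

    With uniform sampling at the same budget, the two "tiny" rows would usually be absent from the sample; here the small group is answered exactly.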

    Sampling for database systems
    15.
    Granted invention patent

    Publication No.: US06532458B1

    Publication date: 2003-03-11

    Application No.: US09268590

    Filing date: 1999-03-15

    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics, such as with-replacement (WR), without-replacement (WoR), or independent coin-flip (CF) semantics. The database server may perform such sampling sequentially, both to sample non-materialized records (for example, those produced as a stream by a pipeline in a query tree) and to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without computing the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
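    Each of the three sampling semantics named above can be implemented in one sequential pass over a record stream, which is what lets a sampler sit on top of a non-materialized pipeline. The Python sketch below uses standard textbook algorithms (independent coin flips for CF, reservoir sampling for unweighted WoR, and r independent size-1 weighted reservoirs for weighted WR); these are illustrative choices, not necessarily the server's actual operators, and the function names are hypothetical.

```python
import random


def coin_flip_sample(stream, p, rng):
    """CF semantics: keep each record independently with probability p."""
    return [rec for rec in stream if rng.random() < p]


def reservoir_wor(stream, r, rng):
    """Unweighted WoR semantics: reservoir sampling (Algorithm R).

    After i + 1 records, each one is in the reservoir with probability
    r / (i + 1), without knowing the stream length in advance.
    """
    reservoir = []
    for i, rec in enumerate(stream):
        if i < r:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < r:
                reservoir[j] = rec
    return reservoir


def weighted_wr(stream, r, rng):
    """Weighted WR semantics: r independent size-1 reservoirs.

    `stream` yields (record, weight) pairs. Each slot switches to the current
    record with probability weight / total_so_far, so at the end each slot
    holds record i with probability w_i / sum(w). Records with non-positive
    weight are never selected.
    """
    slots = [None] * r
    total = 0.0
    for rec, w in stream:
        if w <= 0:
            continue
        total += w
        for k in range(r):
            if rng.random() < w / total:
                slots[k] = rec
    return slots


rng = random.Random(7)
print(len(coin_flip_sample(range(10_000), 0.01, rng)))  # about 1% in expectation
print(reservoir_wor(range(10_000), 5, rng))
print(weighted_wr(((f"row{i}", i % 5 + 1) for i in range(1_000)), 3, rng))
```

    All three functions consume the input exactly once and keep at most O(r) state (CF keeps only the selected records), so they work on generator pipelines as well as on stored tables.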

    Sampling for database systems
    16.
    Granted invention patent
    Status: lapsed

    Publication No.: US07567949B2

    Publication date: 2009-07-28

    Application No.: US10238175

    Filing date: 2002-09-10

    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics, such as with-replacement (WR), without-replacement (WoR), or independent coin-flip (CF) semantics. The database server may perform such sampling sequentially, both to sample non-materialized records (for example, those produced as a stream by a pipeline in a query tree) and to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without computing the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
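    Sampling over a join without computing it can be sketched as follows, assuming an equi-join R JOIN S where only S offers lookup by join value (here an in-memory grouping stands in for an index or frequency statistics on S). R is read once as a stream and each R row is weighted by its number of S matches, so a weighted with-replacement sample of R followed by a uniform pick among the matches yields a uniform WR sample of the join: a join pair (t, s) is drawn with probability (w_t / W) * (1 / w_t) = 1 / W. This illustrates the general idea only; the function `sample_join_wr` and its key-extractor parameters `key_r`/`key_s` are hypothetical.

```python
import random
from collections import defaultdict


def sample_join_wr(r_stream, s_rows, r, key_r, key_s, seed=None):
    """Uniform with-replacement sample of size r from R JOIN S (equi-join).

    R is consumed once as a stream (never materialized); S is grouped by
    join value, standing in for an index on S only.
    """
    rng = random.Random(seed)
    s_by_key = defaultdict(list)
    for s in s_rows:
        s_by_key[key_s(s)].append(s)

    # Single pass over R: weighted WR sample, weight = number of S matches.
    slots = [None] * r
    total = 0
    for t in r_stream:
        w = len(s_by_key.get(key_r(t), ()))
        if w == 0:
            continue  # t contributes no join rows
        total += w
        for k in range(r):
            if rng.random() < w / total:
                slots[k] = t
    if total == 0:
        return []  # the join is empty
    # For each sampled R row, pick a uniformly random matching S row.
    return [(t, rng.choice(s_by_key[key_r(t)])) for t in slots]


r_rows = [("k1", 10), ("k2", 20), ("k3", 30)]
s_rows = [("k1", "a"), ("k1", "b"), ("k2", "c")]
print(sample_join_wr(r_rows, s_rows, 4,
                     key_r=lambda t: t[0], key_s=lambda s: s[0], seed=42))
```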