Database aggregation query result estimator
    Granted invention patent (legal status: in force)

    Publication number: US07293037B2

    Publication date: 2007-11-06

    Application number: US11246354

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
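    The outlier-index estimate described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes a SUM aggregate over one numeric column, a caller-chosen outlier budget k, a sliding window taken over the sorted values, and uniform sampling without replacement of the values inside the window. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def lowest_variance_window(sorted_vals: Sequence[float], width: int) -> int:
    """Start index of the width-long run of sorted values with the smallest
    variance. Prefix sums make each window O(1) to evaluate."""
    n = len(sorted_vals)
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(sorted_vals):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v
    best_start, best_var = 0, float("inf")
    for start in range(n - width + 1):
        s = prefix[start + width] - prefix[start]
        sq = prefix_sq[start + width] - prefix_sq[start]
        var = sq / width - (s / width) ** 2  # population variance of the window
        if var < best_var:
            best_start, best_var = start, var
    return best_start


def build_outlier_index(values: Sequence[float], k: int) -> Tuple[List[float], List[float]]:
    """Return (outliers, inliers): the k values outside the lowest-variance
    window of len(values) - k sorted values, and the values inside it."""
    if not 0 <= k < len(values):
        raise ValueError("need 0 <= k < len(values)")
    ordered = sorted(values)
    width = len(ordered) - k
    start = lowest_variance_window(ordered, width)
    outliers = ordered[:start] + ordered[start + width:]
    inliers = ordered[start:start + width]
    return outliers, inliers


def estimate_sum(values: Sequence[float], k: int, sample_size: int,
                 rng: random.Random) -> float:
    """Exact SUM over the outliers plus a scaled-up SUM over a uniform
    sample of the remaining (inlier) values."""
    outliers, inliers = build_outlier_index(values, k)
    exact_part = sum(outliers)
    m = min(sample_size, len(inliers))
    if m == 0:
        return exact_part
    sample = rng.sample(inliers, m)
    extrapolated = sum(sample) * len(inliers) / m  # scale by N_inliers / m
    return exact_part + extrapolated


# Five extreme values would dominate a plain uniform sample; here they are
# aggregated exactly and only the constant remainder is sampled.
data = [10.0] * 95 + [10_000.0] * 5
print(estimate_sum(data, k=5, sample_size=20, rng=random.Random(0)))  # 50950.0
```

    Because the outliers are summed exactly, the sample only has to cover the low-variance remainder, which is what reduces the estimator's error; how k is chosen is left to the caller in this sketch.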

    Sampling for queries
    Granted invention patent (legal status: in force)

    Publication number: US07287020B2

    Publication date: 2007-10-23

    Application number: US09759804

    Filing date: 2001-01-12

    Abstract: This disclosure describes leveraging workload information associated with executed database queries for estimating the result of a current database query. The workload information is analyzed to determine the usage of tuples in a database during query execution, such as how often a tuple is accessed and the number of different queries that accessed the tuple. A tuple is assigned a weight value that is based on the analyzed workload information. The particular tuples sampled for estimating a result for the current query are chosen based on each tuple's weight value. The workload information may also be leveraged to generate an outlier index that identifies outlier tuples associated with the executed queries or that identifies outlier tuples associated with particular queries that are executed more frequently than other queries. The result for the current query can also be estimated using the sampled values along with the outlier tuples from the outlier index.
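    A minimal sketch of workload-weighted sampling for a SUM estimate. The abstract gives no weighting formula, so this assumes a hypothetical rule: each tuple's weight is 1 plus the number of logged queries that accessed it. Tuples are then drawn with replacement in proportion to weight, and the Hansen-Hurwitz estimator (the mean of value divided by selection probability) keeps the SUM estimate unbiased however skewed the weights are. Function names are illustrative only.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def workload_weights(num_tuples: int, workload: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical weighting: 1 + number of logged queries that touched a tuple.
    The +1 keeps every tuple sampleable, which the estimator relies on."""
    hits: Counter = Counter()
    for accessed_ids in workload:
        hits.update(set(accessed_ids))  # count each query at most once per tuple
    return [1.0 + hits[t] for t in range(num_tuples)]


def weighted_sum_estimate(values: Sequence[float], weights: Sequence[float],
                          m: int, rng: random.Random) -> float:
    """Draw m tuple ids with replacement, P(i) = w_i / sum(w), and return the
    Hansen-Hurwitz estimate (1/m) * sum(y_i / P(i)) of the column SUM."""
    total_w = sum(weights)
    picks = rng.choices(range(len(values)), weights=weights, k=m)
    return sum(values[i] * total_w / weights[i] for i in picks) / m


# Usage: 4 tuples and three logged queries, each a list of accessed tuple ids.
w = workload_weights(4, [[0, 1], [1], [1, 3]])   # [2.0, 4.0, 1.0, 2.0]
vals = [10.0, 20.0, 5.0, 10.0]                   # true SUM = 45.0
est = weighted_sum_estimate(vals, w, m=8, rng=random.Random(42))
print(math.isclose(est, 45.0))  # True: values here are proportional to weights
```

    When values track the weights, every draw contributes the same term and the estimate is exact; when they do not, the estimate remains unbiased but its variance grows, which is where the outlier index mentioned in the abstract would help.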

    Sampling for aggregation queries
    Granted invention patent (legal status: in force)

    Publication number: US06842753B2

    Publication date: 2005-01-11

    Application number: US09759799

    Filing date: 2001-01-12

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low-selectivity queries, or queries with a GROUP BY clause.
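    For a GROUP BY query the same split between exact and sampled work can be applied per group. The sketch below assumes the outlier rows are already known (for example, from an outlier index as this abstract describes) and samples every other row with plain Bernoulli sampling at a fixed rate, scaling kept rows by 1/rate (the Horvitz-Thompson correction). It does not reproduce the weighted sampling the abstract mentions, whose details are not given; the row shape and function name are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Row = Tuple[str, float]  # hypothetical row shape: (group key, value)


def group_by_sum_estimate(rows: Sequence[Row], outlier_ids: Set[int],
                          rate: float, rng: random.Random) -> Dict[str, float]:
    """Estimate SELECT g, SUM(v) FROM rows GROUP BY g.

    Rows whose position is in outlier_ids are added exactly. Every other row
    is kept with probability `rate` and weighted by 1/rate. A small group
    with no outliers and no sampled rows is missing from the result, which
    is presumably why weighted sampling is proposed for GROUP BY queries.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    estimate: Dict[str, float] = defaultdict(float)
    for row_id, (group, value) in enumerate(rows):
        if row_id in outlier_ids:
            estimate[group] += value          # outlier: aggregated exactly
        elif rng.random() < rate:
            estimate[group] += value / rate   # sampled row: scaled up
    return dict(estimate)


rows = [("a", 1.0), ("a", 2.0), ("b", 100.0), ("b", 3.0)]
print(group_by_sum_estimate(rows, {2}, rate=1.0, rng=random.Random(0)))
# {'a': 3.0, 'b': 103.0}
```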

    Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer
    Granted invention patent (legal status: in force)

    Publication number: US06907380B2

    Publication date: 2005-06-14

    Application number: US10726254

    Filing date: 2003-12-01

    CPC classification number: G06K9/6218 Y10S707/99936 Y10S707/99937

    Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
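    The five steps can be sketched in Python. The abstract leaves both the error metric and clustering method A unspecified; this sketch assumes squared Euclidean distance as the metric, weighted Lloyd (k-means) iterations with random initialization as method A, and a round-robin split of S into P pieces. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float], k: int,
                    rng: random.Random, iters: int = 20) -> List[Point]:
    """Assumed stand-in for clustering method A: weighted Lloyd iterations
    minimizing the weighted sum of squared distances to the nearest center."""
    centers: List[Point] = rng.sample(list(points), k)
    dims = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dims for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            mass[j] += w
            for d in range(dims):
                sums[j][d] += w * p[d]
        # Move each center to the weighted mean of its points; keep empty ones.
        centers = [tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centers[j]
                   for j in range(k)]
    return centers


def divide_and_conquer(S: Sequence[Point], k: int, P: int,
                       rng: random.Random) -> List[Point]:
    # 1) Partition S into P disjoint pieces (round-robin; each needs >= k points).
    pieces = [list(S[i::P]) for i in range(P)]
    all_centers: List[Point] = []
    all_weights: List[float] = []
    for piece in pieces:
        # 2) k intermediate centers D_i for piece S_i.
        D = weighted_kmeans(piece, [1.0] * len(piece), k, rng)
        # 3) Assign each point of S_i to its nearest intermediate center.
        counts = [0] * k
        for p in piece:
            counts[nearest(p, D)] += 1
        # 4) Weight each center by the number of points assigned to it.
        all_centers.extend(D)
        all_weights.extend(float(c) for c in counts)
    # 5) Cluster the k*P weighted intermediate centers into k final centers.
    return weighted_kmeans(all_centers, all_weights, k, rng)


S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
     (100.0, 100.0), (101.0, 100.0), (100.0, 101.0), (101.0, 101.0)]
print(sorted(divide_and_conquer(S, k=2, P=2, rng=random.Random(0))))
# [(0.5, 0.5), (100.5, 100.5)]
```

    Only the k*P weighted intermediate centers, not the n original points, reach step 5, so steps 2 to 4 can run on each piece independently; that independence is what makes the scalable, parallel, and incremental variants named in the title possible.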

    Database aggregation query result estimator
    Granted invention patent (legal status: in force)

    Publication number: US07191181B2

    Publication date: 2007-03-13

    Application number: US10873569

    Filing date: 2004-06-22

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low-selectivity queries, or queries with a GROUP BY clause.

    Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer
    Granted invention patent (legal status: in force)

    Publication number: US06684177B2

    Publication date: 2004-01-27

    Application number: US09854212

    Filing date: 2001-05-10

    CPC classification number: G06K9/6218 Y10S707/99936 Y10S707/99937

    Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
