Database aggregation query result estimator
    1.
    Granted patent
    Database aggregation query result estimator (in force)

    Publication No.: US07363301B2

    Publication date: 2008-04-22

    Application No.: US11246355

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
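
The steps in this abstract can be sketched for a SUM query as follows. This is a minimal illustration, not the patented implementation: the function names `split_outliers` and `estimate_sum`, the fixed outlier count, and the uniform sampling of the remaining data are all assumptions made for the example.

```python
import random

def split_outliers(values, num_outliers):
    """Return (outliers, rest). `rest` is the contiguous window of the sorted
    values, of size len(values) - num_outliers, with the lowest variance."""
    ordered = sorted(values)
    window = len(ordered) - num_outliers
    if window <= 0:
        return ordered, []
    # Prefix sums of x and x^2 give the variance of any window in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for x in ordered:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def variance(start):
        s = prefix[start + window] - prefix[start]
        sq = prefix_sq[start + window] - prefix_sq[start]
        return sq / window - (s / window) ** 2

    # Slide the window across the sorted values and keep the tightest one.
    best = min(range(num_outliers + 1), key=variance)
    outliers = ordered[:best] + ordered[best + window:]
    rest = ordered[best:best + window]
    return outliers, rest

def estimate_sum(values, num_outliers, sample_size, rng):
    """Estimate SUM(values): exact sum of outliers plus an extrapolated sample sum."""
    outliers, rest = split_outliers(values, num_outliers)
    exact_part = sum(outliers)          # outliers are aggregated exactly
    if not rest:
        return exact_part
    sample = rng.sample(rest, min(sample_size, len(rest)))
    # Scale the sample sum up to the size of the non-outlier data.
    extrapolated = sum(sample) * len(rest) / len(sample)
    return exact_part + extrapolated
```

Calling `estimate_sum(data, 2, 10, random.Random(0))` on a column with two extreme values aggregates those two exactly and samples only the rest, so the extreme values cannot distort the extrapolation.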

    Sampling for queries
    2.
    Patent application
    Sampling for queries (in force)

    Publication No.: US20060085410A1

    Publication date: 2006-04-20

    Application No.: US11296036

    Filing date: 2005-12-07

    Abstract: Results of a database query are estimated by performing a sampling of weighted tuples in a database based on a probability of usage of tuples required in executing a workload. A probability is associated with each tuple sampled. An aggregate is computed over values in each sampled tuple while multiplying by the inverses of the probabilities associated with each tuple sampled.
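
The inverse-probability weighting described in this abstract can be sketched as below. The sketch assumes each tuple is a single numeric value and is kept independently with its own inclusion probability; the names `weighted_sample` and `estimate_sum` are hypothetical, and how the probabilities are derived from a workload is outside the example.

```python
import random

def weighted_sample(values, probabilities, rng):
    """Keep value i with probability probabilities[i] and record that probability.
    Tuples with probability 0 are never sampled, so no division by zero occurs later."""
    return [(v, p) for v, p in zip(values, probabilities) if rng.random() < p]

def estimate_sum(sample):
    """Estimate the full SUM by weighting each sampled value by 1 / probability."""
    return sum(value / p for value, p in sample)
```

A tuple kept with probability 0.25 stands in for four tuples like it, so its value is multiplied by 4. Tuples that the workload uses often get a higher probability and therefore contribute less variance to the estimate.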

    Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer
    3.
    Granted patent
    Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (in force)

    Publication No.: US06907380B2

    Publication date: 2005-06-14

    Application No.: US10726254

    Filing date: 2003-12-01

    CPC classification number: G06K9/6218 Y10S707/99936 Y10S707/99937

    Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
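
The five steps can be sketched on one-dimensional points as follows. The abstract leaves the clustering method A and the error metric open; this sketch assumes a weighted Lloyd's k-means with squared distance and a deterministic start, and the names `weighted_kmeans`, `nearest` and `divide_and_conquer` are hypothetical.

```python
def nearest(x, centers):
    """Index of the center closest to x under squared distance."""
    return min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)

def weighted_kmeans(points, weights, k, iterations=20):
    """Assumed clustering method A: weighted Lloyd's k-means on non-empty 1-D input."""
    ordered = sorted(set(points))
    # Deterministic start: k values spread evenly over the sorted distinct points.
    centers = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        totals = [0.0] * k
        masses = [0.0] * k
        for x, w in zip(points, weights):
            j = nearest(x, centers)
            totals[j] += w * x
            masses[j] += w
        # An empty cluster keeps its previous center.
        centers = [totals[j] / masses[j] if masses[j] else centers[j] for j in range(k)]
    return centers

def divide_and_conquer(points, k, num_pieces):
    # 1) Partition S into disjoint pieces.
    pieces = [points[i::num_pieces] for i in range(num_pieces)]
    all_centers, all_weights = [], []
    for piece in pieces:
        if not piece:
            continue
        # 2) Find k intermediate centers for this piece.
        centers = weighted_kmeans(piece, [1.0] * len(piece), k)
        # 3) Assign each point to its nearest intermediate center and
        # 4) weight each center by the number of points assigned to it.
        counts = [0] * k
        for x in piece:
            counts[nearest(x, centers)] += 1
        all_centers.extend(centers)
        all_weights.extend(counts)
    # 5) Cluster the weighted intermediate centers to get the k final centers.
    return weighted_kmeans(all_centers, all_weights, k)
```

Because each piece is summarized by only k weighted centers, the pieces can be processed one at a time or in parallel, and the final step works on at most k times the number of pieces points.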

    Database aggregation query result estimator
    4.
    Granted patent
    Database aggregation query result estimator (in force)

    Publication No.: US07191181B2

    Publication date: 2007-03-13

    Application No.: US10873569

    Filing date: 2004-06-22

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled in one of many known ways to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data. Further methods involve the use of weighted sampling and weighted selection of outlier values for low selectivity queries, or queries having group by.

    Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer
    5.
    Granted patent
    Computer implemented scalable, incremental and parallel clustering based on weighted divide and conquer (in force)

    Publication No.: US06684177B2

    Publication date: 2004-01-27

    Application No.: US09854212

    Filing date: 2001-05-10

    CPC classification number: G06K9/6218 Y10S707/99936 Y10S707/99937

    Abstract: A technique that uses a weighted divide and conquer approach for clustering a set S of n data points to find k final centers. The technique comprises 1) partitioning the set S into P disjoint pieces S1, ..., SP; 2) for each piece Si, determining a set Di of k intermediate centers; 3) assigning each data point in each piece Si to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set Di by the number of points in the corresponding piece Si assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.

    Continuous processing language for real-time data streams
    6.
    Granted patent
    Continuous processing language for real-time data streams (in force)

    Publication No.: US08396886B1

    Publication date: 2013-03-12

    Application No.: US11346119

    Filing date: 2006-02-02

    CPC classification number: G06F17/30533 G06F17/30516

    Abstract: A computer software language capable of expressing registered queries that operate on one or more data streams continuously. The language of the present invention is based on a publish/subscribe model in that queries subscribe to data streams and publish to data streams. Also, the language of the present invention can express queries that operate directly on data streams. Since queries expressed in the language of the present invention may be executed continuously and directly on data streams, the language includes a clause for specifying time-based and/or row-based windows for the input data stream. Operations are then performed on the data within such windows. In one embodiment, the language is also SQL-like and includes a clause for defining named windows (which can be used in any number of queries); a clause for detecting a pattern, and correlated database subqueries for correlating data stream data with database tables.
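
The difference between row-based and time-based windows can be illustrated with plain Python generators. This does not reproduce the patented language or its syntax; the function names, the `(timestamp, value)` event shape and the averaging query are assumptions chosen for the example.

```python
from collections import deque

def row_window(stream, rows):
    """Row-based window: after each arrival, yield the last `rows` events."""
    window = deque(maxlen=rows)
    for event in stream:
        window.append(event)
        yield list(window)

def time_window(stream, seconds):
    """Time-based window: after each arrival, yield the events that are less than
    `seconds` older than the newest one. Timestamps must be non-decreasing."""
    window = deque()
    for timestamp, value in stream:
        window.append((timestamp, value))
        # Expire events that have fallen out of the time range.
        while window[0][0] <= timestamp - seconds:
            window.popleft()
        yield list(window)

def running_average(windows):
    """Operation performed on the data within each window."""
    for window in windows:
        yield sum(value for _, value in window) / len(window)
```

With events at timestamps 0, 1, 5 and 6, a two-row window always averages the latest two values, whereas a three-second window drops the events at 0 and 1 as soon as the event at 5 arrives.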

    Efficient fuzzy match for evaluating data records
    7.
    Granted patent
    Efficient fuzzy match for evaluating data records (in force)

    Publication No.: US07296011B2

    Publication date: 2007-11-13

    Application No.: US10600083

    Filing date: 2003-06-20

    CPC classification number: G06F17/30542 G06F17/30303 Y10S707/99933

    Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
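
The idea of matching on q-grams can be sketched as below. The abstract does not detail the disclosed similarity function, so this example swaps in plain Jaccard similarity over q-gram sets and a linear scan of the reference list; the names `qgrams`, `similarity` and `fuzzy_match` and the 0.4 threshold are assumptions.

```python
def qgrams(text, q=3):
    """Set of lowercase q-grams taken from each whitespace-separated token."""
    grams = set()
    for token in text.lower().split():
        if len(token) <= q:
            grams.add(token)            # short tokens count as one gram
        else:
            for i in range(len(token) - q + 1):
                grams.add(token[i:i + q])
    return grams

def similarity(a, b, q=3):
    """Jaccard similarity of the two q-gram sets (a stand-in measure)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

def fuzzy_match(record, reference, threshold=0.4, q=3):
    """Return the most similar reference entry, or None if nothing is close enough."""
    best = max(reference, key=lambda ref: similarity(record, ref, q), default=None)
    if best is None or similarity(record, best, q) < threshold:
        return None
    return best
```

A misspelled input such as "Boeing Compny" still shares most of its q-grams with "Boeing Company", so it can be cleaned to the reference value even though an exact match fails.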

    Database aggregation query result estimator
    8.
    Patent application
    Database aggregation query result estimator (in force)

    Publication No.: US20060053103A1

    Publication date: 2006-03-09

    Application No.: US11246354

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

    Publish and subscribe capable continuous query processor for real-time data streams
    10.
    Granted patent
    Publish and subscribe capable continuous query processor for real-time data streams (in force)

    Publication No.: US07383253B1

    Publication date: 2008-06-03

    Application No.: US11015963

    Filing date: 2004-12-17

    CPC classification number: G06F17/30516 Y10S707/918 Y10S707/99933

    Abstract: A Continuous Query Processor processes queries on continuously updating data sources or data streams and includes a Publication Manager for accepting published structured elements of data from data stream Publishers, a Subscription Manager for giving structured elements of data to one or more data stream Subscribers, a Query Module Manager for processing queries represented by Query Modules, a Query Module Store for maintaining deployed Query Modules, a Query Primitive Manager performing processing for various primitives that comprise a Query Module, and a Schedule Manager for coordinating when a primitive within a Query Module gets processed in order to maintain that each continuous query is continuously updated immediately upon the arrival of structured element data affecting any part of a continuous query.
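
The publish/subscribe flow in this abstract can be sketched in a few lines. This toy class folds the publication, subscription and query module roles into one object and omits scheduling entirely; the class name, method names and stream names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

class ContinuousQueryProcessor:
    """Toy processor: a query reads one stream and publishes to another."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # stream name -> callbacks
        self.queries = defaultdict(list)      # input stream -> (function, output stream)

    def subscribe(self, stream, callback):
        """Register a subscriber that receives every element published to `stream`."""
        self.subscribers[stream].append(callback)

    def deploy_query(self, input_stream, function, output_stream):
        """Deploy a query; `function` returns a result, or None to emit nothing."""
        self.queries[input_stream].append((function, output_stream))

    def publish(self, stream, element):
        """Deliver `element` and immediately update every query that reads `stream`."""
        for callback in self.subscribers[stream]:
            callback(element)
        for function, output_stream in self.queries[stream]:
            result = function(element)
            if result is not None:
                self.publish(output_stream, result)
```

Each published element is pushed through the deployed queries as soon as it arrives, so subscribers to a query's output stream see updated results without polling.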
