    1. Sampling for queries
    Invention patent application (status: in force)

    Publication number: US20060085410A1

    Publication date: 2006-04-20

    Application number: US11296036

    Filing date: 2005-12-07

    Abstract: The results of a database query are estimated by sampling weighted tuples in a database, where the weights are based on the probability that each tuple is used in executing a workload. A probability is associated with each sampled tuple, and an aggregate is computed over the values in the sampled tuples, with each value multiplied by the inverse of the probability associated with its tuple.
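
    The estimator described in this abstract is an inverse-probability (Horvitz-Thompson style) weighted sum. Below is a minimal Python sketch. It assumes each tuple's usage probability is already known; the abstract does not say how those probabilities are derived from the workload, so `probs`, `weighted_sample`, and `estimate_sum` are hypothetical names. Each sampled value is divided by its inclusion probability, so the expected value of the estimate equals the exact SUM.

```python
import random


def weighted_sample(rows, probs, rng):
    """Keep row i independently with probability probs[i] (0 < p <= 1).

    Returns (row, p) pairs so the estimator can undo the sampling bias.
    probs stands in for the workload-derived usage probabilities (assumed given).
    """
    return [(row, p) for row, p in zip(rows, probs) if rng.random() < p]


def estimate_sum(sample, value_of):
    """Scale each sampled value by 1/p (inverse-probability weighting)."""
    return sum(value_of(row) / p for row, p in sample)
```

    If every probability is 1.0, the sample is the whole table and the estimate is exact. Lowering a tuple's probability means fewer tuples are read, at the cost of higher variance for aggregates that depend on that tuple.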

    2. Database aggregation query result estimator
    Invention patent application (status: in force)

    Publication number: US20060053103A1

    Publication date: 2006-03-09

    Application number: US11246354

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.
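
    The outlier-indexing steps above can be sketched in Python as follows. This is one illustrative reading of the abstract, not the patented implementation. The number of outliers `k` is an assumed input, the returned outlier list stands in for the outlier index, and the remaining data is assumed to be sampled uniformly without replacement. `split_outliers` and `outlier_indexed_sum` are hypothetical names.

```python
import random
import statistics


def split_outliers(values, k):
    """Split values into (outliers, rest).

    rest is the window of len(values) - k consecutive sorted values with the
    lowest variance; the k values outside that window are the outliers.
    """
    xs = sorted(values)
    if not 0 <= k < len(xs):
        raise ValueError("need 0 <= k < len(values)")
    w = len(xs) - k
    # Slide the window over every start position and keep the lowest-variance one.
    best = min(range(k + 1), key=lambda s: statistics.pvariance(xs[s:s + w]))
    return xs[:best] + xs[best + w:], xs[best:best + w]


def outlier_indexed_sum(values, k, sample_size, rng):
    """Estimate SUM(values): exact over the outliers, sampled and scaled over the rest.

    sample_size must be >= 1.
    """
    outliers, rest = split_outliers(values, k)
    sample = rng.sample(rest, min(sample_size, len(rest)))
    extrapolated = sum(sample) * len(rest) / len(sample)
    return sum(outliers) + extrapolated
```

    Scanning every window this way costs O(k*n). Keeping running sums of x and x squared would make each window's variance an O(1) update.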

    3. Sampling for database systems
    Granted invention patent (status: lapsed)

    Publication number: US07567949B2

    Publication date: 2009-07-28

    Application number: US10238175

    Filing date: 2002-09-10

    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
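
    Each of the three sampling semantics named in the abstract can be carried out in a single pass over a stream of records. The Python sketch below is a generic illustration, not the server's implementation. Coin flips (CF) keep each record independently. Without replacement (WoR) uses classic reservoir sampling. Weighted with replacement (WR) keeps `r` independent one-slot reservoirs; setting every weight to 1 gives unweighted WR. All function names are hypothetical.

```python
import random


def sample_cf(stream, p, rng):
    """Coin-flip (CF) semantics: keep each record independently with probability p."""
    return [x for x in stream if rng.random() < p]


def sample_wor(stream, r, rng):
    """Without replacement (WoR): reservoir sampling (Algorithm R), one pass.

    Every r-subset of the stream is equally likely to be returned.
    """
    reservoir = []
    for i, x in enumerate(stream):
        if i < r:
            reservoir.append(x)
        else:
            j = rng.randrange(i + 1)  # uniform in [0, i]
            if j < r:
                reservoir[j] = x
    return reservoir


def sample_wr_weighted(stream, r, rng):
    """Weighted with replacement (WR): r independent one-slot reservoirs.

    stream yields (record, weight) pairs. When a record of weight w arrives and
    the running total is W, each slot switches to it with probability w / W, so
    at the end each slot holds record i with probability w_i / (total weight).
    Records with weight <= 0 are never drawn; slots stay None if all weights are 0.
    """
    slots = [None] * r
    total = 0.0
    for record, w in stream:
        if w <= 0:
            continue
        total += w
        for s in range(r):
            if rng.random() < w / total:
                slots[s] = record
    return slots
```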

    6. Sampling over joins for database systems
    Granted invention patent (status: in force)

    Publication number: US06542886B1

    Publication date: 2003-04-01

    Application number: US09268275

    Filing date: 1999-03-15

    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.
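
    One standard way to sample a join without computing it, consistent with this abstract, works in two steps. First, weight each R1 tuple by the number of R2 tuples it joins with, m2(t1). Then pick one of its matching R2 tuples uniformly. Each join result (t1, t2) is then drawn with probability m2(t1)/|J| * 1/m2(t1) = 1/|J|, where |J| is the join size. The sketch below assumes R1 arrives as a stream that is read once and never indexed, and that R2 can be grouped by join key in memory. It illustrates the general technique, not the claimed method, and the function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def sample_join(r1_stream, r2, key1, key2, k, rng):
    """Draw k tuples uniformly with replacement from R1 JOIN R2 without computing the join.

    r1_stream is scanned once; only R2 is grouped by join key.
    Returns a list of (t1, t2) pairs, or [] if the join is empty.
    """
    groups = defaultdict(list)  # join key -> matching R2 tuples; len() is m2(key)
    for t2 in r2:
        groups[key2(t2)].append(t2)

    # One pass over R1: weighted WR sampling with weight m2(key1(t1)).
    slots = [None] * k
    total = 0
    for t1 in r1_stream:
        w = len(groups.get(key1(t1), ()))
        if w == 0:
            continue  # t1 joins with nothing, so it never appears in the join
        total += w
        for s in range(k):
            if rng.random() < w / total:
                slots[s] = t1
    if total == 0:
        return []
    # For each sampled t1, choose one of its R2 partners uniformly.
    return [(t1, rng.choice(groups[key1(t1)])) for t1 in slots]
```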

    7. Histogram construction using adaptive random sampling with cross-validation for database systems
    Granted invention patent (status: in force)

    Publication number: US06278989B1

    Publication date: 2001-08-21

    Application number: US09139835

    Filing date: 1998-08-25

    Abstract: Using adaptive random sampling with cross-validation helps determine when enough data of a database has been sampled to construct histograms on one or more columns of one or more tables of the database within a desired or predetermined degree of accuracy. An adaptive random sampling histogram construction tool constructs an approximate equi-height k-histogram using an initial sample of data values from the database and iteratively updates the histogram using an additional sample of data values from the database until the histogram is within the desired degree of accuracy. The accuracy of the histogram is cross-validated against the additional sample at each iteration, and the additional sample is used to update the histogram to help improve its accuracy. The accuracy of the histogram may be measured by an error in distribution of the additional sample over the histogram as compared to a threshold error using a suitable error metric. By attempting to sample only the number of data values necessary to construct the histogram within the desired degree of accuracy, the adaptive random sampling histogram construction tool attempts to avoid any cost increases in time and memory from sampling too many data values.
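
    The adaptive loop in this abstract can be sketched in Python as below. Several choices here are assumptions, not taken from the patent. Data is drawn by shuffling an in-memory copy, where a real server would sample rows or pages. Each additional sample has a fixed size `step`. The error metric is the maximum relative deviation of holdout bucket counts from the ideal `len(holdout) / k`. All names are hypothetical.

```python
import bisect
import random


def equi_height_boundaries(sample, k):
    """k - 1 separators that split the sorted sample into k equal-count buckets."""
    xs = sorted(sample)
    n = len(xs)
    return [xs[(i * n) // k] for i in range(1, k)]


def cross_validation_error(boundaries, holdout, k):
    """Max relative deviation of holdout bucket counts from the ideal len(holdout)/k."""
    counts = [0] * k
    for v in holdout:
        counts[bisect.bisect_right(boundaries, v)] += 1
    ideal = len(holdout) / k
    return max(abs(c - ideal) for c in counts) / ideal


def adaptive_histogram(data, k, step, threshold, rng):
    """Grow the sample by `step` values until the histogram passes cross-validation.

    Returns (boundaries, number of values sampled).
    """
    pool = list(data)
    rng.shuffle(pool)  # successive slices of a shuffled copy = sampling without replacement
    sample, used = pool[:step], step
    while True:
        boundaries = equi_height_boundaries(sample, k)
        holdout = pool[used:used + step]
        if not holdout:  # data exhausted: the histogram is built from everything
            return boundaries, len(sample)
        err = cross_validation_error(boundaries, holdout, k)
        sample += holdout  # fold the additional sample in to refine the histogram
        used += len(holdout)
        if err <= threshold:
            return equi_height_boundaries(sample, k), len(sample)
```

    The loop stops as soon as a fresh sample fits the current buckets within the threshold. Well-behaved data therefore stops after a few steps, and the worst case reads everything.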

    8. Sampling for database systems
    Granted invention patent

    Publication number: US06532458B1

    Publication date: 2003-03-11

    Application number: US09268590

    Filing date: 1999-03-15

    Abstract: A database server supports weighted and unweighted sampling of records or tuples in accordance with desired sampling semantics such as with replacement (WR), without replacement (WoR), or independent coin flips (CF) semantics, for example. The database server may perform such sampling sequentially not only to sample non-materialized records, such as those produced as a stream by a pipeline in a query tree for example, but also to sample records, whether materialized or not, in a single pass. The database server also supports sampling over a join of two relations of records or tuples without requiring the computation of the full join and without requiring the materialization of both relations and/or indexes on the join attribute values of both relations.

    9. Database aggregation query result estimator
    Granted invention patent (status: in force)

    Publication number: US07363301B2

    Publication date: 2008-04-22

    Application number: US11246355

    Filing date: 2005-10-07

    Abstract: Aggregation queries are performed by first identifying outlier values, aggregating the outlier values, and sampling the remaining data after pruning the outlier values. The sampled data is extrapolated and added to the aggregated outlier values to provide an estimate for each aggregation query. Outlier values are identified by selecting values outside of a selected sliding window of data having the lowest variance. An index is created for the outlier values. The outlier data is removed from the window of data, and separately aggregated. The remaining data without the outliers is then sampled to provide a statistically relevant sample that is then aggregated and extrapolated to provide an estimate for the remaining data. This sampled estimate is combined with the outlier aggregate to form an estimate for the entire set of data.

    10. Efficient fuzzy match for evaluating data records
    Granted invention patent (status: in force)

    Publication number: US07296011B2

    Publication date: 2007-11-13

    Application number: US10600083

    Filing date: 2003-06-20

    CPC classification number: G06F17/30542 G06F17/30303 Y10S707/99933

    Abstract: To help ensure high data quality, data warehouses validate and, if needed, clean incoming data tuples from external sources. In many situations, input tuples or portions of input tuples must match acceptable tuples in a reference table. For example, product name and description fields in a sales record from a distributor must match the pre-recorded name and description fields in a product reference relation. A disclosed system implements an efficient and accurate approximate or fuzzy match operation that can effectively clean an incoming tuple if it fails to match exactly with any of the multiple tuples in the reference relation. A disclosed similarity function that utilizes token substrings referred to as q-grams overcomes limitations of prior art similarity functions while efficiently performing a fuzzy match process.
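
    To illustrate the q-gram idea, the sketch below compares strings by the Jaccard overlap of their padded 3-grams. It then returns the best-scoring reference tuple if that score clears a threshold. This is a plain stand-in, not the similarity function disclosed in the patent, which the abstract does not spell out. The padding character, q = 3, per-field averaging, the 0.5 threshold, and all function names are assumptions.

```python
def qgrams(s, q=3):
    """Set of q-grams of a lowercased string, padded so short strings still yield grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def qgram_similarity(a, b, q=3):
    """Jaccard similarity of the two q-gram sets (1.0 means identical sets)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def fuzzy_match(input_tuple, reference, threshold=0.5, q=3):
    """Return (best reference tuple, score), or (None, score) if nothing clears threshold.

    Tuples are compared field by field and the per-field similarities are averaged.
    """
    best, best_score = None, 0.0
    for ref in reference:
        scores = [qgram_similarity(x, y, q) for x, y in zip(input_tuple, ref)]
        score = sum(scores) / len(scores)
        if score > best_score:
            best, best_score = ref, score
    return (best, best_score) if best_score >= threshold else (None, best_score)
```

    For example, the misspelled input ("Beoing Company", "Seattle") matches ("Boeing Company", "Seattle") rather than ("Bon Corporation", "Seattle"). The two "Company" strings share most of their 3-grams even though the transposed letters break an exact match.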
